mirror of
https://github.com/bvanroll/college-datascience.git
synced 2025-08-28 11:32:40 +00:00
fml
@@ -97,15 +97,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"392\n",
|
||||
"392\n"
|
||||
"<class 'sklearn.neighbors.classification.KNeighborsClassifier'>\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -117,18 +116,6 @@
|
||||
"/home/beppe/.local/lib/python3.7/site-packages/sklearn/utils/validation.py:563: FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in scikit-learn, for example by using your_array = your_array.astype(np.float64).\n",
|
||||
" FutureWarning)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
|
||||
" metric_params=None, n_jobs=None, n_neighbors=1, p=2,\n",
|
||||
" weights='uniform')"
|
||||
]
|
||||
},
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
@@ -160,7 +147,9 @@
|
||||
"xtrain, xtest, ytrain, ytest = train_test_split(setjen, target, random_state=0)\n",
|
||||
"\n",
|
||||
"knn = KNeighborsClassifier(n_neighbors=1)\n",
|
||||
"knn.fit(xtrain, ytrain)\n"
|
||||
"f = knn.fit(xtrain, ytrain)\n",
|
||||
"print(type(f))\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
259
7/.ipynb_checkpoints/Labo7-checkpoint.ipynb
Normal file
@@ -0,0 +1,259 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"# Lab 7 Data Science: Decision Trees"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<div class=\"alert alert-block alert-warning\">\n",
"<strong>Note.</strong> In this lab, decision trees are drawn with the GraphViz package, which was installed in labs 4 & 5. On your own PC, install the <strong>python-graphviz</strong> package in Anaconda (under Environments). This can take a while. Restart your kernel to be able to use the installation.\n",
|
||||
"</div>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**Exercise 1**: **Decision trees for a simple classification problem**\n",
"\n",
"**1.1 Exploring the data**\n",
"\n",
"Given is the dataset housing-ny-sf.csv. It can be used to predict whether an apartment is located in New York or in San Francisco. The file contains the following columns:\n",
" * in_sf: the target to predict: 1 if the apartment is located in San Francisco\n",
" * beds: the number of beds\n",
" * bath: the number of baths\n",
" * price: the sale price (\\$)\n",
" * year_built: the year of construction\n",
" * sqft: the floor area in square feet\n",
" * price_per_sqft: the price (\\$) per square foot\n",
" * elevation: elevation in m\n",
"\n",
"A nice visual introduction to this exercise can be found here: _http://www.r2d3.us/visual-intro-to-machine-learning-part-1/_\n",
"\n",
" * Load the data into a Pandas dataframe (please do not change anything in the csv file; hint: skipping).\n",
" * Make a scatter_matrix plot of the __features__ in which each instance is colored according to its target (with colormap 'brg', San Francisco becomes green and New York blue)\n",
" * Using Pandas (groupby and hist(alpha=0.4)), draw a histogram (with a different color for SF and NY) for a number of features whose distribution you expect to differ strongly between the 2 cities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "ParserError",
|
||||
"evalue": "Error tokenizing data. C error: Expected 3 fields in line 3, saw 8\n",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mParserError\u001b[0m Traceback (most recent call last)",
|
||||
"\u001b[0;32m<ipython-input-7-d1e7aebbc63b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0msklearn\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m#dees moe gewoon gefixed worde, geen id hoe, ma fuck it dude\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mhousing\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"housing-ny-sf.csv\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mskip_blank_lines\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mskipinitialspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 700\u001b[0m skip_blank_lines=skip_blank_lines)\n\u001b[1;32m 701\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 702\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 703\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 704\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 433\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 434\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 435\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 436\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 437\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1137\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1138\u001b[0m \u001b[0mnrows\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'nrows'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1139\u001b[0;31m \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;31m# May alter columns / col_dict\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1993\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1994\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1995\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1996\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1997\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
"import pandas as pd\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import tree\n",
"# Assumption: the first 2 lines of the csv are not data (the recorded ParserError reports 8 fields on line 3), so skip them\n",
"housing = pd.read_csv(\"housing-ny-sf.csv\", skiprows=2, skipinitialspace=True)\n",
"\n",
"# 70%/30% split; column 0 (in_sf) is the target, the remaining columns are the features\n",
"x_train, x_test, y_train, y_test = train_test_split(housing.iloc[:,1:], housing['in_sf'], test_size=0.3, random_state=0)\n",
"\n",
"# Overlaid price histograms, one per city\n",
"housing.groupby('in_sf').price.hist(alpha=0.4)\n",
"\n",
"# Compare train and test accuracy for max_depth 1..9\n",
"for i in range(1,10):\n",
"    clf = tree.DecisionTreeClassifier(random_state=0, max_depth=i)\n",
"    clf = clf.fit(x_train, y_train)\n",
"    print(clf.score(x_train, y_train))\n",
"    print(clf.score(x_test, y_test))\n",
"    print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = tree.DecisionTreeClassifier(random_state=0,max_depth=3)\n",
|
||||
"clf = clf.fit(x_train, y_train)\n",
|
||||
"\n",
|
||||
"import graphviz\n",
"dot_data = tree.export_graphviz(clf, out_file=None, feature_names=list(housing.iloc[:,1:].columns), class_names=['New York', 'San Francisco'], filled=True, rounded=True, special_characters=True)\n",
|
||||
"graph = graphviz.Source(dot_data)\n",
|
||||
"graph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**1.2 Training and parameter tuning**\n",
"\n",
" * Split the data into a training set and a test set (70%/30%) - choose a random_state ≠ 0, e.g. 88\n",
" * Train on this data with DecisionTreeClassifier without parameters\n",
" * Write a script that searches for the ideal depth of the decision tree, and draw this tree"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**Exercise 2**: **Classification exercise with decision trees, random forests and gradient boosting machines**\n",
"\n",
"**2.1 Generating the sample data**\n",
"\n",
"You start this exercise by creating sample data for a 'ternary classification problem'. This data has 2 features, namely X and Y, and a target named __color__ (possible values: red, green, blue). We will use this data to test different classification algorithms and to visualize the decision boundary.\n",
"\n",
"* Generating x and y coordinates with _np.random.normal_ produces a _point cloud_ centered at (0,0). By adding a constant value to x or y you can shift the center of this cloud in the x or y direction. Now generate the following sample data:\n",
" * a point cloud of 1000 instances centered at (0,0) with color label 'red'\n",
" * a point cloud of 1000 instances centered at (2.5,2.5) with color label 'green'\n",
" * a point cloud of 1000 instances centered at (5,0) with color label 'blue'\n",
" * when calling _np.random.normal_, pass no extra parameters other than the size\n",
" \n",
"* Make a scatter plot of this data. If everything is right, you will see the 3 separately colored point clouds, which overlap slightly. (You can pass the color column directly to matplotlib.) The following code makes X and Y use the same scale:\n",
"\n",
" <code>plt.gca().set_aspect('equal', adjustable='box')</code>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**2.2 Decision trees: visualizing the decision boundary**\n",
"\n",
"* Split the data into a training and a test set (70%/30%)\n",
"* Train a DecisionTreeClassifier on the training data and measure the accuracy on the training and test data\n",
"* We now approximate the decision boundary by first requesting the prediction for a grid of (x,y) coordinates that covers the entire plot. You generate this grid as follows:\n",
"\n",
"<code>grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2,-1).T</code>\n",
"\n",
"* The predicted values can again be passed directly as the color of the scatter plot\n",
"\n",
"\n",
"* Now adapt your script so that, for max_depth from 1 through 8, it prints the accuracies and plots the decision boundary\n",
"\n",
"Can you explain the decision boundary for max_depth=1? Can you also explain the setting with the best bias-variance tradeoff visually, based on the decision boundary?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**2.3 Random forests and gradient boosting machines**\n",
"\n",
"* Now also show the accuracies and the decision boundary for random forests and gradient boosting machines\n",
"* Random forests: why is parameter tuning of max_features pointless here?\n",
"* Gradient boosting machines: experiment with the learning_rate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
259
7/Labo7.ipynb
Normal file
@@ -0,0 +1,259 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"# Lab 7 Data Science: Decision Trees"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<div class=\"alert alert-block alert-warning\">\n",
"<strong>Note.</strong> In this lab, decision trees are drawn with the GraphViz package, which was installed in labs 4 & 5. On your own PC, install the <strong>python-graphviz</strong> package in Anaconda (under Environments). This can take a while. Restart your kernel to be able to use the installation.\n",
|
||||
"</div>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**Exercise 1**: **Decision trees for a simple classification problem**\n",
"\n",
"**1.1 Exploring the data**\n",
"\n",
"Given is the dataset housing-ny-sf.csv. It can be used to predict whether an apartment is located in New York or in San Francisco. The file contains the following columns:\n",
" * in_sf: the target to predict: 1 if the apartment is located in San Francisco\n",
" * beds: the number of beds\n",
" * bath: the number of baths\n",
" * price: the sale price (\\$)\n",
" * year_built: the year of construction\n",
" * sqft: the floor area in square feet\n",
" * price_per_sqft: the price (\\$) per square foot\n",
" * elevation: elevation in m\n",
"\n",
"A nice visual introduction to this exercise can be found here: _http://www.r2d3.us/visual-intro-to-machine-learning-part-1/_\n",
"\n",
" * Load the data into a Pandas dataframe (please do not change anything in the csv file; hint: skipping).\n",
" * Make a scatter_matrix plot of the __features__ in which each instance is colored according to its target (with colormap 'brg', San Francisco becomes green and New York blue)\n",
" * Using Pandas (groupby and hist(alpha=0.4)), draw a histogram (with a different color for SF and NY) for a number of features whose distribution you expect to differ strongly between the 2 cities"
|
||||
]
|
||||
},
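A minimal sketch of the loading step in 1.1, assuming the csv starts with two non-data lines before the header (which is what the recorded "Expected 3 fields in line 3, saw 8" error suggests). The in-memory text below is invented stand-in data, not the real housing-ny-sf.csv; only the column names come from the exercise.

```python
import io

import pandas as pd

# Hypothetical stand-in for housing-ny-sf.csv: two preamble lines, then header and rows.
raw = io.StringIO(
    "dataset,housing,v1\n"
    "values below are invented\n"
    "in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation\n"
    "0,2,1,999000,1960,1000,999,10\n"
    "0,1,1,650000,1985,700,929,5\n"
    "1,2,2,1200000,1925,1100,1091,60\n"
    "1,3,2,1500000,1940,1600,938,85\n"
)

# Skip the two preamble lines so that the third line is used as the header.
housing = pd.read_csv(raw, skiprows=2, skipinitialspace=True)

# Separate the target from the features.
features = housing.drop(columns="in_sf")
target = housing["in_sf"]

# Numeric counterpart of the per-city histogram: mean elevation per city.
print(housing.groupby("in_sf")["elevation"].mean())
```

With the real file, `pd.plotting.scatter_matrix(features, c=target, cmap='brg')` should give the colored scatter matrix, since extra keyword arguments are forwarded to the underlying scatter call.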
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "ParserError",
|
||||
"evalue": "Error tokenizing data. C error: Expected 3 fields in line 3, saw 8\n",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mParserError\u001b[0m Traceback (most recent call last)",
|
||||
"\u001b[0;32m<ipython-input-7-d1e7aebbc63b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0msklearn\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m#dees moe gewoon gefixed worde, geen id hoe, ma fuck it dude\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mhousing\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"housing-ny-sf.csv\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mskip_blank_lines\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mskipinitialspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 700\u001b[0m skip_blank_lines=skip_blank_lines)\n\u001b[1;32m 701\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 702\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 703\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 704\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 433\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 434\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 435\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 436\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 437\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1137\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1138\u001b[0m \u001b[0mnrows\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'nrows'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1139\u001b[0;31m \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;31m# May alter columns / col_dict\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1993\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1994\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1995\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1996\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1997\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[0;34m()\u001b[0m\n",
|
||||
"\u001b[0;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
"import pandas as pd\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import tree\n",
"# Assumption: the first 2 lines of the csv are not data (the recorded ParserError reports 8 fields on line 3), so skip them\n",
"housing = pd.read_csv(\"housing-ny-sf.csv\", skiprows=2, skipinitialspace=True)\n",
"\n",
"# 70%/30% split; column 0 (in_sf) is the target, the remaining columns are the features\n",
"x_train, x_test, y_train, y_test = train_test_split(housing.iloc[:,1:], housing['in_sf'], test_size=0.3, random_state=0)\n",
"\n",
"# Overlaid price histograms, one per city\n",
"housing.groupby('in_sf').price.hist(alpha=0.4)\n",
"\n",
"# Compare train and test accuracy for max_depth 1..9\n",
"for i in range(1,10):\n",
"    clf = tree.DecisionTreeClassifier(random_state=0, max_depth=i)\n",
"    clf = clf.fit(x_train, y_train)\n",
"    print(clf.score(x_train, y_train))\n",
"    print(clf.score(x_test, y_test))\n",
"    print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"clf = tree.DecisionTreeClassifier(random_state=0,max_depth=3)\n",
|
||||
"clf = clf.fit(x_train, y_train)\n",
|
||||
"\n",
|
||||
"import graphviz\n",
"dot_data = tree.export_graphviz(clf, out_file=None, feature_names=list(housing.iloc[:,1:].columns), class_names=['New York', 'San Francisco'], filled=True, rounded=True, special_characters=True)\n",
|
||||
"graph = graphviz.Source(dot_data)\n",
|
||||
"graph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**1.2 Training and parameter tuning**\n",
"\n",
" * Split the data into a training set and a test set (70%/30%) - choose a random_state ≠ 0, e.g. 88\n",
" * Train on this data with DecisionTreeClassifier without parameters\n",
" * Write a script that searches for the ideal depth of the decision tree, and draw this tree"
|
||||
]
|
||||
},
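One possible shape for the depth search in 1.2, sketched on synthetic data from `make_classification` because the lab csv is not part of this commit. The helper name `search_depth` and the choice to pick the depth with the highest test accuracy are assumptions; cross-validation would be a common alternative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def search_depth(X, y, depths=range(1, 10), random_state=88):
    """Return (best_depth, {depth: (train_acc, test_acc)}) for a 70%/30% split."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    scores = {}
    for depth in depths:
        clf = DecisionTreeClassifier(max_depth=depth, random_state=random_state)
        clf.fit(x_train, y_train)
        scores[depth] = (clf.score(x_train, y_train), clf.score(x_test, y_test))
    # Highest test accuracy wins; on ties the shallowest depth is kept.
    best = max(scores, key=lambda d: scores[d][1])
    return best, scores


# Synthetic stand-in data with 7 features (hypothetical, not the housing data).
X, y = make_classification(n_samples=500, n_features=7, random_state=88)
best_depth, scores = search_depth(X, y)
for depth, (train_acc, test_acc) in scores.items():
    print(depth, round(train_acc, 3), round(test_acc, 3))
print("best depth:", best_depth)
```

A growing gap between the train and test columns as depth increases is the usual sign of overfitting.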
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"**Exercise 2**: **Classification exercise with decision trees, random forests and gradient boosting machines**\n",
"\n",
"**2.1 Generating the sample data**\n",
"\n",
"You start this exercise by creating sample data for a 'ternary classification problem'. This data has 2 features, namely X and Y, and a target named __color__ (possible values: red, green, blue). We will use this data to test different classification algorithms and to visualize the decision boundary.\n",
"\n",
"* Generating x and y coordinates with _np.random.normal_ produces a _point cloud_ centered at (0,0). By adding a constant value to x or y you can shift the center of this cloud in the x or y direction. Now generate the following sample data:\n",
" * a point cloud of 1000 instances centered at (0,0) with color label 'red'\n",
" * a point cloud of 1000 instances centered at (2.5,2.5) with color label 'green'\n",
" * a point cloud of 1000 instances centered at (5,0) with color label 'blue'\n",
" * when calling _np.random.normal_, pass no extra parameters other than the size\n",
" \n",
"* Make a scatter plot of this data. If everything is right, you will see the 3 separately colored point clouds, which overlap slightly. (You can pass the color column directly to matplotlib.) The following code makes X and Y use the same scale:\n",
"\n",
" <code>plt.gca().set_aspect('equal', adjustable='box')</code>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**2.2 Decision trees: visualizing the decision boundary**\n",
|
||||
    "\n",
|
||||
    "* Split the data into a training set and a test set (70%/30%)\n",
|
||||
    "* Train a DecisionTreeClassifier on the training data and measure the accuracy on the training and test data\n",
|
||||
    "* We will now approximate the decision boundary by first requesting the prediction for a grid of (x,y) coordinates that covers the entire plot. Generate this grid as follows:\n",
|
||||
    "\n",
|
||||
    "<code>grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2,-1).T</code>\n",
|
||||
    "\n",
|
||||
    "* The predicted values can again be passed directly as the color of the scatter plot\n",
|
||||
    "\n",
|
||||
    "\n",
|
||||
    "* Now adapt your script so that, for max_depth from 1 through 8, it prints the accuracies and plots the decision boundary\n",
|
||||
    "\n",
|
||||
    "Can you explain the decision boundary for max_depth=1? Can you also explain visually, based on the decision boundary, which setting has the best bias-variance tradeoff?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**2.3 Random forests and gradient boosting machines**\n",
|
||||
    "\n",
|
||||
    "* Now also show the accuracies and the decision boundary for random forests and gradient boosting machines\n",
|
||||
    "* Random forests: why is tuning the max_features parameter pointless here?\n",
|
||||
    "* Gradient boosting machines: experiment with the learning_rate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
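A minimal sketch of how exercises 2.1 and 2.2 in the notebook above might be approached; it is not a model solution. The cloud centers, the 70%/30% split, the depth range 1 through 8 and the grid expression come from the exercise text. The fixed seed, `random_state=0`, and the helper names `make_clouds` and `plot_boundary` are assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

np.random.seed(0)  # assumption: fixed seed so reruns give the same clouds


def make_clouds(n=1000):
    """Return (points, colors) for three normal point clouds (hypothetical helper)."""
    centers = {"red": (0.0, 0.0), "green": (2.5, 2.5), "blue": (5.0, 0.0)}
    points, colors = [], []
    for color, (cx, cy) in centers.items():
        x = np.random.normal(size=n) + cx  # shift the cloud along x
        y = np.random.normal(size=n) + cy  # shift the cloud along y
        points.append(np.column_stack([x, y]))
        colors.extend([color] * n)
    return np.vstack(points), np.array(colors)


X, color = make_clouds()
X_train, X_test, c_train, c_test = train_test_split(
    X, color, test_size=0.3, random_state=0
)

# Grid of (x, y) coordinates covering the plot, as given in the exercise.
grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2, -1).T

accuracies = {}  # depth -> (training accuracy, test accuracy)
trees = {}       # depth -> fitted tree, kept for plotting afterwards
for depth in range(1, 9):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, c_train)
    trees[depth] = tree
    accuracies[depth] = (tree.score(X_train, c_train), tree.score(X_test, c_test))
    print(f"max_depth={depth}: train={accuracies[depth][0]:.3f}, "
          f"test={accuracies[depth][1]:.3f}")


def plot_boundary(model, grid, X, color):
    """Color the grid by predicted class and overlay the samples."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    predicted = model.predict(grid)
    plt.scatter(grid[:, 0], grid[:, 1], c=list(predicted), s=1, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=list(color), s=4)
    plt.gca().set_aspect("equal", adjustable="box")
    plt.show()
```

In a notebook cell, `plot_boundary(trees[1], grid, X, color)` would show the boundary of the shallowest tree. Because the class labels are themselves color names, both the predictions and the targets can be passed to `scatter` without any mapping step.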
495
7/housing-ny-sf.csv
Normal file
@@ -0,0 +1,495 @@
|
||||
# This dataset was collected for A Visual Introduction to Machine Learning (http://www.r2d3.us). It is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/). We hope it helps you practice different data analysis and visualization techniques. ONE REQUEST: Please do not use this data to make any conclusions about the New York or San Francisco real estate markets. This data was collected with learning, not inference, in mind. :-)
|
||||
#
|
||||
in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation
|
||||
0,2,1,999000,1960,1000,999,10
|
||||
0,2,2,2750000,2006,1418,1939,0
|
||||
0,2,2,1350000,1900,2150,628,9
|
||||
0,1,1,629000,1903,500,1258,9
|
||||
0,0,1,439000,1930,500,878,10
|
||||
0,0,1,439000,1930,500,878,10
|
||||
0,1,1,475000,1920,500,950,10
|
||||
0,1,1,975000,1930,900,1083,10
|
||||
0,1,1,975000,1930,900,1083,12
|
||||
0,2,1,1895000,1921,1000,1895,12
|
||||
0,3,3,2095000,1926,2200,952,4
|
||||
0,1,1,999000,1982,784,1274,5
|
||||
0,1,1,999000,1982,784,1274,5
|
||||
0,1,1,1249000,1987,826,1512,3
|
||||
0,0,1,1110000,2008,698,1590,5
|
||||
0,2,2,2059500,2008,1373,1500,5
|
||||
0,2,2,2000000,1928,1200,1667,10
|
||||
0,1,1,715000,1903,557,1284,3
|
||||
0,2,2,2498000,2005,1260,1983,2
|
||||
0,2,1.5,2650000,1915,2500,1060,5
|
||||
0,2,2,3450000,1900,1850,1865,9
|
||||
0,1,1,3105000,2016,1108,2802,10
|
||||
0,4,5,13750000,2016,3699,3717,10
|
||||
0,2,2,1185000,1900,1000,1185,4
|
||||
0,1,2,1699000,1900,1500,1133,5
|
||||
0,1,1,1195000,1900,1093,1093,6
|
||||
0,1,1,450000,1920,500,900,10
|
||||
0,1,1,1195000,1900,1093,1093,10
|
||||
0,2,2,1185000,1900,1000,1185,10
|
||||
0,0,1,625000,1964,800,781,8
|
||||
0,2,2,3159990,2012,1420,2225,8
|
||||
0,2,2,2200000,1920,2050,1073,10
|
||||
0,2,2,4895000,2016,1520,3220,10
|
||||
0,2,2,6525000,2016,2018,3233,10
|
||||
0,2,2,4895000,2016,1520,3220,11
|
||||
0,2,2,6525000,2016,2018,3233,11
|
||||
0,2,2,1675000,2006,1225,1367,12
|
||||
0,2,2,1675000,2006,1225,1367,12
|
||||
0,2,1,999000,1985,799,1250,5
|
||||
0,1,1,1550000,1926,1000,1550,6
|
||||
0,1,1,1595000,2015,714,2234,7
|
||||
0,2,2,3995000,1906,2400,1665,9
|
||||
0,1,1,1285000,2012,749,1716,10
|
||||
0,1,1,1595000,2015,714,2234,10
|
||||
0,2,2,3995000,1906,2400,1665,10
|
||||
0,2,2,4995000,1900,2024,2468,8
|
||||
0,3,2,3580000,1880,3000,1193,10
|
||||
0,3,3,6350000,2015,2500,2540,3
|
||||
0,3,3,6550000,2015,2500,2620,3
|
||||
0,3,3,5985000,1909,3300,1814,3
|
||||
0,3,3,9900000,2015,2950,3356,4
|
||||
0,1,1,1849000,1920,1400,1321,5
|
||||
0,2,2,3500000,1915,2000,1750,5
|
||||
0,2,2,3500000,1910,1887,1855,5
|
||||
0,1,1,1250000,2007,720,1736,6
|
||||
0,2,1,899000,1990,820,1096,7
|
||||
0,1,2,1950000,1900,1300,1500,7
|
||||
0,1,1,1750000,1963,1000,1750,5
|
||||
0,0,1,775000,2009,546,1419,6
|
||||
0,0,1,390000,1955,550,709,6
|
||||
0,0,1,699999,1988,500,1400,9
|
||||
0,1,1,649000,1965,750,865,10
|
||||
0,2,2,1200000,1964,1200,1000,10
|
||||
0,0,1,319000,1941,500,638,10
|
||||
0,0,1,699999,1988,500,1400,10
|
||||
0,0,1,775000,2009,546,1419,10
|
||||
0,1,1,649000,1965,750,865,10
|
||||
0,2,2,1849000,2009,1135,1629,10
|
||||
0,1,1,615000,1960,750,820,11
|
||||
0,1,1,1590000,2008,785,2025,12
|
||||
0,1,1,1150000,2014,645,1783,18
|
||||
0,2,2,2650000,2014,1240,2137,18
|
||||
0,0,1,469000,1932,500,938,18
|
||||
0,0,1,569000,1962,543,1048,8
|
||||
0,1,1,549000,1924,750,732,10
|
||||
0,2,2,799000,1928,850,940,19
|
||||
0,2,2,879000,1928,800,1099,22
|
||||
0,2,3,2999000,1925,3200,937,10
|
||||
0,2,3,2999000,1925,3200,937,10
|
||||
0,2,3,2999000,1925,3200,937,12
|
||||
0,2,3,2999000,1925,3200,937,12
|
||||
0,1,1,399000,1910,475,840,10
|
||||
0,1,1,1135000,2005,715,1587,10
|
||||
0,1,1,1145000,2005,700,1636,10
|
||||
0,1,1,1760000,1969,950,1853,10
|
||||
0,2,2,1725000,2005,976,1767,10
|
||||
0,2,2,2750000,2007,1384,1987,10
|
||||
0,2,2,2750000,2007,1384,1987,13
|
||||
0,1,1,399000,1910,475,840,14
|
||||
0,2,2,1725000,2005,1144,1508,16
|
||||
0,1,1,869000,1988,670,1297,16
|
||||
0,1,1,1760000,1969,950,1853,17
|
||||
0,3,3,1249000,1962,1500,833,18
|
||||
0,3,3,1500000,1962,1600,938,18
|
||||
0,1,1,1135000,2005,715,1587,19
|
||||
0,1,1,1145000,2005,700,1636,19
|
||||
0,2,2,1725000,2005,976,1767,19
|
||||
0,2,2,3500000,1925,1550,2258,20
|
||||
0,0,1,525000,1940,525,1000,21
|
||||
0,2,2,2025000,1940,1433,1413,21
|
||||
0,2,2,3500000,1986,1463,2392,23
|
||||
0,0,1,925000,1978,585,1581,24
|
||||
0,2,2,1700000,1982,1007,1688,25
|
||||
0,2,2,1700000,1982,1007,1688,25
|
||||
0,0,1,449000,1962,550,816,10
|
||||
0,0,1,299000,1930,400,748,12
|
||||
0,1,1,695000,1961,720,965,16
|
||||
0,1,1,695000,1961,720,965,16
|
||||
0,4,5,17750000,2012,4476,3966,20
|
||||
0,5,5,27500000,1930,7500,3667,21
|
||||
0,0,1,539000,1957,485,1111,10
|
||||
0,0,1,779000,1975,512,1521,10
|
||||
0,1,1,925000,1985,745,1242,10
|
||||
0,1,1,1319000,1958,1000,1319,10
|
||||
0,2,2,4240000,2016,1741,2435,10
|
||||
0,2,2,4285000,2016,1747,2453,10
|
||||
0,3,3,4875000,2016,2017,2417,10
|
||||
0,3,3,14950000,1931,4435,3371,10
|
||||
0,3,3,10225000,2016,3007,3400,10
|
||||
0,4,4,13400000,2016,3331,4023,10
|
||||
0,5,5,19000000,2016,4972,3821,10
|
||||
0,0,1,539000,1957,485,1111,11
|
||||
0,1,1,835000,1963,700,1193,12
|
||||
0,10,10,7995000,1910,6400,1249,12
|
||||
0,0,1,349000,1960,400,873,13
|
||||
0,1,1,588000,1970,600,980,14
|
||||
0,2,2,8690000,2002,2178,3990,15
|
||||
0,2,2,4195000,2016,1750,2397,15
|
||||
0,2,2,4195000,2016,1742,2408,15
|
||||
0,2,2,4240000,2016,1741,2435,15
|
||||
0,2,2,4285000,2016,1747,2453,15
|
||||
0,3,3,4875000,2016,2017,2417,15
|
||||
0,3,3,5750000,2016,2196,2618,15
|
||||
0,3,3,9350000,2016,3054,3062,15
|
||||
0,3,3,10225000,2016,3007,3400,15
|
||||
0,4,4,13150000,2016,3338,3939,15
|
||||
0,4,4,13400000,2016,3331,4023,15
|
||||
0,5,5,19000000,2016,4972,3821,15
|
||||
0,0,1,850000,1924,546,1557,10
|
||||
0,7,7,19500000,1994,4238,4601,10
|
||||
0,1,1,1000000,1912,612,1634,18
|
||||
0,0,1,485000,1902,310,1565,23
|
||||
0,3,2,7350000,1999,2075,3542,23
|
||||
0,1,1,1200000,1969,600,2000,23
|
||||
0,2,2,4250000,2011,1504,2826,23
|
||||
0,2,2,3600000,1960,1340,2687,24
|
||||
0,2,2,3600000,1960,1340,2687,24
|
||||
0,1,1,525000,1924,700,750,24
|
||||
0,0,1,545000,1900,387,1408,10
|
||||
0,1,1,535000,1900,450,1189,10
|
||||
0,4,4,5500000,1923,3200,1719,21
|
||||
0,4,5,12000000,1939,3700,3243,22
|
||||
0,1,2,1850000,2007,839,2205,23
|
||||
0,1,1,535000,1900,450,1189,27
|
||||
0,2,2,1350000,1931,1300,1038,36
|
||||
0,1,1,779000,1902,700,1113,10
|
||||
0,2,1,1475000,1973,971,1519,10
|
||||
0,2,2,1385000,1971,962,1440,10
|
||||
0,1,1,779000,1902,700,1113,13
|
||||
0,1,1,649000,1929,800,811,25
|
||||
0,0,1,725000,1960,600,1208,26
|
||||
0,2,1,925000,1941,790,1171,27
|
||||
0,1,1,965000,1961,787,1226,27
|
||||
0,2,1,1250000,1922,1145,1092,30
|
||||
0,1,1,650000,1907,720,903,32
|
||||
0,2,1,385000,1999,820,470,8
|
||||
0,2,2,1695000,2012,1051,1613,16
|
||||
0,1,1,600000,1899,1060,566,8
|
||||
0,2,1,910000,1899,1060,858,8
|
||||
0,8,7,2300000,1910,4180,550,9
|
||||
0,1,1,600000,1899,1060,566,10
|
||||
0,2,1,910000,1899,1060,858,10
|
||||
0,2,2,1599000,1973,1400,1142,10
|
||||
0,2,2,4625000,1987,1695,2729,10
|
||||
0,0,1,325000,1910,375,867,11
|
||||
0,2,2,1599000,1973,1400,1142,14
|
||||
0,2,2,1965000,2005,1330,1477,16
|
||||
0,1,1,735000,1928,800,919,22
|
||||
0,2,2,4625000,1987,1695,2729,29
|
||||
0,1,1,749000,2011,762,983,10
|
||||
0,1,1,499999,2011,669,747,35
|
||||
0,1,1,749000,2011,762,983,35
|
||||
0,2,2,904000,1920,1503,601,10
|
||||
0,2,2,904000,1920,1503,601,10
|
||||
0,2,1,559900,1925,1200,467,51
|
||||
0,2,1,545000,1939,1049,520,39
|
||||
0,2,1,365000,1925,700,521,73
|
||||
0,2,1,365000,1925,700,521,73
|
||||
0,1,1,935000,1910,1102,848,8
|
||||
0,0,1,820000,2013,533,1538,8
|
||||
0,0,1,835000,2013,501,1667,8
|
||||
0,1,1,935000,1910,1102,848,10
|
||||
0,1,1,1420000,2004,768,1849,10
|
||||
0,1,1,1550000,2004,794,1952,10
|
||||
0,2,2,1635000,2004,957,1708,10
|
||||
0,1,1,1550000,2004,794,1952,11
|
||||
0,1,1,1780000,2007,988,1802,14
|
||||
0,2,2,2800000,2007,1308,2141,14
|
||||
0,1,1,411500,1921,586,702,6
|
||||
0,2,2,2175000,1999,1569,1386,10
|
||||
0,2,2,1995000,1996,1044,1911,10
|
||||
0,2,2,2235000,1999,1548,1444,14
|
||||
0,4,4,8800000,1941,3382,2602,15
|
||||
0,2,2,1850000,1996,1044,1772,17
|
||||
0,2,2,1995000,1996,1044,1911,17
|
||||
0,1,1,1695000,1927,680,2493,17
|
||||
0,1,1,1495000,1962,1125,1329,19
|
||||
0,2,2,2200000,2000,1044,2107,2
|
||||
0,0.5,1,384900,1962,540,713,10
|
||||
0,1,1,515000,1962,725,710,10
|
||||
0,3,2,1950000,1956,1600,1219,10
|
||||
0,0,1,307000,1910,330,930,15
|
||||
0,2,2,1735000,1980,1585,1095,18
|
||||
0,3,3,1850000,2005,1353,1367,8
|
||||
0,3,3,1850000,2005,1353,1367,10
|
||||
0,2,2,1995000,1986,1406,1419,14
|
||||
0,2,2,2900000,1987,1600,1813,24
|
||||
0,2,2,2499000,2004,1658,1507,24
|
||||
0,2,2,1575000,1930,1324,1190,33
|
||||
0,3,3,1495000,1990,1360,1099,0
|
||||
0,1,1,529000,1986,650,814,0
|
||||
0,3,3,2695000,2006,1991,1354,1
|
||||
0,3,3,2695000,2006,1991,1354,1
|
||||
0,4,3,3895000,2001,2277,1711,0
|
||||
1,1,1,550000,1982,724,760,24
|
||||
1,2,2,849000,1982,1030,824,24
|
||||
1,3,2,1750000,1900,2950,593,26
|
||||
1,1,1,799000,2008,847,943,29
|
||||
1,1,1.5,899000,1997,1453,619,3
|
||||
1,1,1,598000,2005,534,1120,5
|
||||
1,1,2,1088000,1998,1086,1002,6
|
||||
1,1,1,798000,1926,769,1038,10
|
||||
1,1,1,798000,1926,769,1038,10
|
||||
1,1,1,1495000,1927,2275,657,10
|
||||
1,4,4,4300000,2006,3321,1295,10
|
||||
1,1,1,699000,2008,756,925,12
|
||||
1,2,2,334905,2000,1047,320,12
|
||||
1,1,1.5,849000,1996,1127,753,13
|
||||
1,1,1.5,1365000,1996,1607,849,13
|
||||
1,1,1,649000,2011,674,963,14
|
||||
1,3,3,1245000,1907,1503,828,23
|
||||
1,1,1,649000,1983,850,764,163
|
||||
1,2,2,1700000,1987,1250,1360,11
|
||||
1,2,2,1980000,2009,1469,1348,0
|
||||
1,2,2,3600000,2009,1652,2179,0
|
||||
1,2,3,4995000,2009,2230,2240,0
|
||||
1,2,2,1298000,2008,1159,1120,0
|
||||
1,2,2,2149000,2008,1317,1632,0
|
||||
1,3,2,1995000,2006,1362,1465,3
|
||||
1,2,2,1650000,1937,1640,1006,7
|
||||
1,1,1,949000,2010,824,1152,8
|
||||
1,1,1,187518,2000,670,280,12
|
||||
1,3,2.5,1995000,2008,2354,847,23
|
||||
1,3,2.5,1995000,2008,2354,847,23
|
||||
1,2,2,2799000,2008,1328,2108,35
|
||||
1,2,2.5,1050000,2000,1640,640,2
|
||||
1,2,2,895000,2006,1113,804,3
|
||||
1,1,1,998000,2006,872,1144,3
|
||||
1,2,2,1659000,2000,1165,1424,4
|
||||
1,1,1,788000,2004,903,873,4
|
||||
1,2,2,1395000,2001,1334,1046,4
|
||||
1,2,2,1299000,2004,1453,894,5
|
||||
1,2,2.5,850000,2000,1136,748,5
|
||||
1,3,2.5,2850000,2013,2075,1373,5
|
||||
1,2,2,950000,2000,1258,755,6
|
||||
1,2,2,1600000,2002,1173,1364,7
|
||||
1,3,2,1275000,2001,1502,849,13
|
||||
1,1,1.5,775000,2009,835,928,14
|
||||
1,3,2,1399000,1892,1809,773,41
|
||||
1,2,2,879000,1912,950,925,53
|
||||
1,1,1,699000,1907,932,750,59
|
||||
1,1,1,985000,1978,884,1114,11
|
||||
1,1,1,725000,1978,1063,682,60
|
||||
1,2,2,849000,1911,1100,772,66
|
||||
1,1,1,618000,1973,705,877,72
|
||||
1,2,2,5950000,1989,3700,1608,83
|
||||
1,4,4,1800000,1948,2475,727,10
|
||||
1,2,2.5,2350000,2008,1314,1788,19
|
||||
1,1,1,740200,1920,989,748,41
|
||||
1,1,1,699000,1982,780,896,51
|
||||
1,2,2,958000,2005,915,1047,54
|
||||
1,2,2,1200000,1909,1302,922,56
|
||||
1,3,3,1575000,1993,2233,705,62
|
||||
1,2,2,1199000,1964,1100,1090,68
|
||||
1,1,1,529000,1966,791,669,69
|
||||
1,1,1,699000,1984,880,794,73
|
||||
1,3,2,1295000,1926,1675,773,77
|
||||
1,4,4,1800000,1948,2347,767,87
|
||||
1,1,1,795000,2000,990,803,7
|
||||
1,2,2,779000,2007,808,964,7
|
||||
1,1,1.5,378551,2000,1000,379,9
|
||||
1,4,2.3,1398000,1904,2492,561,13
|
||||
1,2,1.3,1097000,1904,1493,735,14
|
||||
1,1,1,670000,2014,507,1321,20
|
||||
1,2,2,1199000,2014,943,1271,20
|
||||
1,2,2,1095000,2010,1135,965,30
|
||||
1,1,2,795000,2014,616,1291,31
|
||||
1,3,2,979000,1900,1440,680,33
|
||||
1,3,3,1488888,1977,2100,709,38
|
||||
1,3,2,1389000,1908,1546,898,55
|
||||
1,3,2,1389000,1908,1546,898,55
|
||||
1,4,2,1099000,1962,1267,867,69
|
||||
1,3,3.5,1799000,1900,2449,735,75
|
||||
1,3,2,1250000,1924,1450,862,83
|
||||
1,3,2,998000,1907,1464,682,84
|
||||
1,3,2.5,1300000,1975,1642,792,4
|
||||
1,1,1,539000,2000,709,760,5
|
||||
1,1,1,735000,1983,779,944,12
|
||||
1,3,2,688000,1942,1441,477,34
|
||||
1,2,1,859000,1890,1100,781,44
|
||||
1,3,2,699000,1964,1250,559,48
|
||||
1,2,1,750000,1924,980,765,48
|
||||
1,3,2,549000,1915,1972,278,51
|
||||
1,6,3,1600000,1926,2567,623,52
|
||||
1,2,2,875000,1909,1700,515,54
|
||||
1,4,3,900000,2003,1870,481,54
|
||||
1,3,2,748000,1930,1522,491,54
|
||||
1,2,1.5,749000,1949,1348,556,56
|
||||
1,2,1,599000,1942,707,847,61
|
||||
1,3,1,648800,1940,1325,490,66
|
||||
1,2,2,649000,1924,1320,492,72
|
||||
1,2,1,798888,1929,1025,779,74
|
||||
1,5,2,899000,1972,1940,463,81
|
||||
1,3,1.5,699000,1910,1625,430,94
|
||||
1,3,2,749000,1907,1513,495,95
|
||||
1,3,2,535000,1967,1824,293,102
|
||||
1,2,1,699000,1913,900,777,108
|
||||
1,2,1,699000,1913,900,777,108
|
||||
1,2,1,500000,1958,1166,429,136
|
||||
1,3,2.5,879000,1981,2111,416,139
|
||||
1,3,2.5,879000,1981,2111,416,139
|
||||
1,4,2.5,699900,1982,1752,399,140
|
||||
1,2,1,688000,1950,1145,601,143
|
||||
1,3,2,1195000,1907,1396,856,33
|
||||
1,3,2,1195000,1907,1396,856,33
|
||||
1,1,1,697000,1922,694,1004,50
|
||||
1,5,3.5,3995000,1905,3350,1193,59
|
||||
1,1,1,650000,1955,600,1083,74
|
||||
1,1,1,650000,1955,600,1083,74
|
||||
1,1,1,775000,1955,600,1292,74
|
||||
1,1,1,825000,1955,600,1375,74
|
||||
1,2,1,699000,1907,915,764,89
|
||||
1,3,2,1495000,1927,1520,984,105
|
||||
1,4,4.5,5200000,1952,4813,1080,238
|
||||
1,2,2,989000,1984,988,1001,46
|
||||
1,3,1,1095000,1895,1465,747,68
|
||||
1,1,1,725000,1900,811,894,70
|
||||
1,1,1,725000,1900,811,894,70
|
||||
1,4,4,2650000,1900,3816,694,75
|
||||
1,2,2,1680000,1962,1850,908,89
|
||||
1,1,1,1199000,1906,1139,1053,91
|
||||
1,3,2.5,799000,1947,1400,571,35
|
||||
1,3,2,795000,1954,1350,589,36
|
||||
1,2,1,699000,1944,995,703,41
|
||||
1,3,2,848000,1940,1500,565,42
|
||||
1,3,2,899000,1948,1665,540,52
|
||||
1,3,2,899000,1948,1665,540,52
|
||||
1,6,4,1198000,1937,1965,610,79
|
||||
1,3,3,1100000,1925,2633,418,91
|
||||
1,6,3.5,949000,1918,2473,384,97
|
||||
1,5,5,2990000,2014,5000,598,108
|
||||
1,4,3,1338800,1940,2330,575,119
|
||||
1,3,2.5,1388000,1928,1905,729,160
|
||||
1,3,2,989000,1940,1603,617,227
|
||||
1,3,1,1295000,1890,1772,731,65
|
||||
1,6,6,6895000,1902,7800,884,67
|
||||
1,2,2,1725000,1922,1415,1219,86
|
||||
1,3,3,1995000,1922,1915,1042,88
|
||||
1,0,1,499000,1900,510,978,91
|
||||
1,0,1,499000,1900,510,978,91
|
||||
1,1,1,599000,1900,624,960,91
|
||||
1,1,1,599000,1900,624,960,91
|
||||
1,3,2,1200000,1929,1284,935,121
|
||||
1,3,2,1395000,1909,1877,743,57
|
||||
1,3,3,1785000,1925,1970,906,58
|
||||
1,3,2,1099000,1905,1457,754,67
|
||||
1,5,5.5,3495000,1921,4310,811,73
|
||||
1,3,2,2395000,1929,2323,1031,73
|
||||
1,4,2.5,2549000,1907,2746,928,75
|
||||
1,3,2,995000,2000,1393,714,75
|
||||
1,5,3.5,6495000,1906,4609,1409,89
|
||||
1,4,3,1698000,1890,1789,949,102
|
||||
1,5,2,6298000,1914,3585,1757,26
|
||||
1,3,3,1139000,2008,1532,743,44
|
||||
1,3,2,1080000,1914,1954,553,49
|
||||
1,4,1,1050000,1932,1767,594,55
|
||||
1,1,1,739900,1941,875,846,70
|
||||
1,4,4,1595000,1925,2750,580,81
|
||||
1,2,1,699000,1907,1200,583,14
|
||||
1,2,1,799000,1938,1150,695,36
|
||||
1,4,2.5,995000,1915,2180,456,62
|
||||
1,2,1,829000,1925,1145,724,71
|
||||
1,2,1,875000,1908,1158,756,84
|
||||
1,5,3,950000,1939,1846,515,97
|
||||
1,2,1,749000,1936,1450,517,110
|
||||
1,2,1,749000,1936,1450,517,110
|
||||
1,6,6.5,9500000,1937,5420,1753,3
|
||||
1,4,3,3595000,1931,3017,1192,9
|
||||
1,2,1.5,1425000,1925,1360,1048,14
|
||||
1,1,1,865000,1993,960,901,17
|
||||
1,2,2.5,2495000,1940,1809,1379,48
|
||||
1,2,2,2000000,1925,1518,1318,50
|
||||
1,4,3.5,9895000,2008,6024,1643,62
|
||||
1,3,2,358000,1989,1325,270,5
|
||||
1,3,2.5,899000,2015,1391,646,5
|
||||
1,3,2.5,929000,2015,1391,668,5
|
||||
1,4,4,850000,1928,2470,344,8
|
||||
1,2,1,648000,1921,1125,576,11
|
||||
1,2,1,480000,1915,680,706,13
|
||||
1,1,1,499000,1900,1076,464,19
|
||||
1,4,2,689000,1951,1473,468,21
|
||||
1,2,2,669000,1986,1317,508,23
|
||||
1,2,1.5,729000,1942,1012,720,27
|
||||
1,2,2,767000,1916,1380,556,31
|
||||
1,3,1,660000,1900,1520,434,35
|
||||
1,2,1,995000,1908,600,1658,36
|
||||
1,2,1,759900,1941,1175,647,43
|
||||
1,6,3.5,995000,2001,3080,323,55
|
||||
1,2,1,725000,1945,1040,697,60
|
||||
1,4,3,3420000,1926,5113,669,98
|
||||
1,3,2,1650000,1922,2025,815,106
|
||||
1,3,3.5,2250000,1928,3258,691,127
|
||||
1,3,1,1319000,1925,1752,753,141
|
||||
1,5,3.5,1698000,1966,2769,613,176
|
||||
1,3,2,1049000,1947,1626,645,179
|
||||
1,2,2,599000,1990,862,695,181
|
||||
1,5,3.5,2995000,1947,3890,770,181
|
||||
1,3,2,995000,1956,1305,762,216
|
||||
1,1,1,350000,1908,600,583,43
|
||||
1,2,1,550000,1908,800,688,43
|
||||
1,4,4,3760000,1900,3085,1219,49
|
||||
1,3,2,1050000,1922,1266,829,52
|
||||
1,2,2,1895000,1907,1756,1079,54
|
||||
1,1,1,599000,1961,680,881,56
|
||||
1,4,3,1895000,2001,2041,928,61
|
||||
1,3,2,1799000,1926,1800,999,66
|
||||
1,2,1,600000,1908,1350,444,92
|
||||
1,2,1,1495000,1908,1700,879,98
|
||||
1,3,2,1595000,1961,1515,1053,103
|
||||
1,3,2,849000,1947,1622,523,106
|
||||
1,4,3.5,1995000,1992,3312,602,108
|
||||
1,3,2,1495000,1937,1635,914,112
|
||||
1,3,3.5,2195000,1922,2168,1012,125
|
||||
1,4,2,1798000,1951,2050,877,131
|
||||
1,2,2,849000,1978,1555,546,139
|
||||
1,3,2,1159000,1977,1731,670,143
|
||||
1,3,2.5,995000,1976,1959,508,163
|
||||
1,4,3,1388000,1968,2275,610,163
|
||||
1,5,3.5,2250000,1962,3729,603,174
|
||||
1,3,2,1080000,1989,1524,709,185
|
||||
1,3,2.5,1095000,1968,1868,586,187
|
||||
1,2,1,599000,1972,990,605,189
|
||||
1,2,1,915000,1954,1251,731,24
|
||||
1,2,1,915000,1954,1251,731,24
|
||||
1,3,2,725000,1975,1474,492,34
|
||||
1,3,2.5,1588000,2015,2001,794,43
|
||||
1,2,1,795000,1941,1256,633,63
|
||||
1,2,1,795000,1941,1256,633,63
|
||||
1,4,2,848000,1949,1646,515,69
|
||||
1,1,1,439000,2002,667,658,80
|
||||
1,3,2,849900,1958,1310,649,118
|
||||
1,2,1,599000,1941,1254,478,123
|
||||
1,3,2.5,1539514,2014,2024,761,136
|
||||
1,3,2.5,1339000,2015,2133,628,143
|
||||
1,3,2.5,1294000,2015,2133,607,143
|
||||
1,3,2.5,1611000,2015,2001,805,153
|
||||
1,2,2,1495000,1913,1174,1273,35
|
||||
1,1,1,699000,1908,750,932,36
|
||||
1,2,2.5,3495000,1900,1968,1776,76
|
||||
1,4,2,699000,1949,1550,451,11
|
||||
1,2,1,699000,1949,1050,666,64
|
||||
1,3,3,888000,1975,1555,571,79
|
||||
1,1,1,599000,1945,631,949,84
|
||||
1,3,3,758000,1989,2157,351,90
|
||||
1,2,2,1698000,2008,1620,1048,1
|
||||
1,2,2,1698000,2008,1620,1048,1
|
||||
1,1,1,849000,2012,886,958,2
|
||||
1,2,2,1675000,2012,1562,1072,2
|
||||
1,2,2,1695000,2007,1610,1053,2
|
||||
1,3,2,2219000,2012,1921,1155,13
|
||||
1,1,1,788000,2004,903,873,4
|
||||
1,2,2,1950000,1995,1930,1010,4
|
||||
1,0,1,539000,2000,709,760,5
|
||||
1,2,2,849000,1982,1030,824,24
|
||||
1,2,2.5,2495000,1940,1809,1379,48
|
||||
1,4,4,3760000,1894,3085,1219,49
|
||||
1,3,2,1799000,1926,1800,999,66
|
||||
1,5,2.5,1800000,1890,3073,586,76
|
||||
1,2,1,695000,1923,1045,665,106
|
||||
1,3,2,1650000,1922,1483,1113,106
|
||||
1,1,1,649000,1983,850,764,163
|
||||
1,3,2,995000,1956,1305,762,216
|
|
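A hedged sketch of how the CSV above might be loaded and used for exercise 1, assuming pandas and scikit-learn. The two leading `#` lines are skipped with `skiprows=2` so the file itself stays untouched. The helper names `load_housing` and `search_depth`, the depth range 1 through 10, and the choice to pick the depth by test accuracy are assumptions made here, not part of the lab. To keep the block self-contained it runs on 16 rows copied from the file, with the first comment line shortened.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A handful of rows copied from the file, preceded by its two '#' lines
# (the first comment line is shortened here).
SAMPLE = """\
# This dataset was collected for A Visual Introduction to Machine Learning ...
#
in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation
0,2,1,999000,1960,1000,999,10
0,2,2,2750000,2006,1418,1939,0
0,2,2,1350000,1900,2150,628,9
0,1,1,629000,1903,500,1258,9
0,0,1,439000,1930,500,878,10
0,1,1,475000,1920,500,950,10
0,3,3,2095000,1926,2200,952,4
0,1,1,1249000,1987,826,1512,3
1,1,1,550000,1982,724,760,24
1,2,2,849000,1982,1030,824,24
1,3,2,1750000,1900,2950,593,26
1,1,1,799000,2008,847,943,29
1,1,1.5,899000,1997,1453,619,3
1,1,1,598000,2005,534,1120,5
1,1,2,1088000,1998,1086,1002,6
1,1,1,798000,1926,769,1038,10
"""


def load_housing(source):
    """Read the CSV without editing it: skip the two leading comment lines."""
    return pd.read_csv(source, skiprows=2)


def search_depth(X, y, depths=range(1, 11), random_state=88):
    """Return (best_depth, {depth: test accuracy}) for a 70%/30% split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    scores = {}
    for depth in depths:
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        scores[depth] = tree.fit(X_train, y_train).score(X_test, y_test)
    # max() keeps the first maximum, i.e. the shallowest of the best depths.
    return max(scores, key=scores.get), scores


# In the notebook this would be load_housing("housing-ny-sf.csv").
df = load_housing(io.StringIO(SAMPLE))
X, y = df.drop(columns="in_sf"), df["in_sf"]
best_depth, scores = search_depth(X, y)
print(df.groupby("in_sf")["elevation"].mean())
print("best max_depth on this tiny sample:", best_depth)
```

On 16 rows the selected depth means little; the point is the shape of the loop. Choosing the depth on the test set also leaks information, so cross-validation on the training set would be a cleaner alternative. The chosen tree could then be drawn with `sklearn.tree.export_graphviz` and the GraphViz package mentioned in the notebook.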
208
Labo7.ipynb
Normal file
@@ -0,0 +1,208 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "# Lab 7 Data Science: Decision Trees"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<div class=\"alert alert-block alert-warning\">\n",
|
||||
    "<strong>Note.</strong> In this lab, decision trees are drawn using the GraphViz package. This package was installed in labs 4 & 5. On your own PC, install the <strong>python-graphviz</strong> package in Anaconda (under Environments). This can take a while. Restart your kernel to be able to use the installation.\n",
|
||||
"</div>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**Exercise 1**: **Decision trees for a simple classification problem**\n",
|
||||
    "\n",
|
||||
    "**1.1 Exploring the data**\n",
|
||||
    "\n",
|
||||
    "Given is the dataset housing-ny-sf.csv. It can be used to predict whether an apartment is located in New York or in San Francisco. The file contains the following columns:\n",
|
||||
    " * in_sf: the target to predict: set to 1 if the apartment is located in San Francisco\n",
|
||||
    " * beds: the number of beds\n",
|
||||
    " * bath: the number of baths\n",
|
||||
    " * price: the sale price (\$)\n",
|
||||
    " * year_built: the year of construction\n",
|
||||
    " * sqft: the floor area in square feet\n",
|
||||
    " * price_per_sqft: the price (\$) per square foot\n",
|
||||
    " * elevation: elevation in m\n",
|
||||
    "\n",
|
||||
    "A nice visual introduction to this exercise can be found here: _http://www.r2d3.us/visual-intro-to-machine-learning-part-1/_\n",
|
||||
    "\n",
|
||||
    " * Load the data into a Pandas dataframe (please do not change anything in the csv file; hint: skip rows).\n",
|
||||
    " * Make a scatter_matrix plot of the __features__ in which each instance is colored according to its target (with colormap 'brg', San Francisco becomes green and New York blue)\n",
|
||||
    " * With Pandas (groupby and hist(alpha=0.4)), draw a histogram (with a different color for SF and NY) for a number of features whose distribution you expect to differ strongly between the 2 cities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**1.2 Training and parameter tuning**\n",
|
||||
    "\n",
|
||||
    " * Split the data into a training set and a test set (70%/30%); choose a random_state ≠ 0, e.g. 88\n",
|
||||
    " * Train a DecisionTreeClassifier on this data without passing any parameters\n",
|
||||
    " * Write a script that searches for the ideal depth of the decision tree, and draw that tree"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**Exercise 2**: **Classification exercise with decision trees, random forests and gradient boosting machines**\n",
|
||||
    "\n",
|
||||
    "**2.1 Generating the sample data**\n",
|
||||
    "\n",
|
||||
    "You start this exercise by creating sample data for a 'ternary classification problem'. The data has 2 features, X and Y, and a target named __color__ (possible values: red, green, blue). We will use this data to test several classification algorithms and to visualize the decision boundary.\n",
|
||||
    "\n",
|
||||
    "* Generating x and y coordinates with _np.random.normal_ produces a _point cloud_ centered at (0,0). Adding a constant to x or y shifts the center of the cloud in the x or y direction. Now generate the following sample data:\n",
|
||||
    " * a point cloud of 1000 instances with center (0,0) and color label 'red'\n",
|
||||
    " * a point cloud of 1000 instances with center (2.5,2.5) and color label 'green'\n",
|
||||
    " * a point cloud of 1000 instances with center (5,0) and color label 'blue'\n",
|
||||
    " * when calling _np.random.normal_, pass no extra parameters other than size\n",
|
||||
    " \n",
|
||||
    "* Make a scatter plot of this data. If all is well, you will see the 3 separately colored point clouds, overlapping slightly. (You can pass the color column directly to matplotlib.) The following code makes X and Y use the same scale:\n",
|
||||
    "\n",
|
||||
    " <code>plt.gca().set_aspect('equal', adjustable='box')</code>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**2.2 Decision trees: visualizing the decision boundary**\n",
|
||||
    "\n",
|
||||
    "* Split the data into a training set and a test set (70%/30%)\n",
|
||||
    "* Train a DecisionTreeClassifier on the training data and measure the accuracy on the training and test data\n",
|
||||
    "* We will now approximate the decision boundary by first requesting the prediction for a grid of (x,y) coordinates that covers the entire plot. Generate this grid as follows:\n",
|
||||
    "\n",
|
||||
    "<code>grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2,-1).T</code>\n",
|
||||
    "\n",
|
||||
    "* The predicted values can again be passed directly as the color of the scatter plot\n",
|
||||
    "\n",
|
||||
    "\n",
|
||||
    "* Now adapt your script so that, for max_depth from 1 through 8, it prints the accuracies and plots the decision boundary\n",
|
||||
    "\n",
|
||||
    "Can you explain the decision boundary for max_depth=1? Can you also explain visually, based on the decision boundary, which setting has the best bias-variance tradeoff?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "**2.3 Random forests and gradient boosting machines**\n",
|
||||
    "\n",
|
||||
    "* Now also show the accuracies and the decision boundary for random forests and gradient boosting machines\n",
|
||||
    "* Random forests: why is tuning the max_features parameter pointless here?\n",
|
||||
    "* Gradient boosting machines: experiment with the learning_rate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.0"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
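For exercise 2.3 in the notebook above, a rough sketch of how the ensemble models might be compared on the same three point clouds. The seed, `n_estimators=100`, and the three `learning_rate` values are assumptions chosen for illustration; only the cloud centers, the 70%/30% split and the grid come from the exercise text. As a hint for the `max_features` question: the data has only two features, so the parameter can effectively take just the values 1 or 2.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(0)  # assumption: fixed seed for repeatable numbers
n = 1000
centers = [("red", 0.0, 0.0), ("green", 2.5, 2.5), ("blue", 5.0, 0.0)]

# One standard-normal cloud per class, shifted to its center.
X = np.vstack([
    np.column_stack([np.random.normal(size=n) + cx,
                     np.random.normal(size=n) + cy])
    for _, cx, cy in centers
])
color = np.repeat([name for name, _, _ in centers], n)

X_train, X_test, c_train, c_test = train_test_split(
    X, color, test_size=0.3, random_state=0
)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for rate in (0.01, 0.1, 1.0):  # illustrative values to experiment with
    models[f"gradient boosting, learning_rate={rate}"] = (
        GradientBoostingClassifier(learning_rate=rate, random_state=0)
    )

scores = {}  # model name -> (training accuracy, test accuracy)
for name, model in models.items():
    model.fit(X_train, c_train)
    scores[name] = (model.score(X_train, c_train), model.score(X_test, c_test))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")

# Predictions over the exercise's grid approximate the decision boundary;
# they can be passed as the color argument of a scatter plot.
grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2, -1).T
grid_colors = models["random forest"].predict(grid)
```

Comparing the gap between training and test accuracy across the `learning_rate` values is one way to see how strongly each setting fits the training data.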
495
housing-ny-sf.csv
Normal file
@@ -0,0 +1,495 @@
|
||||
# This dataset was collected for A Visual Introduction to Machine Learning (http://www.r2d3.us). It is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/). We hope it helps you practice different data analysis and visualization techniques. ONE REQUEST: Please do not use this data to make any conclusions about the New York or San Francisco real estate markets. This data was collected with learning, not inference, in mind. :-)
|
||||
#
|
||||
in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation
|
||||
0,2,1,999000,1960,1000,999,10
|
||||
0,2,2,2750000,2006,1418,1939,0
|
||||
0,2,2,1350000,1900,2150,628,9
|
||||
0,1,1,629000,1903,500,1258,9
|
||||
0,0,1,439000,1930,500,878,10
|
||||
0,0,1,439000,1930,500,878,10
|
||||
0,1,1,475000,1920,500,950,10
|
||||
0,1,1,975000,1930,900,1083,10
|
||||
0,1,1,975000,1930,900,1083,12
|
||||
0,2,1,1895000,1921,1000,1895,12
|
||||
0,3,3,2095000,1926,2200,952,4
|
||||
0,1,1,999000,1982,784,1274,5
|
||||
0,1,1,999000,1982,784,1274,5
|
||||
0,1,1,1249000,1987,826,1512,3
|
||||
0,0,1,1110000,2008,698,1590,5
|
||||
0,2,2,2059500,2008,1373,1500,5
|
||||
0,2,2,2000000,1928,1200,1667,10
|
||||
0,1,1,715000,1903,557,1284,3
|
||||
0,2,2,2498000,2005,1260,1983,2
|
||||
0,2,1.5,2650000,1915,2500,1060,5
|
||||
0,2,2,3450000,1900,1850,1865,9
|
||||
0,1,1,3105000,2016,1108,2802,10
|
||||
0,4,5,13750000,2016,3699,3717,10
|
||||
0,2,2,1185000,1900,1000,1185,4
|
||||
0,1,2,1699000,1900,1500,1133,5
|
||||
0,1,1,1195000,1900,1093,1093,6
|
||||
0,1,1,450000,1920,500,900,10
|
||||
0,1,1,1195000,1900,1093,1093,10
|
||||
0,2,2,1185000,1900,1000,1185,10
|
||||
0,0,1,625000,1964,800,781,8
|
||||
0,2,2,3159990,2012,1420,2225,8
|
||||
0,2,2,2200000,1920,2050,1073,10
|
||||
0,2,2,4895000,2016,1520,3220,10
|
||||
0,2,2,6525000,2016,2018,3233,10
|
||||
0,2,2,4895000,2016,1520,3220,11
|
||||
0,2,2,6525000,2016,2018,3233,11
|
||||
0,2,2,1675000,2006,1225,1367,12
|
||||
0,2,2,1675000,2006,1225,1367,12
|
||||
0,2,1,999000,1985,799,1250,5
|
||||
0,1,1,1550000,1926,1000,1550,6
|
||||
0,1,1,1595000,2015,714,2234,7
|
||||
0,2,2,3995000,1906,2400,1665,9
|
||||
0,1,1,1285000,2012,749,1716,10
|
||||
0,1,1,1595000,2015,714,2234,10
|
||||
0,2,2,3995000,1906,2400,1665,10
|
||||
0,2,2,4995000,1900,2024,2468,8
|
||||
0,3,2,3580000,1880,3000,1193,10
|
||||
0,3,3,6350000,2015,2500,2540,3
|
||||
0,3,3,6550000,2015,2500,2620,3
|
||||
0,3,3,5985000,1909,3300,1814,3
|
||||
0,3,3,9900000,2015,2950,3356,4
|
||||
0,1,1,1849000,1920,1400,1321,5
|
||||
0,2,2,3500000,1915,2000,1750,5
|
||||
0,2,2,3500000,1910,1887,1855,5
|
||||
0,1,1,1250000,2007,720,1736,6
|
||||
0,2,1,899000,1990,820,1096,7
|
||||
0,1,2,1950000,1900,1300,1500,7
|
||||
0,1,1,1750000,1963,1000,1750,5
|
||||
0,0,1,775000,2009,546,1419,6
|
||||
0,0,1,390000,1955,550,709,6
|
||||
0,0,1,699999,1988,500,1400,9
|
||||
0,1,1,649000,1965,750,865,10
|
||||
0,2,2,1200000,1964,1200,1000,10
|
||||
0,0,1,319000,1941,500,638,10
|
||||
0,0,1,699999,1988,500,1400,10
|
||||
0,0,1,775000,2009,546,1419,10
|
||||
0,1,1,649000,1965,750,865,10
|
||||
0,2,2,1849000,2009,1135,1629,10
|
||||
0,1,1,615000,1960,750,820,11
|
||||
0,1,1,1590000,2008,785,2025,12
|
||||
0,1,1,1150000,2014,645,1783,18
|
||||
0,2,2,2650000,2014,1240,2137,18
|
||||
0,0,1,469000,1932,500,938,18
|
||||
0,0,1,569000,1962,543,1048,8
|
||||
0,1,1,549000,1924,750,732,10
|
||||
0,2,2,799000,1928,850,940,19
|
||||
0,2,2,879000,1928,800,1099,22
|
||||
0,2,3,2999000,1925,3200,937,10
|
||||
0,2,3,2999000,1925,3200,937,10
|
||||
0,2,3,2999000,1925,3200,937,12
|
||||
0,2,3,2999000,1925,3200,937,12
|
||||
0,1,1,399000,1910,475,840,10
|
||||
0,1,1,1135000,2005,715,1587,10
|
||||
0,1,1,1145000,2005,700,1636,10
|
||||
0,1,1,1760000,1969,950,1853,10
|
||||
0,2,2,1725000,2005,976,1767,10
|
||||
0,2,2,2750000,2007,1384,1987,10
|
||||
0,2,2,2750000,2007,1384,1987,13
|
||||
0,1,1,399000,1910,475,840,14
|
||||
0,2,2,1725000,2005,1144,1508,16
|
||||
0,1,1,869000,1988,670,1297,16
|
||||
0,1,1,1760000,1969,950,1853,17
|
||||
0,3,3,1249000,1962,1500,833,18
|
||||
0,3,3,1500000,1962,1600,938,18
|
||||
0,1,1,1135000,2005,715,1587,19
|
||||
0,1,1,1145000,2005,700,1636,19
|
||||
0,2,2,1725000,2005,976,1767,19
|
||||
0,2,2,3500000,1925,1550,2258,20
|
||||
0,0,1,525000,1940,525,1000,21
|
||||
0,2,2,2025000,1940,1433,1413,21
|
||||
0,2,2,3500000,1986,1463,2392,23
|
||||
0,0,1,925000,1978,585,1581,24
|
||||
0,2,2,1700000,1982,1007,1688,25
|
||||
0,2,2,1700000,1982,1007,1688,25
|
||||
0,0,1,449000,1962,550,816,10
|
||||
0,0,1,299000,1930,400,748,12
|
||||
0,1,1,695000,1961,720,965,16
|
||||
0,1,1,695000,1961,720,965,16
|
||||
0,4,5,17750000,2012,4476,3966,20
|
||||
0,5,5,27500000,1930,7500,3667,21
|
||||
0,0,1,539000,1957,485,1111,10
|
||||
0,0,1,779000,1975,512,1521,10
|
||||
0,1,1,925000,1985,745,1242,10
|
||||
0,1,1,1319000,1958,1000,1319,10
|
||||
0,2,2,4240000,2016,1741,2435,10
|
||||
0,2,2,4285000,2016,1747,2453,10
|
||||
0,3,3,4875000,2016,2017,2417,10
|
||||
0,3,3,14950000,1931,4435,3371,10
|
||||
0,3,3,10225000,2016,3007,3400,10
|
||||
0,4,4,13400000,2016,3331,4023,10
|
||||
0,5,5,19000000,2016,4972,3821,10
|
||||
0,0,1,539000,1957,485,1111,11
|
||||
0,1,1,835000,1963,700,1193,12
|
||||
0,10,10,7995000,1910,6400,1249,12
|
||||
0,0,1,349000,1960,400,873,13
|
||||
0,1,1,588000,1970,600,980,14
|
||||
0,2,2,8690000,2002,2178,3990,15
|
||||
0,2,2,4195000,2016,1750,2397,15
|
||||
0,2,2,4195000,2016,1742,2408,15
|
||||
0,2,2,4240000,2016,1741,2435,15
|
||||
0,2,2,4285000,2016,1747,2453,15
|
||||
0,3,3,4875000,2016,2017,2417,15
|
||||
0,3,3,5750000,2016,2196,2618,15
|
||||
0,3,3,9350000,2016,3054,3062,15
|
||||
0,3,3,10225000,2016,3007,3400,15
|
||||
0,4,4,13150000,2016,3338,3939,15
|
||||
0,4,4,13400000,2016,3331,4023,15
|
||||
0,5,5,19000000,2016,4972,3821,15
|
||||
0,0,1,850000,1924,546,1557,10
|
||||
0,7,7,19500000,1994,4238,4601,10
|
||||
0,1,1,1000000,1912,612,1634,18
|
||||
0,0,1,485000,1902,310,1565,23
|
||||
0,3,2,7350000,1999,2075,3542,23
|
||||
0,1,1,1200000,1969,600,2000,23
|
||||
0,2,2,4250000,2011,1504,2826,23
|
||||
0,2,2,3600000,1960,1340,2687,24
|
||||
0,2,2,3600000,1960,1340,2687,24
|
||||
0,1,1,525000,1924,700,750,24
|
||||
0,0,1,545000,1900,387,1408,10
|
||||
0,1,1,535000,1900,450,1189,10
|
||||
0,4,4,5500000,1923,3200,1719,21
|
||||
0,4,5,12000000,1939,3700,3243,22
|
||||
0,1,2,1850000,2007,839,2205,23
|
||||
0,1,1,535000,1900,450,1189,27
|
||||
0,2,2,1350000,1931,1300,1038,36
|
||||
0,1,1,779000,1902,700,1113,10
|
||||
0,2,1,1475000,1973,971,1519,10
|
||||
0,2,2,1385000,1971,962,1440,10
|
||||
0,1,1,779000,1902,700,1113,13
|
||||
0,1,1,649000,1929,800,811,25
|
||||
0,0,1,725000,1960,600,1208,26
|
||||
0,2,1,925000,1941,790,1171,27
|
||||
0,1,1,965000,1961,787,1226,27
|
||||
0,2,1,1250000,1922,1145,1092,30
|
||||
0,1,1,650000,1907,720,903,32
|
||||
0,2,1,385000,1999,820,470,8
|
||||
0,2,2,1695000,2012,1051,1613,16
|
||||
0,1,1,600000,1899,1060,566,8
|
||||
0,2,1,910000,1899,1060,858,8
|
||||
0,8,7,2300000,1910,4180,550,9
|
||||
0,1,1,600000,1899,1060,566,10
|
||||
0,2,1,910000,1899,1060,858,10
|
||||
0,2,2,1599000,1973,1400,1142,10
|
||||
0,2,2,4625000,1987,1695,2729,10
|
||||
0,0,1,325000,1910,375,867,11
|
||||
0,2,2,1599000,1973,1400,1142,14
|
||||
0,2,2,1965000,2005,1330,1477,16
|
||||
0,1,1,735000,1928,800,919,22
|
||||
0,2,2,4625000,1987,1695,2729,29
|
||||
0,1,1,749000,2011,762,983,10
|
||||
0,1,1,499999,2011,669,747,35
|
||||
0,1,1,749000,2011,762,983,35
|
||||
0,2,2,904000,1920,1503,601,10
|
||||
0,2,2,904000,1920,1503,601,10
|
||||
0,2,1,559900,1925,1200,467,51
|
||||
0,2,1,545000,1939,1049,520,39
|
||||
0,2,1,365000,1925,700,521,73
|
||||
0,2,1,365000,1925,700,521,73
|
||||
0,1,1,935000,1910,1102,848,8
|
||||
0,0,1,820000,2013,533,1538,8
|
||||
0,0,1,835000,2013,501,1667,8
|
||||
0,1,1,935000,1910,1102,848,10
|
||||
0,1,1,1420000,2004,768,1849,10
|
||||
0,1,1,1550000,2004,794,1952,10
|
||||
0,2,2,1635000,2004,957,1708,10
|
||||
0,1,1,1550000,2004,794,1952,11
|
||||
0,1,1,1780000,2007,988,1802,14
|
||||
0,2,2,2800000,2007,1308,2141,14
|
||||
0,1,1,411500,1921,586,702,6
|
||||
0,2,2,2175000,1999,1569,1386,10
|
||||
0,2,2,1995000,1996,1044,1911,10
|
||||
0,2,2,2235000,1999,1548,1444,14
|
||||
0,4,4,8800000,1941,3382,2602,15
|
||||
0,2,2,1850000,1996,1044,1772,17
|
||||
0,2,2,1995000,1996,1044,1911,17
|
||||
0,1,1,1695000,1927,680,2493,17
|
||||
0,1,1,1495000,1962,1125,1329,19
|
||||
0,2,2,2200000,2000,1044,2107,2
|
||||
0,0.5,1,384900,1962,540,713,10
|
||||
0,1,1,515000,1962,725,710,10
|
||||
0,3,2,1950000,1956,1600,1219,10
|
||||
0,0,1,307000,1910,330,930,15
|
||||
0,2,2,1735000,1980,1585,1095,18
|
||||
0,3,3,1850000,2005,1353,1367,8
|
||||
0,3,3,1850000,2005,1353,1367,10
|
||||
0,2,2,1995000,1986,1406,1419,14
|
||||
0,2,2,2900000,1987,1600,1813,24
|
||||
0,2,2,2499000,2004,1658,1507,24
|
||||
0,2,2,1575000,1930,1324,1190,33
|
||||
0,3,3,1495000,1990,1360,1099,0
|
||||
0,1,1,529000,1986,650,814,0
|
||||
0,3,3,2695000,2006,1991,1354,1
|
||||
0,3,3,2695000,2006,1991,1354,1
|
||||
0,4,3,3895000,2001,2277,1711,0
|
||||
1,1,1,550000,1982,724,760,24
|
||||
1,2,2,849000,1982,1030,824,24
|
||||
1,3,2,1750000,1900,2950,593,26
|
||||
1,1,1,799000,2008,847,943,29
|
||||
1,1,1.5,899000,1997,1453,619,3
|
||||
1,1,1,598000,2005,534,1120,5
|
||||
1,1,2,1088000,1998,1086,1002,6
|
||||
1,1,1,798000,1926,769,1038,10
|
||||
1,1,1,798000,1926,769,1038,10
|
||||
1,1,1,1495000,1927,2275,657,10
|
||||
1,4,4,4300000,2006,3321,1295,10
|
||||
1,1,1,699000,2008,756,925,12
|
||||
1,2,2,334905,2000,1047,320,12
|
||||
1,1,1.5,849000,1996,1127,753,13
|
||||
1,1,1.5,1365000,1996,1607,849,13
|
||||
1,1,1,649000,2011,674,963,14
|
||||
1,3,3,1245000,1907,1503,828,23
|
||||
1,1,1,649000,1983,850,764,163
|
||||
1,2,2,1700000,1987,1250,1360,11
|
||||
1,2,2,1980000,2009,1469,1348,0
|
||||
1,2,2,3600000,2009,1652,2179,0
|
||||
1,2,3,4995000,2009,2230,2240,0
|
||||
1,2,2,1298000,2008,1159,1120,0
|
||||
1,2,2,2149000,2008,1317,1632,0
|
||||
1,3,2,1995000,2006,1362,1465,3
|
||||
1,2,2,1650000,1937,1640,1006,7
|
||||
1,1,1,949000,2010,824,1152,8
|
||||
1,1,1,187518,2000,670,280,12
|
||||
1,3,2.5,1995000,2008,2354,847,23
|
||||
1,3,2.5,1995000,2008,2354,847,23
|
||||
1,2,2,2799000,2008,1328,2108,35
|
||||
1,2,2.5,1050000,2000,1640,640,2
|
||||
1,2,2,895000,2006,1113,804,3
|
||||
1,1,1,998000,2006,872,1144,3
|
||||
1,2,2,1659000,2000,1165,1424,4
|
||||
1,1,1,788000,2004,903,873,4
|
||||
1,2,2,1395000,2001,1334,1046,4
|
||||
1,2,2,1299000,2004,1453,894,5
|
||||
1,2,2.5,850000,2000,1136,748,5
|
||||
1,3,2.5,2850000,2013,2075,1373,5
|
||||
1,2,2,950000,2000,1258,755,6
|
||||
1,2,2,1600000,2002,1173,1364,7
|
||||
1,3,2,1275000,2001,1502,849,13
|
||||
1,1,1.5,775000,2009,835,928,14
|
||||
1,3,2,1399000,1892,1809,773,41
|
||||
1,2,2,879000,1912,950,925,53
|
||||
1,1,1,699000,1907,932,750,59
|
||||
1,1,1,985000,1978,884,1114,11
|
||||
1,1,1,725000,1978,1063,682,60
|
||||
1,2,2,849000,1911,1100,772,66
|
||||
1,1,1,618000,1973,705,877,72
|
||||
1,2,2,5950000,1989,3700,1608,83
|
||||
1,4,4,1800000,1948,2475,727,10
|
||||
1,2,2.5,2350000,2008,1314,1788,19
|
||||
1,1,1,740200,1920,989,748,41
|
||||
1,1,1,699000,1982,780,896,51
|
||||
1,2,2,958000,2005,915,1047,54
|
||||
1,2,2,1200000,1909,1302,922,56
|
||||
1,3,3,1575000,1993,2233,705,62
|
||||
1,2,2,1199000,1964,1100,1090,68
|
||||
1,1,1,529000,1966,791,669,69
|
||||
1,1,1,699000,1984,880,794,73
|
||||
1,3,2,1295000,1926,1675,773,77
|
||||
1,4,4,1800000,1948,2347,767,87
|
||||
1,1,1,795000,2000,990,803,7
|
||||
1,2,2,779000,2007,808,964,7
|
||||
1,1,1.5,378551,2000,1000,379,9
|
||||
1,4,2.3,1398000,1904,2492,561,13
|
||||
1,2,1.3,1097000,1904,1493,735,14
|
||||
1,1,1,670000,2014,507,1321,20
|
||||
1,2,2,1199000,2014,943,1271,20
|
||||
1,2,2,1095000,2010,1135,965,30
|
||||
1,1,2,795000,2014,616,1291,31
|
||||
1,3,2,979000,1900,1440,680,33
|
||||
1,3,3,1488888,1977,2100,709,38
|
||||
1,3,2,1389000,1908,1546,898,55
|
||||
1,3,2,1389000,1908,1546,898,55
|
||||
1,4,2,1099000,1962,1267,867,69
|
||||
1,3,3.5,1799000,1900,2449,735,75
|
||||
1,3,2,1250000,1924,1450,862,83
|
||||
1,3,2,998000,1907,1464,682,84
|
||||
1,3,2.5,1300000,1975,1642,792,4
|
||||
1,1,1,539000,2000,709,760,5
|
||||
1,1,1,735000,1983,779,944,12
|
||||
1,3,2,688000,1942,1441,477,34
|
||||
1,2,1,859000,1890,1100,781,44
|
||||
1,3,2,699000,1964,1250,559,48
|
||||
1,2,1,750000,1924,980,765,48
|
||||
1,3,2,549000,1915,1972,278,51
|
||||
1,6,3,1600000,1926,2567,623,52
|
||||
1,2,2,875000,1909,1700,515,54
|
||||
1,4,3,900000,2003,1870,481,54
|
||||
1,3,2,748000,1930,1522,491,54
|
||||
1,2,1.5,749000,1949,1348,556,56
|
||||
1,2,1,599000,1942,707,847,61
|
||||
1,3,1,648800,1940,1325,490,66
|
||||
1,2,2,649000,1924,1320,492,72
|
||||
1,2,1,798888,1929,1025,779,74
|
||||
1,5,2,899000,1972,1940,463,81
|
||||
1,3,1.5,699000,1910,1625,430,94
|
||||
1,3,2,749000,1907,1513,495,95
|
||||
1,3,2,535000,1967,1824,293,102
|
||||
1,2,1,699000,1913,900,777,108
|
||||
1,2,1,699000,1913,900,777,108
|
||||
1,2,1,500000,1958,1166,429,136
|
||||
1,3,2.5,879000,1981,2111,416,139
|
||||
1,3,2.5,879000,1981,2111,416,139
|
||||
1,4,2.5,699900,1982,1752,399,140
|
||||
1,2,1,688000,1950,1145,601,143
|
||||
1,3,2,1195000,1907,1396,856,33
|
||||
1,3,2,1195000,1907,1396,856,33
|
||||
1,1,1,697000,1922,694,1004,50
|
||||
1,5,3.5,3995000,1905,3350,1193,59
|
||||
1,1,1,650000,1955,600,1083,74
|
||||
1,1,1,650000,1955,600,1083,74
|
||||
1,1,1,775000,1955,600,1292,74
|
||||
1,1,1,825000,1955,600,1375,74
|
||||
1,2,1,699000,1907,915,764,89
|
||||
1,3,2,1495000,1927,1520,984,105
|
||||
1,4,4.5,5200000,1952,4813,1080,238
|
||||
1,2,2,989000,1984,988,1001,46
|
||||
1,3,1,1095000,1895,1465,747,68
|
||||
1,1,1,725000,1900,811,894,70
|
||||
1,1,1,725000,1900,811,894,70
|
||||
1,4,4,2650000,1900,3816,694,75
|
||||
1,2,2,1680000,1962,1850,908,89
|
||||
1,1,1,1199000,1906,1139,1053,91
|
||||
1,3,2.5,799000,1947,1400,571,35
|
||||
1,3,2,795000,1954,1350,589,36
|
||||
1,2,1,699000,1944,995,703,41
|
||||
1,3,2,848000,1940,1500,565,42
|
||||
1,3,2,899000,1948,1665,540,52
|
||||
1,3,2,899000,1948,1665,540,52
|
||||
1,6,4,1198000,1937,1965,610,79
|
||||
1,3,3,1100000,1925,2633,418,91
|
||||
1,6,3.5,949000,1918,2473,384,97
|
||||
1,5,5,2990000,2014,5000,598,108
|
||||
1,4,3,1338800,1940,2330,575,119
|
||||
1,3,2.5,1388000,1928,1905,729,160
|
||||
1,3,2,989000,1940,1603,617,227
|
||||
1,3,1,1295000,1890,1772,731,65
|
||||
1,6,6,6895000,1902,7800,884,67
|
||||
1,2,2,1725000,1922,1415,1219,86
|
||||
1,3,3,1995000,1922,1915,1042,88
|
||||
1,0,1,499000,1900,510,978,91
|
||||
1,0,1,499000,1900,510,978,91
|
||||
1,1,1,599000,1900,624,960,91
|
||||
1,1,1,599000,1900,624,960,91
|
||||
1,3,2,1200000,1929,1284,935,121
|
||||
1,3,2,1395000,1909,1877,743,57
|
||||
1,3,3,1785000,1925,1970,906,58
|
||||
1,3,2,1099000,1905,1457,754,67
|
||||
1,5,5.5,3495000,1921,4310,811,73
|
||||
1,3,2,2395000,1929,2323,1031,73
|
||||
1,4,2.5,2549000,1907,2746,928,75
|
||||
1,3,2,995000,2000,1393,714,75
|
||||
1,5,3.5,6495000,1906,4609,1409,89
|
||||
1,4,3,1698000,1890,1789,949,102
|
||||
1,5,2,6298000,1914,3585,1757,26
|
||||
1,3,3,1139000,2008,1532,743,44
|
||||
1,3,2,1080000,1914,1954,553,49
|
||||
1,4,1,1050000,1932,1767,594,55
|
||||
1,1,1,739900,1941,875,846,70
|
||||
1,4,4,1595000,1925,2750,580,81
|
||||
1,2,1,699000,1907,1200,583,14
|
||||
1,2,1,799000,1938,1150,695,36
|
||||
1,4,2.5,995000,1915,2180,456,62
|
||||
1,2,1,829000,1925,1145,724,71
|
||||
1,2,1,875000,1908,1158,756,84
|
||||
1,5,3,950000,1939,1846,515,97
|
||||
1,2,1,749000,1936,1450,517,110
|
||||
1,2,1,749000,1936,1450,517,110
|
||||
1,6,6.5,9500000,1937,5420,1753,3
|
||||
1,4,3,3595000,1931,3017,1192,9
|
||||
1,2,1.5,1425000,1925,1360,1048,14
|
||||
1,1,1,865000,1993,960,901,17
|
||||
1,2,2.5,2495000,1940,1809,1379,48
|
||||
1,2,2,2000000,1925,1518,1318,50
|
||||
1,4,3.5,9895000,2008,6024,1643,62
|
||||
1,3,2,358000,1989,1325,270,5
|
||||
1,3,2.5,899000,2015,1391,646,5
|
||||
1,3,2.5,929000,2015,1391,668,5
|
||||
1,4,4,850000,1928,2470,344,8
|
||||
1,2,1,648000,1921,1125,576,11
|
||||
1,2,1,480000,1915,680,706,13
|
||||
1,1,1,499000,1900,1076,464,19
|
||||
1,4,2,689000,1951,1473,468,21
|
||||
1,2,2,669000,1986,1317,508,23
|
||||
1,2,1.5,729000,1942,1012,720,27
|
||||
1,2,2,767000,1916,1380,556,31
|
||||
1,3,1,660000,1900,1520,434,35
|
||||
1,2,1,995000,1908,600,1658,36
|
||||
1,2,1,759900,1941,1175,647,43
|
||||
1,6,3.5,995000,2001,3080,323,55
|
||||
1,2,1,725000,1945,1040,697,60
|
||||
1,4,3,3420000,1926,5113,669,98
|
||||
1,3,2,1650000,1922,2025,815,106
|
||||
1,3,3.5,2250000,1928,3258,691,127
|
||||
1,3,1,1319000,1925,1752,753,141
|
||||
1,5,3.5,1698000,1966,2769,613,176
|
||||
1,3,2,1049000,1947,1626,645,179
|
||||
1,2,2,599000,1990,862,695,181
|
||||
1,5,3.5,2995000,1947,3890,770,181
|
||||
1,3,2,995000,1956,1305,762,216
|
||||
1,1,1,350000,1908,600,583,43
|
||||
1,2,1,550000,1908,800,688,43
|
||||
1,4,4,3760000,1900,3085,1219,49
|
||||
1,3,2,1050000,1922,1266,829,52
|
||||
1,2,2,1895000,1907,1756,1079,54
|
||||
1,1,1,599000,1961,680,881,56
|
||||
1,4,3,1895000,2001,2041,928,61
|
||||
1,3,2,1799000,1926,1800,999,66
|
||||
1,2,1,600000,1908,1350,444,92
|
||||
1,2,1,1495000,1908,1700,879,98
|
||||
1,3,2,1595000,1961,1515,1053,103
|
||||
1,3,2,849000,1947,1622,523,106
|
||||
1,4,3.5,1995000,1992,3312,602,108
|
||||
1,3,2,1495000,1937,1635,914,112
|
||||
1,3,3.5,2195000,1922,2168,1012,125
|
||||
1,4,2,1798000,1951,2050,877,131
|
||||
1,2,2,849000,1978,1555,546,139
|
||||
1,3,2,1159000,1977,1731,670,143
|
||||
1,3,2.5,995000,1976,1959,508,163
|
||||
1,4,3,1388000,1968,2275,610,163
|
||||
1,5,3.5,2250000,1962,3729,603,174
|
||||
1,3,2,1080000,1989,1524,709,185
|
||||
1,3,2.5,1095000,1968,1868,586,187
|
||||
1,2,1,599000,1972,990,605,189
|
||||
1,2,1,915000,1954,1251,731,24
|
||||
1,2,1,915000,1954,1251,731,24
|
||||
1,3,2,725000,1975,1474,492,34
|
||||
1,3,2.5,1588000,2015,2001,794,43
|
||||
1,2,1,795000,1941,1256,633,63
|
||||
1,2,1,795000,1941,1256,633,63
|
||||
1,4,2,848000,1949,1646,515,69
|
||||
1,1,1,439000,2002,667,658,80
|
||||
1,3,2,849900,1958,1310,649,118
|
||||
1,2,1,599000,1941,1254,478,123
|
||||
1,3,2.5,1539514,2014,2024,761,136
|
||||
1,3,2.5,1339000,2015,2133,628,143
|
||||
1,3,2.5,1294000,2015,2133,607,143
|
||||
1,3,2.5,1611000,2015,2001,805,153
|
||||
1,2,2,1495000,1913,1174,1273,35
|
||||
1,1,1,699000,1908,750,932,36
|
||||
1,2,2.5,3495000,1900,1968,1776,76
|
||||
1,4,2,699000,1949,1550,451,11
|
||||
1,2,1,699000,1949,1050,666,64
|
||||
1,3,3,888000,1975,1555,571,79
|
||||
1,1,1,599000,1945,631,949,84
|
||||
1,3,3,758000,1989,2157,351,90
|
||||
1,2,2,1698000,2008,1620,1048,1
|
||||
1,2,2,1698000,2008,1620,1048,1
|
||||
1,1,1,849000,2012,886,958,2
|
||||
1,2,2,1675000,2012,1562,1072,2
|
||||
1,2,2,1695000,2007,1610,1053,2
|
||||
1,3,2,2219000,2012,1921,1155,13
|
||||
1,1,1,788000,2004,903,873,4
|
||||
1,2,2,1950000,1995,1930,1010,4
|
||||
1,0,1,539000,2000,709,760,5
|
||||
1,2,2,849000,1982,1030,824,24
|
||||
1,2,2.5,2495000,1940,1809,1379,48
|
||||
1,4,4,3760000,1894,3085,1219,49
|
||||
1,3,2,1799000,1926,1800,999,66
|
||||
1,5,2.5,1800000,1890,3073,586,76
|
||||
1,2,1,695000,1923,1045,665,106
|
||||
1,3,2,1650000,1922,1483,1113,106
|
||||
1,1,1,649000,1983,850,764,163
|
||||
1,3,2,995000,1956,1305,762,216
|
|
File diff suppressed because one or more lines are too long
1190
theorie/.ipynb_checkpoints/UnsuperVised(2)-checkpoint.ipynb
Normal file
File diff suppressed because one or more lines are too long
1190
theorie/UnsuperVised(2).ipynb
Normal file
File diff suppressed because one or more lines are too long