2019-05-25 23:27:05 +02:00
parent d981556715
commit e70355b935
9 changed files with 44686 additions and 37 deletions


@@ -97,15 +97,14 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"392\n",
"392\n"
"<class 'sklearn.neighbors.classification.KNeighborsClassifier'>\n"
]
},
{
@@ -117,18 +116,6 @@
"/home/beppe/.local/lib/python3.7/site-packages/sklearn/utils/validation.py:563: FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in scikit-learn, for example by using your_array = your_array.astype(np.float64).\n",
" FutureWarning)\n"
]
},
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=None, n_neighbors=1, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
@@ -160,7 +147,9 @@
"xtrain, xtest, ytrain, ytest = train_test_split(setjen, target, random_state=0)\n",
"\n",
"knn = KNeighborsClassifier(n_neighbors=1)\n",
"knn.fit(xtrain, ytrain)\n"
"f = knn.fit(xtrain, ytrain)\n",
"print(type(f))\n",
"\n"
]
},
{

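Note on the hunk above: it stores the return value of `knn.fit(...)` in `f` and prints its type, and the recorded output shows a `KNeighborsClassifier`. By scikit-learn's estimator convention, `fit` returns the estimator itself, so `f` and `knn` should refer to the same object. A minimal sketch follows; the toy arrays are illustrative assumptions, not the notebook's `setjen` and `target`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for the notebook's features and target.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=1)
f = knn.fit(X, y)  # fit() trains in place and returns the estimator

# f is not a separate "fitted model" object; it is knn itself.
same_object = f is knn
print(type(f).__name__, same_object)
```

Because of this, assigning the result of `fit` is optional; it mainly enables chaining such as `KNeighborsClassifier().fit(X, y).predict(X)`.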
259
7/Labo7.ipynb Normal file

@@ -0,0 +1,259 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 7 Data Science: Decision Trees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<strong>Note.</strong> In this lab, decision trees are drawn using the GraphViz package. This package was installed in labs 4 &amp; 5. On your own PC, install the <strong>python-graphviz</strong> package in Anaconda (under Environments). This can take a while. Restart your kernel to be able to use the installation.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 1** : **Decision Trees for a simple classification problem**\n",
"\n",
"**1.1 Exploring the data**\n",
"\n",
"Given is the dataset housing-ny-sf.csv. This dataset can be used to predict whether an apartment is located in New York or in San Francisco. The file contains the following columns:\n",
" * in_sf: the target to predict: set to 1 if the apartment is located in San Francisco\n",
" * beds: the number of beds\n",
" * bath: the number of baths\n",
" * price: the sale price (\\$)\n",
" * year_built: the year of construction\n",
" * sqft: the floor area in square feet\n",
" * price_per_sqft: the price (\\$) per square foot\n",
" * elevation: elevation in m\n",
"\n",
"A nice visual introduction to this exercise can be found here: _http://www.r2d3.us/visual-intro-to-machine-learning-part-1/_\n",
"\n",
" * Load the data into a Pandas dataframe (please do not change anything in the csv file; hint: skip lines).\n",
" * Make a scatter_matrix plot of the __features__ in which each instance is colored according to its target (with colormap 'brg', San Francisco becomes green and New York blue)\n",
" * Using Pandas (groupby and hist(alpha=0.4)), draw a histogram (with a different color for SF and NY) for a number of features whose distribution you expect to differ strongly between the 2 cities"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": true
},
"outputs": [
{
"ename": "ParserError",
"evalue": "Error tokenizing data. C error: Expected 3 fields in line 3, saw 8\n",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mParserError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-7-d1e7aebbc63b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0msklearn\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m#dees moe gewoon gefixed worde, geen id hoe, ma fuck it dude\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mhousing\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"housing-ny-sf.csv\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mskip_blank_lines\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mskipinitialspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 700\u001b[0m skip_blank_lines=skip_blank_lines)\n\u001b[1;32m 701\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 702\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 703\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 704\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 433\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 434\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 435\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 436\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 437\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1137\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1138\u001b[0m \u001b[0mnrows\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'nrows'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1139\u001b[0;31m \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1140\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1141\u001b[0m \u001b[0;31m# May alter columns / col_dict\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.7/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1993\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1994\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1995\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1996\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1997\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import tree\n",
"\n",
"# The csv file starts with two '#' comment lines; skip them so the real header row is used.\n",
"housing = pd.read_csv(\"housing-ny-sf.csv\", skiprows=2, skipinitialspace=True)\n",
"\n",
"# Column 0 (in_sf) is the target; the remaining columns are the features.\n",
"x_train, x_test, y_train, y_test = train_test_split(\n",
"    housing.iloc[:, 1:], housing['in_sf'], test_size=0.3, random_state=0)\n",
"\n",
"# Overlaid price histograms, one per city.\n",
"housing.groupby('in_sf').price.hist(alpha=0.4)\n",
"\n",
"# Compare training and test accuracy for increasing tree depth.\n",
"for i in range(1, 10):\n",
"    clf = tree.DecisionTreeClassifier(random_state=0, max_depth=i)\n",
"    clf = clf.fit(x_train, y_train)\n",
"    print(clf.score(x_train, y_train))\n",
"    print(clf.score(x_test, y_test))\n",
"    print()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"clf = tree.DecisionTreeClassifier(random_state=0,max_depth=3)\n",
"clf = clf.fit(x_train, y_train)\n",
"\n",
"import graphviz\n",
"dot_data = tree.export_graphviz(clf, out_file=None, feature_names=list(housing.columns[1:]), class_names=['New York', 'San Francisco'], filled=True, rounded=True, special_characters=True)\n",
"graph = graphviz.Source(dot_data)\n",
"graph"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**1.2 Training and parameter tuning**\n",
"\n",
" * Split the data into a training set and a test set (70%/30%) - choose a random_state &ne; 0, e.g. 88\n",
" * Train on this data with DecisionTreeClassifier without parameters\n",
" * Write a script that searches for the ideal depth of the decision tree, and draw this tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 2** : **Classification exercise with decision trees, random forests and gradient boosting machines**\n",
"\n",
"**2.1 Generating the sample data**\n",
"\n",
"You start this exercise by creating sample data for a 'ternary classification problem'. This data has 2 features, namely X and Y, and a target called __color__ (possible values: red, green, blue). We will use this data to test various classification algorithms and to visualize the decision boundary.\n",
"\n",
"* Generating x and y coordinates with _np.random.normal_ produces a _point cloud_ centered at (0,0). By adding a constant value to x or y you can shift the center of this cloud in the x or y direction. Now generate the following sample data:\n",
" * a point cloud of 1000 instances centered at (0,0) with color label 'red'\n",
" * a point cloud of 1000 instances centered at (2.5,2.5) with color label 'green'\n",
" * a point cloud of 1000 instances centered at (5,0) with color label 'blue'\n",
" * when calling _np.random.normal_, pass no extra parameters other than the size\n",
" \n",
"* Make a scatter plot of this data. If all is well, you will see the 3 separately colored point clouds, which overlap slightly. (You can pass the color column directly to matplotlib.) The following code makes X and Y use the same scale:\n",
"\n",
" <code>plt.gca().set_aspect('equal', adjustable='box')</code>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.2 Decision trees: visualizing the decision boundary**\n",
"\n",
"* Split the data into a training set and a test set (70%/30%)\n",
"* Train a DecisionTreeClassifier on the training data and measure the accuracy on the training and test data\n",
"* We will now approximate the decision boundary by first requesting the prediction for a grid of (x,y) coordinates that covers the entire plot. You generate this grid as follows:\n",
"\n",
"<code>grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2,-1).T</code>\n",
"\n",
"* You can again pass the predicted values directly as the color of the scatter plot\n",
"\n",
"\n",
"* Now adapt your script so that, for max_depth from 1 up to and including 8, it prints the accuracies and plots the decision boundary\n",
"\n",
"Can you explain the decision boundary for max_depth=1? Can you also explain visually, using the decision boundary, which setting has the best bias-variance tradeoff?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.3 Random forests and gradient boosting machines**\n",
"\n",
"* Now also show the accuracies and the decision boundary for random forests and gradient boosting machines\n",
"* Random forests: why is tuning the max_features parameter pointless here?\n",
"* Gradient boosting machines: experiment with the learning_rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
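
Note on the recorded `ParserError` in the notebook above: the CSV added in this commit opens with two `#` lines, and the first one contains two commas, so pandas most likely takes it as a three-column header and then fails on the real eight-column rows. The sketch below reproduces this on an in-memory stand-in (an assumption for illustration: only a fragment of the first comment line plus the first three data rows) and shows two ways around it, `skiprows=2` and `comment="#"`.

```python
import io
import pandas as pd

# In-memory stand-in for housing-ny-sf.csv (abbreviated first line, three data rows).
raw = (
    "# This data was collected with learning, not inference, in mind. :-)\n"
    "#\n"
    "in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation\n"
    "0,2,1,999000,1960,1000,999,10\n"
    "0,2,2,2750000,2006,1418,1939,0\n"
    "0,2,2,1350000,1900,2150,628,9\n"
)

def load(**kwargs):
    """Parse the stand-in text with the given read_csv options."""
    return pd.read_csv(io.StringIO(raw), **kwargs)

# Naive call: the first '#' line is treated as the header row.
try:
    load()
    naive_error = None
except pd.errors.ParserError as exc:
    naive_error = str(exc)
print("naive load:", naive_error)

# Option 1: skip the two leading lines by position.
by_skip = load(skiprows=2)

# Option 2: treat everything after '#' as a comment.
by_comment = load(comment="#")

print(by_skip.columns.tolist())
```

Both options leave the file itself untouched, as the exercise asks. `skiprows=2` depends on the number of leading lines staying fixed, while `comment="#"` would also truncate any data value containing a `#`.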

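For exercises 2.1 and 2.2 in the notebook above, the cells are still empty in this commit. The following is one possible sketch, not the intended solution: it builds the three point clouds, sweeps `max_depth` from 1 to 8, and predicts on the grid from the exercise text. The seed, the use of `np.random.default_rng` in place of `np.random.normal`, and all variable names are assumptions made for illustration; plotting is left out.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Three standard-normal clouds shifted to the centers given in exercise 2.1.
centers = {"red": (0.0, 0.0), "green": (2.5, 2.5), "blue": (5.0, 0.0)}
points = np.vstack([rng.normal(size=(1000, 2)) + c for c in centers.values()])
colors = np.repeat(list(centers), 1000)  # labels in the same order as the clouds

x_train, x_test, y_train, y_test = train_test_split(
    points, colors, test_size=0.3, random_state=0)

# Grid of (x, y) coordinates taken from the exercise text.
grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2, -1).T

scores = {}
for depth in range(1, 9):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(x_train, y_train)
    scores[depth] = (clf.score(x_train, y_train), clf.score(x_test, y_test))
    # Predicted labels are color names, so they could be passed as c= to plt.scatter.
    grid_colors = clf.predict(grid)
    print(depth, scores[depth])

# Depth with the highest test accuracy (the first such depth on ties).
best_depth = max(scores, key=lambda d: scores[d][1])
print("best depth by test accuracy:", best_depth)
```

A tree with `max_depth=1` makes a single axis-aligned split, so it can separate at most two of the three colors; that is one way to read the question about the depth-1 boundary.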
495
7/housing-ny-sf.csv Normal file

@@ -0,0 +1,495 @@
# This dataset was collected for A Visual Introduction to Machine Learning (http://www.r2d3.us). It is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/). We hope it helps you practice different data analysis and visualization techniques. ONE REQUEST: Please do not use this data to make any conclusions about the New York or San Francisco real estate markets. This data was collected with learning, not inference, in mind. :-)
#
in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation
0,2,1,999000,1960,1000,999,10
0,2,2,2750000,2006,1418,1939,0
0,2,2,1350000,1900,2150,628,9
0,1,1,629000,1903,500,1258,9
0,0,1,439000,1930,500,878,10
0,0,1,439000,1930,500,878,10
0,1,1,475000,1920,500,950,10
0,1,1,975000,1930,900,1083,10
0,1,1,975000,1930,900,1083,12
0,2,1,1895000,1921,1000,1895,12
0,3,3,2095000,1926,2200,952,4
0,1,1,999000,1982,784,1274,5
0,1,1,999000,1982,784,1274,5
0,1,1,1249000,1987,826,1512,3
0,0,1,1110000,2008,698,1590,5
0,2,2,2059500,2008,1373,1500,5
0,2,2,2000000,1928,1200,1667,10
0,1,1,715000,1903,557,1284,3
0,2,2,2498000,2005,1260,1983,2
0,2,1.5,2650000,1915,2500,1060,5
0,2,2,3450000,1900,1850,1865,9
0,1,1,3105000,2016,1108,2802,10
0,4,5,13750000,2016,3699,3717,10
0,2,2,1185000,1900,1000,1185,4
0,1,2,1699000,1900,1500,1133,5
0,1,1,1195000,1900,1093,1093,6
0,1,1,450000,1920,500,900,10
0,1,1,1195000,1900,1093,1093,10
0,2,2,1185000,1900,1000,1185,10
0,0,1,625000,1964,800,781,8
0,2,2,3159990,2012,1420,2225,8
0,2,2,2200000,1920,2050,1073,10
0,2,2,4895000,2016,1520,3220,10
0,2,2,6525000,2016,2018,3233,10
0,2,2,4895000,2016,1520,3220,11
0,2,2,6525000,2016,2018,3233,11
0,2,2,1675000,2006,1225,1367,12
0,2,2,1675000,2006,1225,1367,12
0,2,1,999000,1985,799,1250,5
0,1,1,1550000,1926,1000,1550,6
0,1,1,1595000,2015,714,2234,7
0,2,2,3995000,1906,2400,1665,9
0,1,1,1285000,2012,749,1716,10
0,1,1,1595000,2015,714,2234,10
0,2,2,3995000,1906,2400,1665,10
0,2,2,4995000,1900,2024,2468,8
0,3,2,3580000,1880,3000,1193,10
0,3,3,6350000,2015,2500,2540,3
0,3,3,6550000,2015,2500,2620,3
0,3,3,5985000,1909,3300,1814,3
0,3,3,9900000,2015,2950,3356,4
0,1,1,1849000,1920,1400,1321,5
0,2,2,3500000,1915,2000,1750,5
0,2,2,3500000,1910,1887,1855,5
0,1,1,1250000,2007,720,1736,6
0,2,1,899000,1990,820,1096,7
0,1,2,1950000,1900,1300,1500,7
0,1,1,1750000,1963,1000,1750,5
0,0,1,775000,2009,546,1419,6
0,0,1,390000,1955,550,709,6
0,0,1,699999,1988,500,1400,9
0,1,1,649000,1965,750,865,10
0,2,2,1200000,1964,1200,1000,10
0,0,1,319000,1941,500,638,10
0,0,1,699999,1988,500,1400,10
0,0,1,775000,2009,546,1419,10
0,1,1,649000,1965,750,865,10
0,2,2,1849000,2009,1135,1629,10
0,1,1,615000,1960,750,820,11
0,1,1,1590000,2008,785,2025,12
0,1,1,1150000,2014,645,1783,18
0,2,2,2650000,2014,1240,2137,18
0,0,1,469000,1932,500,938,18
0,0,1,569000,1962,543,1048,8
0,1,1,549000,1924,750,732,10
0,2,2,799000,1928,850,940,19
0,2,2,879000,1928,800,1099,22
0,2,3,2999000,1925,3200,937,10
0,2,3,2999000,1925,3200,937,10
0,2,3,2999000,1925,3200,937,12
0,2,3,2999000,1925,3200,937,12
0,1,1,399000,1910,475,840,10
0,1,1,1135000,2005,715,1587,10
0,1,1,1145000,2005,700,1636,10
0,1,1,1760000,1969,950,1853,10
0,2,2,1725000,2005,976,1767,10
0,2,2,2750000,2007,1384,1987,10
0,2,2,2750000,2007,1384,1987,13
0,1,1,399000,1910,475,840,14
0,2,2,1725000,2005,1144,1508,16
0,1,1,869000,1988,670,1297,16
0,1,1,1760000,1969,950,1853,17
0,3,3,1249000,1962,1500,833,18
0,3,3,1500000,1962,1600,938,18
0,1,1,1135000,2005,715,1587,19
0,1,1,1145000,2005,700,1636,19
0,2,2,1725000,2005,976,1767,19
0,2,2,3500000,1925,1550,2258,20
0,0,1,525000,1940,525,1000,21
0,2,2,2025000,1940,1433,1413,21
0,2,2,3500000,1986,1463,2392,23
0,0,1,925000,1978,585,1581,24
0,2,2,1700000,1982,1007,1688,25
0,2,2,1700000,1982,1007,1688,25
0,0,1,449000,1962,550,816,10
0,0,1,299000,1930,400,748,12
0,1,1,695000,1961,720,965,16
0,1,1,695000,1961,720,965,16
0,4,5,17750000,2012,4476,3966,20
0,5,5,27500000,1930,7500,3667,21
0,0,1,539000,1957,485,1111,10
0,0,1,779000,1975,512,1521,10
0,1,1,925000,1985,745,1242,10
0,1,1,1319000,1958,1000,1319,10
0,2,2,4240000,2016,1741,2435,10
0,2,2,4285000,2016,1747,2453,10
0,3,3,4875000,2016,2017,2417,10
0,3,3,14950000,1931,4435,3371,10
0,3,3,10225000,2016,3007,3400,10
0,4,4,13400000,2016,3331,4023,10
0,5,5,19000000,2016,4972,3821,10
0,0,1,539000,1957,485,1111,11
0,1,1,835000,1963,700,1193,12
0,10,10,7995000,1910,6400,1249,12
0,0,1,349000,1960,400,873,13
0,1,1,588000,1970,600,980,14
0,2,2,8690000,2002,2178,3990,15
0,2,2,4195000,2016,1750,2397,15
0,2,2,4195000,2016,1742,2408,15
0,2,2,4240000,2016,1741,2435,15
0,2,2,4285000,2016,1747,2453,15
0,3,3,4875000,2016,2017,2417,15
0,3,3,5750000,2016,2196,2618,15
0,3,3,9350000,2016,3054,3062,15
0,3,3,10225000,2016,3007,3400,15
0,4,4,13150000,2016,3338,3939,15
0,4,4,13400000,2016,3331,4023,15
0,5,5,19000000,2016,4972,3821,15
0,0,1,850000,1924,546,1557,10
0,7,7,19500000,1994,4238,4601,10
0,1,1,1000000,1912,612,1634,18
0,0,1,485000,1902,310,1565,23
0,3,2,7350000,1999,2075,3542,23
0,1,1,1200000,1969,600,2000,23
0,2,2,4250000,2011,1504,2826,23
0,2,2,3600000,1960,1340,2687,24
0,2,2,3600000,1960,1340,2687,24
0,1,1,525000,1924,700,750,24
0,0,1,545000,1900,387,1408,10
0,1,1,535000,1900,450,1189,10
0,4,4,5500000,1923,3200,1719,21
0,4,5,12000000,1939,3700,3243,22
0,1,2,1850000,2007,839,2205,23
0,1,1,535000,1900,450,1189,27
0,2,2,1350000,1931,1300,1038,36
0,1,1,779000,1902,700,1113,10
0,2,1,1475000,1973,971,1519,10
0,2,2,1385000,1971,962,1440,10
0,1,1,779000,1902,700,1113,13
0,1,1,649000,1929,800,811,25
0,0,1,725000,1960,600,1208,26
0,2,1,925000,1941,790,1171,27
0,1,1,965000,1961,787,1226,27
0,2,1,1250000,1922,1145,1092,30
0,1,1,650000,1907,720,903,32
0,2,1,385000,1999,820,470,8
0,2,2,1695000,2012,1051,1613,16
0,1,1,600000,1899,1060,566,8
0,2,1,910000,1899,1060,858,8
0,8,7,2300000,1910,4180,550,9
0,1,1,600000,1899,1060,566,10
0,2,1,910000,1899,1060,858,10
0,2,2,1599000,1973,1400,1142,10
0,2,2,4625000,1987,1695,2729,10
0,0,1,325000,1910,375,867,11
0,2,2,1599000,1973,1400,1142,14
0,2,2,1965000,2005,1330,1477,16
0,1,1,735000,1928,800,919,22
0,2,2,4625000,1987,1695,2729,29
0,1,1,749000,2011,762,983,10
0,1,1,499999,2011,669,747,35
0,1,1,749000,2011,762,983,35
0,2,2,904000,1920,1503,601,10
0,2,2,904000,1920,1503,601,10
0,2,1,559900,1925,1200,467,51
0,2,1,545000,1939,1049,520,39
0,2,1,365000,1925,700,521,73
0,2,1,365000,1925,700,521,73
0,1,1,935000,1910,1102,848,8
0,0,1,820000,2013,533,1538,8
0,0,1,835000,2013,501,1667,8
0,1,1,935000,1910,1102,848,10
0,1,1,1420000,2004,768,1849,10
0,1,1,1550000,2004,794,1952,10
0,2,2,1635000,2004,957,1708,10
0,1,1,1550000,2004,794,1952,11
0,1,1,1780000,2007,988,1802,14
0,2,2,2800000,2007,1308,2141,14
0,1,1,411500,1921,586,702,6
0,2,2,2175000,1999,1569,1386,10
0,2,2,1995000,1996,1044,1911,10
0,2,2,2235000,1999,1548,1444,14
0,4,4,8800000,1941,3382,2602,15
0,2,2,1850000,1996,1044,1772,17
0,2,2,1995000,1996,1044,1911,17
0,1,1,1695000,1927,680,2493,17
0,1,1,1495000,1962,1125,1329,19
0,2,2,2200000,2000,1044,2107,2
0,0.5,1,384900,1962,540,713,10
0,1,1,515000,1962,725,710,10
0,3,2,1950000,1956,1600,1219,10
0,0,1,307000,1910,330,930,15
0,2,2,1735000,1980,1585,1095,18
0,3,3,1850000,2005,1353,1367,8
0,3,3,1850000,2005,1353,1367,10
0,2,2,1995000,1986,1406,1419,14
0,2,2,2900000,1987,1600,1813,24
0,2,2,2499000,2004,1658,1507,24
0,2,2,1575000,1930,1324,1190,33
0,3,3,1495000,1990,1360,1099,0
0,1,1,529000,1986,650,814,0
0,3,3,2695000,2006,1991,1354,1
0,3,3,2695000,2006,1991,1354,1
0,4,3,3895000,2001,2277,1711,0
1,1,1,550000,1982,724,760,24
1,2,2,849000,1982,1030,824,24
1,3,2,1750000,1900,2950,593,26
1,1,1,799000,2008,847,943,29
1,1,1.5,899000,1997,1453,619,3
1,1,1,598000,2005,534,1120,5
1,1,2,1088000,1998,1086,1002,6
1,1,1,798000,1926,769,1038,10
1,1,1,798000,1926,769,1038,10
1,1,1,1495000,1927,2275,657,10
1,4,4,4300000,2006,3321,1295,10
1,1,1,699000,2008,756,925,12
1,2,2,334905,2000,1047,320,12
1,1,1.5,849000,1996,1127,753,13
1,1,1.5,1365000,1996,1607,849,13
1,1,1,649000,2011,674,963,14
1,3,3,1245000,1907,1503,828,23
1,1,1,649000,1983,850,764,163
1,2,2,1700000,1987,1250,1360,11
1,2,2,1980000,2009,1469,1348,0
1,2,2,3600000,2009,1652,2179,0
1,2,3,4995000,2009,2230,2240,0
1,2,2,1298000,2008,1159,1120,0
1,2,2,2149000,2008,1317,1632,0
1,3,2,1995000,2006,1362,1465,3
1,2,2,1650000,1937,1640,1006,7
1,1,1,949000,2010,824,1152,8
1,1,1,187518,2000,670,280,12
1,3,2.5,1995000,2008,2354,847,23
1,3,2.5,1995000,2008,2354,847,23
1,2,2,2799000,2008,1328,2108,35
1,2,2.5,1050000,2000,1640,640,2
1,2,2,895000,2006,1113,804,3
1,1,1,998000,2006,872,1144,3
1,2,2,1659000,2000,1165,1424,4
1,1,1,788000,2004,903,873,4
1,2,2,1395000,2001,1334,1046,4
1,2,2,1299000,2004,1453,894,5
1,2,2.5,850000,2000,1136,748,5
1,3,2.5,2850000,2013,2075,1373,5
1,2,2,950000,2000,1258,755,6
1,2,2,1600000,2002,1173,1364,7
1,3,2,1275000,2001,1502,849,13
1,1,1.5,775000,2009,835,928,14
1,3,2,1399000,1892,1809,773,41
1,2,2,879000,1912,950,925,53
1,1,1,699000,1907,932,750,59
1,1,1,985000,1978,884,1114,11
1,1,1,725000,1978,1063,682,60
1,2,2,849000,1911,1100,772,66
1,1,1,618000,1973,705,877,72
1,2,2,5950000,1989,3700,1608,83
1,4,4,1800000,1948,2475,727,10
1,2,2.5,2350000,2008,1314,1788,19
1,1,1,740200,1920,989,748,41
1,1,1,699000,1982,780,896,51
1,2,2,958000,2005,915,1047,54
1,2,2,1200000,1909,1302,922,56
1,3,3,1575000,1993,2233,705,62
1,2,2,1199000,1964,1100,1090,68
1,1,1,529000,1966,791,669,69
1,1,1,699000,1984,880,794,73
1,3,2,1295000,1926,1675,773,77
1,4,4,1800000,1948,2347,767,87
1,1,1,795000,2000,990,803,7
1,2,2,779000,2007,808,964,7
1,1,1.5,378551,2000,1000,379,9
1,4,2.3,1398000,1904,2492,561,13
1,2,1.3,1097000,1904,1493,735,14
1,1,1,670000,2014,507,1321,20
1,2,2,1199000,2014,943,1271,20
1,2,2,1095000,2010,1135,965,30
1,1,2,795000,2014,616,1291,31
1,3,2,979000,1900,1440,680,33
1,3,3,1488888,1977,2100,709,38
1,3,2,1389000,1908,1546,898,55
1,3,2,1389000,1908,1546,898,55
1,4,2,1099000,1962,1267,867,69
1,3,3.5,1799000,1900,2449,735,75
1,3,2,1250000,1924,1450,862,83
1,3,2,998000,1907,1464,682,84
1,3,2.5,1300000,1975,1642,792,4
1,1,1,539000,2000,709,760,5
1,1,1,735000,1983,779,944,12
1,3,2,688000,1942,1441,477,34
1,2,1,859000,1890,1100,781,44
1,3,2,699000,1964,1250,559,48
1,2,1,750000,1924,980,765,48
1,3,2,549000,1915,1972,278,51
1,6,3,1600000,1926,2567,623,52
1,2,2,875000,1909,1700,515,54
1,4,3,900000,2003,1870,481,54
1,3,2,748000,1930,1522,491,54
1,2,1.5,749000,1949,1348,556,56
1,2,1,599000,1942,707,847,61
1,3,1,648800,1940,1325,490,66
1,2,2,649000,1924,1320,492,72
1,2,1,798888,1929,1025,779,74
1,5,2,899000,1972,1940,463,81
1,3,1.5,699000,1910,1625,430,94
1,3,2,749000,1907,1513,495,95
1,3,2,535000,1967,1824,293,102
1,2,1,699000,1913,900,777,108
1,2,1,699000,1913,900,777,108
1,2,1,500000,1958,1166,429,136
1,3,2.5,879000,1981,2111,416,139
1,3,2.5,879000,1981,2111,416,139
1,4,2.5,699900,1982,1752,399,140
1,2,1,688000,1950,1145,601,143
1,3,2,1195000,1907,1396,856,33
1,3,2,1195000,1907,1396,856,33
1,1,1,697000,1922,694,1004,50
1,5,3.5,3995000,1905,3350,1193,59
1,1,1,650000,1955,600,1083,74
1,1,1,650000,1955,600,1083,74
1,1,1,775000,1955,600,1292,74
1,1,1,825000,1955,600,1375,74
1,2,1,699000,1907,915,764,89
1,3,2,1495000,1927,1520,984,105
1,4,4.5,5200000,1952,4813,1080,238
1,2,2,989000,1984,988,1001,46
1,3,1,1095000,1895,1465,747,68
1,1,1,725000,1900,811,894,70
1,1,1,725000,1900,811,894,70
1,4,4,2650000,1900,3816,694,75
1,2,2,1680000,1962,1850,908,89
1,1,1,1199000,1906,1139,1053,91
1,3,2.5,799000,1947,1400,571,35
1,3,2,795000,1954,1350,589,36
1,2,1,699000,1944,995,703,41
1,3,2,848000,1940,1500,565,42
1,3,2,899000,1948,1665,540,52
1,3,2,899000,1948,1665,540,52
1,6,4,1198000,1937,1965,610,79
1,3,3,1100000,1925,2633,418,91
1,6,3.5,949000,1918,2473,384,97
1,5,5,2990000,2014,5000,598,108
1,4,3,1338800,1940,2330,575,119
1,3,2.5,1388000,1928,1905,729,160
1,3,2,989000,1940,1603,617,227
1,3,1,1295000,1890,1772,731,65
1,6,6,6895000,1902,7800,884,67
1,2,2,1725000,1922,1415,1219,86
1,3,3,1995000,1922,1915,1042,88
1,0,1,499000,1900,510,978,91
1,0,1,499000,1900,510,978,91
1,1,1,599000,1900,624,960,91
1,1,1,599000,1900,624,960,91
1,3,2,1200000,1929,1284,935,121
1,3,2,1395000,1909,1877,743,57
1,3,3,1785000,1925,1970,906,58
1,3,2,1099000,1905,1457,754,67
1,5,5.5,3495000,1921,4310,811,73
1,3,2,2395000,1929,2323,1031,73
1,4,2.5,2549000,1907,2746,928,75
1,3,2,995000,2000,1393,714,75
1,5,3.5,6495000,1906,4609,1409,89
1,4,3,1698000,1890,1789,949,102
1,5,2,6298000,1914,3585,1757,26
1,3,3,1139000,2008,1532,743,44
1,3,2,1080000,1914,1954,553,49
1,4,1,1050000,1932,1767,594,55
1,1,1,739900,1941,875,846,70
1,4,4,1595000,1925,2750,580,81
1,2,1,699000,1907,1200,583,14
1,2,1,799000,1938,1150,695,36
1,4,2.5,995000,1915,2180,456,62
1,2,1,829000,1925,1145,724,71
1,2,1,875000,1908,1158,756,84
1,5,3,950000,1939,1846,515,97
1,2,1,749000,1936,1450,517,110
1,2,1,749000,1936,1450,517,110
1,6,6.5,9500000,1937,5420,1753,3
1,4,3,3595000,1931,3017,1192,9
1,2,1.5,1425000,1925,1360,1048,14
1,1,1,865000,1993,960,901,17
1,2,2.5,2495000,1940,1809,1379,48
1,2,2,2000000,1925,1518,1318,50
1,4,3.5,9895000,2008,6024,1643,62
1,3,2,358000,1989,1325,270,5
1,3,2.5,899000,2015,1391,646,5
1,3,2.5,929000,2015,1391,668,5
1,4,4,850000,1928,2470,344,8
1,2,1,648000,1921,1125,576,11
1,2,1,480000,1915,680,706,13
1,1,1,499000,1900,1076,464,19
1,4,2,689000,1951,1473,468,21
1,2,2,669000,1986,1317,508,23
1,2,1.5,729000,1942,1012,720,27
1,2,2,767000,1916,1380,556,31
1,3,1,660000,1900,1520,434,35
1,2,1,995000,1908,600,1658,36
1,2,1,759900,1941,1175,647,43
1,6,3.5,995000,2001,3080,323,55
1,2,1,725000,1945,1040,697,60
1,4,3,3420000,1926,5113,669,98
1,3,2,1650000,1922,2025,815,106
1,3,3.5,2250000,1928,3258,691,127
1,3,1,1319000,1925,1752,753,141
1,5,3.5,1698000,1966,2769,613,176
1,3,2,1049000,1947,1626,645,179
1,2,2,599000,1990,862,695,181
1,5,3.5,2995000,1947,3890,770,181
1,3,2,995000,1956,1305,762,216
1,1,1,350000,1908,600,583,43
1,2,1,550000,1908,800,688,43
1,4,4,3760000,1900,3085,1219,49
1,3,2,1050000,1922,1266,829,52
1,2,2,1895000,1907,1756,1079,54
1,1,1,599000,1961,680,881,56
1,4,3,1895000,2001,2041,928,61
1,3,2,1799000,1926,1800,999,66
1,2,1,600000,1908,1350,444,92
1,2,1,1495000,1908,1700,879,98
1,3,2,1595000,1961,1515,1053,103
1,3,2,849000,1947,1622,523,106
1,4,3.5,1995000,1992,3312,602,108
1,3,2,1495000,1937,1635,914,112
1,3,3.5,2195000,1922,2168,1012,125
1,4,2,1798000,1951,2050,877,131
1,2,2,849000,1978,1555,546,139
1,3,2,1159000,1977,1731,670,143
1,3,2.5,995000,1976,1959,508,163
1,4,3,1388000,1968,2275,610,163
1,5,3.5,2250000,1962,3729,603,174
1,3,2,1080000,1989,1524,709,185
1,3,2.5,1095000,1968,1868,586,187
1,2,1,599000,1972,990,605,189
1,2,1,915000,1954,1251,731,24
1,2,1,915000,1954,1251,731,24
1,3,2,725000,1975,1474,492,34
1,3,2.5,1588000,2015,2001,794,43
1,2,1,795000,1941,1256,633,63
1,2,1,795000,1941,1256,633,63
1,4,2,848000,1949,1646,515,69
1,1,1,439000,2002,667,658,80
1,3,2,849900,1958,1310,649,118
1,2,1,599000,1941,1254,478,123
1,3,2.5,1539514,2014,2024,761,136
1,3,2.5,1339000,2015,2133,628,143
1,3,2.5,1294000,2015,2133,607,143
1,3,2.5,1611000,2015,2001,805,153
1,2,2,1495000,1913,1174,1273,35
1,1,1,699000,1908,750,932,36
1,2,2.5,3495000,1900,1968,1776,76
1,4,2,699000,1949,1550,451,11
1,2,1,699000,1949,1050,666,64
1,3,3,888000,1975,1555,571,79
1,1,1,599000,1945,631,949,84
1,3,3,758000,1989,2157,351,90
1,2,2,1698000,2008,1620,1048,1
1,2,2,1698000,2008,1620,1048,1
1,1,1,849000,2012,886,958,2
1,2,2,1675000,2012,1562,1072,2
1,2,2,1695000,2007,1610,1053,2
1,3,2,2219000,2012,1921,1155,13
1,1,1,788000,2004,903,873,4
1,2,2,1950000,1995,1930,1010,4
1,0,1,539000,2000,709,760,5
1,2,2,849000,1982,1030,824,24
1,2,2.5,2495000,1940,1809,1379,48
1,4,4,3760000,1894,3085,1219,49
1,3,2,1799000,1926,1800,999,66
1,5,2.5,1800000,1890,3073,586,76
1,2,1,695000,1923,1045,665,106
1,3,2,1650000,1922,1483,1113,106
1,1,1,649000,1983,850,764,163
1,3,2,995000,1956,1305,762,216

208
Labo7.ipynb Normal file

@@ -0,0 +1,208 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 7 Data Science: Decision Trees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<strong>Note.</strong> In this lab, decision trees are drawn with the GraphViz package. This package was installed in labs 4 &amp; 5. On your own PC, install the <strong>python-graphviz</strong> package in Anaconda (under Environments). This can take a while. Restart your kernel to be able to use the installation.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 1**: **Decision trees for a simple classification problem**\n",
"\n",
"**1.1 Exploring the data**\n",
"\n",
"You are given the dataset housing-ny-sf.csv. This dataset can be used to predict whether an apartment is located in New York or in San Francisco. The file contains the following columns:\n",
" * in_sf: the target to predict: set to 1 if the apartment is located in San Francisco\n",
" * beds: the number of bedrooms\n",
" * bath: the number of bathrooms\n",
" * price: the sale price (\$)\n",
" * year_built: the year of construction\n",
" * sqft: the area in square feet\n",
" * price_per_sqft: the price (\$) per square foot\n",
" * elevation: elevation in m\n",
"\n",
"A nice visual introduction to this exercise can be found here: _http://www.r2d3.us/visual-intro-to-machine-learning-part-1/_\n",
"\n",
" * Load the data into a Pandas DataFrame (please do not change anything in the CSV file; hint: skip rows).\n",
" * Make a scatter_matrix plot of the __features__ in which each instance is coloured according to its target (with colormap 'brg', San Francisco becomes green and New York blue)\n",
" * Using Pandas (groupby and hist(alpha=0.4)), draw a histogram (with a different colour for SF and NY) for a number of features whose distribution you expect to differ strongly between the 2 cities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**1.2 Training and parameter tuning**\n",
"\n",
" * Split the data into a training set and a test set (70%/30%); choose a random_state &ne; 0, e.g. 88\n",
" * Train a DecisionTreeClassifier without parameters on this data\n",
" * Write a script that searches for the ideal depth of the decision tree, and draw this tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 2**: **Classification exercise with decision trees, random forests and gradient boosting machines**\n",
"\n",
"**2.1 Generating the sample data**\n",
"\n",
"You start this exercise by creating sample data for a 'ternary classification problem'. This data has 2 features, namely X and Y, and a target called __color__ (possible values: red, green, blue). We will use this data to test different classification algorithms and to visualise the decision boundary.\n",
"\n",
"* Generating x and y coordinates with _np.random.normal_ produces a _point cloud_ centred at (0,0). By adding a constant value to x or y you can shift the centre of this cloud in the x or y direction. Now generate the following sample data:\n",
" * a point cloud of 1000 instances centred at (0,0) with color label 'red'\n",
" * a point cloud of 1000 instances centred at (2.5,2.5) with color label 'green'\n",
" * a point cloud of 1000 instances centred at (5,0) with color label 'blue'\n",
" * when calling _np.random.normal_, pass no extra parameters other than size\n",
" \n",
"* Make a scatter plot of this data. If everything is right, you see the 3 separately coloured point clouds, which overlap slightly. (You can pass the color column directly to matplotlib.) With the following code you can make sure X and Y use the same scale:\n",
"\n",
" <code>plt.gca().set_aspect('equal', adjustable='box')</code>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.2 Decision trees: visualising the decision boundary**\n",
"\n",
"* Split the data into a training set and a test set (70%/30%)\n",
"* Train a DecisionTreeClassifier on the training data and measure the accuracy on the training and test data\n",
"* We will now approximate the decision boundary by first requesting the prediction for a grid of (x,y) coordinates that covers the entire plot. You generate this grid as follows:\n",
"\n",
"<code>grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2,-1).T</code>\n",
"\n",
"* You can again pass the predicted values directly as the colour of the scatter plot\n",
"\n",
"\n",
"* Now adapt your script so that, for max_depth from 1 up to and including 8, you print the accuracies and plot the decision boundary\n",
"\n",
"Can you explain the decision boundary for max_depth=1? Can you also explain the setting with the best bias-variance tradeoff visually, using the decision boundary?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2.3 Random forests and gradient boosting machines**\n",
"\n",
"* Now also show the accuracies and the decision boundary for random forests and gradient boosting machines\n",
"* Random forests: why is parameter tuning of max_features pointless here?\n",
"* Gradient boosting machines: experiment with the learning_rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
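
Exercises 2.1 and 2.2 in Labo7.ipynb describe the sample data and the prediction grid only in words. The following is a minimal sketch of both, assuming NumPy alone. The seed, the `centers` dictionary and the names `X`, `colors` and `tiny` are illustrative choices, not part of the notebook. The tiny grid at the end is only there to show how `reshape(2, -1).T` lays out the coordinate pairs.

```python
import numpy as np

np.random.seed(88)  # assumed seed, only to make the sketch reproducible

n = 1000
# Cloud centres and colour labels as listed in exercise 2.1.
centers = {"red": (0.0, 0.0), "green": (2.5, 2.5), "blue": (5.0, 0.0)}

xs, ys, colors = [], [], []
for color, (cx, cy) in centers.items():
    # Standard normal noise (only `size` is passed); adding a constant
    # shifts the centre of the cloud along that axis.
    xs.append(np.random.normal(size=n) + cx)
    ys.append(np.random.normal(size=n) + cy)
    colors.extend([color] * n)

X = np.column_stack([np.concatenate(xs), np.concatenate(ys)])  # shape (3000, 2)
colors = np.array(colors)

# Prediction grid from exercise 2.2: mgrid returns an array of shape
# (2, nx, ny); reshape(2, -1).T turns it into one (x, y) row per grid point.
grid = np.mgrid[-4:8.6:0.05, -4:6:0.05].reshape(2, -1).T

# Same construction on a tiny integer grid, to make the row layout visible.
tiny = np.mgrid[0:2:1, 0:3:1].reshape(2, -1).T
print(tiny)
```

A fitted classifier's `predict(grid)` then yields one label per grid point, which can be used as the colour of a scatter plot, as the exercise suggests.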

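The depth search asked for in exercises 1.2 and 2.2 of Labo7.ipynb can be sketched as a plain loop over `max_depth`. This is one possible approach, not the lab's reference solution: the stand-in data, the variable names, and the rule of picking the depth with the highest test accuracy (shallowest tree on ties) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): two overlapping point clouds in place of
# the real features, so that the sketch runs on its own.
rng = np.random.RandomState(88)
X = np.vstack([rng.normal(size=(300, 2)), rng.normal(size=(300, 2)) + 2.5])
y = np.array([0] * 300 + [1] * 300)

# 70%/30% split with a random_state other than 0, as the exercise asks.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=88
)

scores = {}
for depth in range(1, 9):  # max_depth from 1 up to and including 8
    tree = DecisionTreeClassifier(max_depth=depth, random_state=88)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.3f} "
          f"test={scores[depth][1]:.3f}")

# "Ideal" is taken to mean the highest test accuracy; ties go to the
# shallowest tree because -d is larger for smaller depths.
best_depth = max(scores, key=lambda d: (scores[d][1], -d))
print("best max_depth:", best_depth)
```

Selecting the depth on the test set is the simplest reading of the exercise. Cross-validation on the training set, for example with `cross_val_score`, would likely give a less optimistic estimate.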
495
housing-ny-sf.csv Normal file

@@ -0,0 +1,495 @@
# This dataset was collected for A Visual Introduction to Machine Learning (http://www.r2d3.us). It is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/). We hope it helps you practice different data analysis and visualization techniques. ONE REQUEST: Please do not use this data to make any conclusions about the New York or San Francisco real estate markets. This data was collected with learning, not inference, in mind. :-)
#
in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation
0,2,1,999000,1960,1000,999,10
0,2,2,2750000,2006,1418,1939,0
0,2,2,1350000,1900,2150,628,9
0,1,1,629000,1903,500,1258,9
0,0,1,439000,1930,500,878,10
0,0,1,439000,1930,500,878,10
0,1,1,475000,1920,500,950,10
0,1,1,975000,1930,900,1083,10
0,1,1,975000,1930,900,1083,12
0,2,1,1895000,1921,1000,1895,12
0,3,3,2095000,1926,2200,952,4
0,1,1,999000,1982,784,1274,5
0,1,1,999000,1982,784,1274,5
0,1,1,1249000,1987,826,1512,3
0,0,1,1110000,2008,698,1590,5
0,2,2,2059500,2008,1373,1500,5
0,2,2,2000000,1928,1200,1667,10
0,1,1,715000,1903,557,1284,3
0,2,2,2498000,2005,1260,1983,2
0,2,1.5,2650000,1915,2500,1060,5
0,2,2,3450000,1900,1850,1865,9
0,1,1,3105000,2016,1108,2802,10
0,4,5,13750000,2016,3699,3717,10
0,2,2,1185000,1900,1000,1185,4
0,1,2,1699000,1900,1500,1133,5
0,1,1,1195000,1900,1093,1093,6
0,1,1,450000,1920,500,900,10
0,1,1,1195000,1900,1093,1093,10
0,2,2,1185000,1900,1000,1185,10
0,0,1,625000,1964,800,781,8
0,2,2,3159990,2012,1420,2225,8
0,2,2,2200000,1920,2050,1073,10
0,2,2,4895000,2016,1520,3220,10
0,2,2,6525000,2016,2018,3233,10
0,2,2,4895000,2016,1520,3220,11
0,2,2,6525000,2016,2018,3233,11
0,2,2,1675000,2006,1225,1367,12
0,2,2,1675000,2006,1225,1367,12
0,2,1,999000,1985,799,1250,5
0,1,1,1550000,1926,1000,1550,6
0,1,1,1595000,2015,714,2234,7
0,2,2,3995000,1906,2400,1665,9
0,1,1,1285000,2012,749,1716,10
0,1,1,1595000,2015,714,2234,10
0,2,2,3995000,1906,2400,1665,10
0,2,2,4995000,1900,2024,2468,8
0,3,2,3580000,1880,3000,1193,10
0,3,3,6350000,2015,2500,2540,3
0,3,3,6550000,2015,2500,2620,3
0,3,3,5985000,1909,3300,1814,3
0,3,3,9900000,2015,2950,3356,4
0,1,1,1849000,1920,1400,1321,5
0,2,2,3500000,1915,2000,1750,5
0,2,2,3500000,1910,1887,1855,5
0,1,1,1250000,2007,720,1736,6
0,2,1,899000,1990,820,1096,7
0,1,2,1950000,1900,1300,1500,7
0,1,1,1750000,1963,1000,1750,5
0,0,1,775000,2009,546,1419,6
0,0,1,390000,1955,550,709,6
0,0,1,699999,1988,500,1400,9
0,1,1,649000,1965,750,865,10
0,2,2,1200000,1964,1200,1000,10
0,0,1,319000,1941,500,638,10
0,0,1,699999,1988,500,1400,10
0,0,1,775000,2009,546,1419,10
0,1,1,649000,1965,750,865,10
0,2,2,1849000,2009,1135,1629,10
0,1,1,615000,1960,750,820,11
0,1,1,1590000,2008,785,2025,12
0,1,1,1150000,2014,645,1783,18
0,2,2,2650000,2014,1240,2137,18
0,0,1,469000,1932,500,938,18
0,0,1,569000,1962,543,1048,8
0,1,1,549000,1924,750,732,10
0,2,2,799000,1928,850,940,19
0,2,2,879000,1928,800,1099,22
0,2,3,2999000,1925,3200,937,10
0,2,3,2999000,1925,3200,937,10
0,2,3,2999000,1925,3200,937,12
0,2,3,2999000,1925,3200,937,12
0,1,1,399000,1910,475,840,10
0,1,1,1135000,2005,715,1587,10
0,1,1,1145000,2005,700,1636,10
0,1,1,1760000,1969,950,1853,10
0,2,2,1725000,2005,976,1767,10
0,2,2,2750000,2007,1384,1987,10
0,2,2,2750000,2007,1384,1987,13
0,1,1,399000,1910,475,840,14
0,2,2,1725000,2005,1144,1508,16
0,1,1,869000,1988,670,1297,16
0,1,1,1760000,1969,950,1853,17
0,3,3,1249000,1962,1500,833,18
0,3,3,1500000,1962,1600,938,18
0,1,1,1135000,2005,715,1587,19
0,1,1,1145000,2005,700,1636,19
0,2,2,1725000,2005,976,1767,19
0,2,2,3500000,1925,1550,2258,20
0,0,1,525000,1940,525,1000,21
0,2,2,2025000,1940,1433,1413,21
0,2,2,3500000,1986,1463,2392,23
0,0,1,925000,1978,585,1581,24
0,2,2,1700000,1982,1007,1688,25
0,2,2,1700000,1982,1007,1688,25
0,0,1,449000,1962,550,816,10
0,0,1,299000,1930,400,748,12
0,1,1,695000,1961,720,965,16
0,1,1,695000,1961,720,965,16
0,4,5,17750000,2012,4476,3966,20
0,5,5,27500000,1930,7500,3667,21
0,0,1,539000,1957,485,1111,10
0,0,1,779000,1975,512,1521,10
0,1,1,925000,1985,745,1242,10
0,1,1,1319000,1958,1000,1319,10
0,2,2,4240000,2016,1741,2435,10
0,2,2,4285000,2016,1747,2453,10
0,3,3,4875000,2016,2017,2417,10
0,3,3,14950000,1931,4435,3371,10
0,3,3,10225000,2016,3007,3400,10
0,4,4,13400000,2016,3331,4023,10
0,5,5,19000000,2016,4972,3821,10
0,0,1,539000,1957,485,1111,11
0,1,1,835000,1963,700,1193,12
0,10,10,7995000,1910,6400,1249,12
0,0,1,349000,1960,400,873,13
0,1,1,588000,1970,600,980,14
0,2,2,8690000,2002,2178,3990,15
0,2,2,4195000,2016,1750,2397,15
0,2,2,4195000,2016,1742,2408,15
0,2,2,4240000,2016,1741,2435,15
0,2,2,4285000,2016,1747,2453,15
0,3,3,4875000,2016,2017,2417,15
0,3,3,5750000,2016,2196,2618,15
0,3,3,9350000,2016,3054,3062,15
0,3,3,10225000,2016,3007,3400,15
0,4,4,13150000,2016,3338,3939,15
0,4,4,13400000,2016,3331,4023,15
0,5,5,19000000,2016,4972,3821,15
0,0,1,850000,1924,546,1557,10
0,7,7,19500000,1994,4238,4601,10
0,1,1,1000000,1912,612,1634,18
0,0,1,485000,1902,310,1565,23
0,3,2,7350000,1999,2075,3542,23
0,1,1,1200000,1969,600,2000,23
0,2,2,4250000,2011,1504,2826,23
0,2,2,3600000,1960,1340,2687,24
0,2,2,3600000,1960,1340,2687,24
0,1,1,525000,1924,700,750,24
0,0,1,545000,1900,387,1408,10
0,1,1,535000,1900,450,1189,10
0,4,4,5500000,1923,3200,1719,21
0,4,5,12000000,1939,3700,3243,22
0,1,2,1850000,2007,839,2205,23
0,1,1,535000,1900,450,1189,27
0,2,2,1350000,1931,1300,1038,36
0,1,1,779000,1902,700,1113,10
0,2,1,1475000,1973,971,1519,10
0,2,2,1385000,1971,962,1440,10
0,1,1,779000,1902,700,1113,13
0,1,1,649000,1929,800,811,25
0,0,1,725000,1960,600,1208,26
0,2,1,925000,1941,790,1171,27
0,1,1,965000,1961,787,1226,27
0,2,1,1250000,1922,1145,1092,30
0,1,1,650000,1907,720,903,32
0,2,1,385000,1999,820,470,8
0,2,2,1695000,2012,1051,1613,16
0,1,1,600000,1899,1060,566,8
0,2,1,910000,1899,1060,858,8
0,8,7,2300000,1910,4180,550,9
0,1,1,600000,1899,1060,566,10
0,2,1,910000,1899,1060,858,10
0,2,2,1599000,1973,1400,1142,10
0,2,2,4625000,1987,1695,2729,10
0,0,1,325000,1910,375,867,11
0,2,2,1599000,1973,1400,1142,14
0,2,2,1965000,2005,1330,1477,16
0,1,1,735000,1928,800,919,22
0,2,2,4625000,1987,1695,2729,29
0,1,1,749000,2011,762,983,10
0,1,1,499999,2011,669,747,35
0,1,1,749000,2011,762,983,35
0,2,2,904000,1920,1503,601,10
0,2,2,904000,1920,1503,601,10
0,2,1,559900,1925,1200,467,51
0,2,1,545000,1939,1049,520,39
0,2,1,365000,1925,700,521,73
0,2,1,365000,1925,700,521,73
0,1,1,935000,1910,1102,848,8
0,0,1,820000,2013,533,1538,8
0,0,1,835000,2013,501,1667,8
0,1,1,935000,1910,1102,848,10
0,1,1,1420000,2004,768,1849,10
0,1,1,1550000,2004,794,1952,10
0,2,2,1635000,2004,957,1708,10
0,1,1,1550000,2004,794,1952,11
0,1,1,1780000,2007,988,1802,14
0,2,2,2800000,2007,1308,2141,14
0,1,1,411500,1921,586,702,6
0,2,2,2175000,1999,1569,1386,10
0,2,2,1995000,1996,1044,1911,10
0,2,2,2235000,1999,1548,1444,14
0,4,4,8800000,1941,3382,2602,15
0,2,2,1850000,1996,1044,1772,17
0,2,2,1995000,1996,1044,1911,17
0,1,1,1695000,1927,680,2493,17
0,1,1,1495000,1962,1125,1329,19
0,2,2,2200000,2000,1044,2107,2
0,0.5,1,384900,1962,540,713,10
0,1,1,515000,1962,725,710,10
0,3,2,1950000,1956,1600,1219,10
0,0,1,307000,1910,330,930,15
0,2,2,1735000,1980,1585,1095,18
0,3,3,1850000,2005,1353,1367,8
0,3,3,1850000,2005,1353,1367,10
0,2,2,1995000,1986,1406,1419,14
0,2,2,2900000,1987,1600,1813,24
0,2,2,2499000,2004,1658,1507,24
0,2,2,1575000,1930,1324,1190,33
0,3,3,1495000,1990,1360,1099,0
0,1,1,529000,1986,650,814,0
0,3,3,2695000,2006,1991,1354,1
0,3,3,2695000,2006,1991,1354,1
0,4,3,3895000,2001,2277,1711,0
1,1,1,550000,1982,724,760,24
1,2,2,849000,1982,1030,824,24
1,3,2,1750000,1900,2950,593,26
1,1,1,799000,2008,847,943,29
1,1,1.5,899000,1997,1453,619,3
1,1,1,598000,2005,534,1120,5
1,1,2,1088000,1998,1086,1002,6
1,1,1,798000,1926,769,1038,10
1,1,1,798000,1926,769,1038,10
1,1,1,1495000,1927,2275,657,10
1,4,4,4300000,2006,3321,1295,10
1,1,1,699000,2008,756,925,12
1,2,2,334905,2000,1047,320,12
1,1,1.5,849000,1996,1127,753,13
1,1,1.5,1365000,1996,1607,849,13
1,1,1,649000,2011,674,963,14
1,3,3,1245000,1907,1503,828,23
1,1,1,649000,1983,850,764,163
1,2,2,1700000,1987,1250,1360,11
1,2,2,1980000,2009,1469,1348,0
1,2,2,3600000,2009,1652,2179,0
1,2,3,4995000,2009,2230,2240,0
1,2,2,1298000,2008,1159,1120,0
1,2,2,2149000,2008,1317,1632,0
1,3,2,1995000,2006,1362,1465,3
1,2,2,1650000,1937,1640,1006,7
1,1,1,949000,2010,824,1152,8
1,1,1,187518,2000,670,280,12
1,3,2.5,1995000,2008,2354,847,23
1,3,2.5,1995000,2008,2354,847,23
1,2,2,2799000,2008,1328,2108,35
1,2,2.5,1050000,2000,1640,640,2
1,2,2,895000,2006,1113,804,3
1,1,1,998000,2006,872,1144,3
1,2,2,1659000,2000,1165,1424,4
1,1,1,788000,2004,903,873,4
1,2,2,1395000,2001,1334,1046,4
1,2,2,1299000,2004,1453,894,5
1,2,2.5,850000,2000,1136,748,5
1,3,2.5,2850000,2013,2075,1373,5
1,2,2,950000,2000,1258,755,6
1,2,2,1600000,2002,1173,1364,7
1,3,2,1275000,2001,1502,849,13
1,1,1.5,775000,2009,835,928,14
1,3,2,1399000,1892,1809,773,41
1,2,2,879000,1912,950,925,53
1,1,1,699000,1907,932,750,59
1,1,1,985000,1978,884,1114,11
1,1,1,725000,1978,1063,682,60
1,2,2,849000,1911,1100,772,66
1,1,1,618000,1973,705,877,72
1,2,2,5950000,1989,3700,1608,83
1,4,4,1800000,1948,2475,727,10
1,2,2.5,2350000,2008,1314,1788,19
1,1,1,740200,1920,989,748,41
1,1,1,699000,1982,780,896,51
1,2,2,958000,2005,915,1047,54
1,2,2,1200000,1909,1302,922,56
1,3,3,1575000,1993,2233,705,62
1,2,2,1199000,1964,1100,1090,68
1,1,1,529000,1966,791,669,69
1,1,1,699000,1984,880,794,73
1,3,2,1295000,1926,1675,773,77
1,4,4,1800000,1948,2347,767,87
1,1,1,795000,2000,990,803,7
1,2,2,779000,2007,808,964,7
1,1,1.5,378551,2000,1000,379,9
1,4,2.3,1398000,1904,2492,561,13
1,2,1.3,1097000,1904,1493,735,14
1,1,1,670000,2014,507,1321,20
1,2,2,1199000,2014,943,1271,20
1,2,2,1095000,2010,1135,965,30
1,1,2,795000,2014,616,1291,31
1,3,2,979000,1900,1440,680,33
1,3,3,1488888,1977,2100,709,38
1,3,2,1389000,1908,1546,898,55
1,3,2,1389000,1908,1546,898,55
1,4,2,1099000,1962,1267,867,69
1,3,3.5,1799000,1900,2449,735,75
1,3,2,1250000,1924,1450,862,83
1,3,2,998000,1907,1464,682,84
1,3,2.5,1300000,1975,1642,792,4
1,1,1,539000,2000,709,760,5
1,1,1,735000,1983,779,944,12
1,3,2,688000,1942,1441,477,34
1,2,1,859000,1890,1100,781,44
1,3,2,699000,1964,1250,559,48
1,2,1,750000,1924,980,765,48
1,3,2,549000,1915,1972,278,51
1,6,3,1600000,1926,2567,623,52
1,2,2,875000,1909,1700,515,54
1,4,3,900000,2003,1870,481,54
1,3,2,748000,1930,1522,491,54
1,2,1.5,749000,1949,1348,556,56
1,2,1,599000,1942,707,847,61
1,3,1,648800,1940,1325,490,66
1,2,2,649000,1924,1320,492,72
1,2,1,798888,1929,1025,779,74
1,5,2,899000,1972,1940,463,81
1,3,1.5,699000,1910,1625,430,94
1,3,2,749000,1907,1513,495,95
1,3,2,535000,1967,1824,293,102
1,2,1,699000,1913,900,777,108
1,2,1,699000,1913,900,777,108
1,2,1,500000,1958,1166,429,136
1,3,2.5,879000,1981,2111,416,139
1,3,2.5,879000,1981,2111,416,139
1,4,2.5,699900,1982,1752,399,140
1,2,1,688000,1950,1145,601,143
1,3,2,1195000,1907,1396,856,33
1,3,2,1195000,1907,1396,856,33
1,1,1,697000,1922,694,1004,50
1,5,3.5,3995000,1905,3350,1193,59
1,1,1,650000,1955,600,1083,74
1,1,1,650000,1955,600,1083,74
1,1,1,775000,1955,600,1292,74
1,1,1,825000,1955,600,1375,74
1,2,1,699000,1907,915,764,89
1,3,2,1495000,1927,1520,984,105
1,4,4.5,5200000,1952,4813,1080,238
1,2,2,989000,1984,988,1001,46
1,3,1,1095000,1895,1465,747,68
1,1,1,725000,1900,811,894,70
1,1,1,725000,1900,811,894,70
1,4,4,2650000,1900,3816,694,75
1,2,2,1680000,1962,1850,908,89
1,1,1,1199000,1906,1139,1053,91
1,3,2.5,799000,1947,1400,571,35
1,3,2,795000,1954,1350,589,36
1,2,1,699000,1944,995,703,41
1,3,2,848000,1940,1500,565,42
1,3,2,899000,1948,1665,540,52
1,3,2,899000,1948,1665,540,52
1,6,4,1198000,1937,1965,610,79
1,3,3,1100000,1925,2633,418,91
1,6,3.5,949000,1918,2473,384,97
1,5,5,2990000,2014,5000,598,108
1,4,3,1338800,1940,2330,575,119
1,3,2.5,1388000,1928,1905,729,160
1,3,2,989000,1940,1603,617,227
1,3,1,1295000,1890,1772,731,65
1,6,6,6895000,1902,7800,884,67
1,2,2,1725000,1922,1415,1219,86
1,3,3,1995000,1922,1915,1042,88
1,0,1,499000,1900,510,978,91
1,0,1,499000,1900,510,978,91
1,1,1,599000,1900,624,960,91
1,1,1,599000,1900,624,960,91
1,3,2,1200000,1929,1284,935,121
1,3,2,1395000,1909,1877,743,57
1,3,3,1785000,1925,1970,906,58
1,3,2,1099000,1905,1457,754,67
1,5,5.5,3495000,1921,4310,811,73
1,3,2,2395000,1929,2323,1031,73
1,4,2.5,2549000,1907,2746,928,75
1,3,2,995000,2000,1393,714,75
1,5,3.5,6495000,1906,4609,1409,89
1,4,3,1698000,1890,1789,949,102
1,5,2,6298000,1914,3585,1757,26
1,3,3,1139000,2008,1532,743,44
1,3,2,1080000,1914,1954,553,49
1,4,1,1050000,1932,1767,594,55
1,1,1,739900,1941,875,846,70
1,4,4,1595000,1925,2750,580,81
1,2,1,699000,1907,1200,583,14
1,2,1,799000,1938,1150,695,36
1,4,2.5,995000,1915,2180,456,62
1,2,1,829000,1925,1145,724,71
1,2,1,875000,1908,1158,756,84
1,5,3,950000,1939,1846,515,97
1,2,1,749000,1936,1450,517,110
1,2,1,749000,1936,1450,517,110
1,6,6.5,9500000,1937,5420,1753,3
1,4,3,3595000,1931,3017,1192,9
1,2,1.5,1425000,1925,1360,1048,14
1,1,1,865000,1993,960,901,17
1,2,2.5,2495000,1940,1809,1379,48
1,2,2,2000000,1925,1518,1318,50
1,4,3.5,9895000,2008,6024,1643,62
1,3,2,358000,1989,1325,270,5
1,3,2.5,899000,2015,1391,646,5
1,3,2.5,929000,2015,1391,668,5
1,4,4,850000,1928,2470,344,8
1,2,1,648000,1921,1125,576,11
1,2,1,480000,1915,680,706,13
1,1,1,499000,1900,1076,464,19
1,4,2,689000,1951,1473,468,21
1,2,2,669000,1986,1317,508,23
1,2,1.5,729000,1942,1012,720,27
1,2,2,767000,1916,1380,556,31
1,3,1,660000,1900,1520,434,35
1,2,1,995000,1908,600,1658,36
1,2,1,759900,1941,1175,647,43
1,6,3.5,995000,2001,3080,323,55
1,2,1,725000,1945,1040,697,60
1,4,3,3420000,1926,5113,669,98
1,3,2,1650000,1922,2025,815,106
1,3,3.5,2250000,1928,3258,691,127
1,3,1,1319000,1925,1752,753,141
1,5,3.5,1698000,1966,2769,613,176
1,3,2,1049000,1947,1626,645,179
1,2,2,599000,1990,862,695,181
1,5,3.5,2995000,1947,3890,770,181
1,3,2,995000,1956,1305,762,216
1,1,1,350000,1908,600,583,43
1,2,1,550000,1908,800,688,43
1,4,4,3760000,1900,3085,1219,49
1,3,2,1050000,1922,1266,829,52
1,2,2,1895000,1907,1756,1079,54
1,1,1,599000,1961,680,881,56
1,4,3,1895000,2001,2041,928,61
1,3,2,1799000,1926,1800,999,66
1,2,1,600000,1908,1350,444,92
1,2,1,1495000,1908,1700,879,98
1,3,2,1595000,1961,1515,1053,103
1,3,2,849000,1947,1622,523,106
1,4,3.5,1995000,1992,3312,602,108
1,3,2,1495000,1937,1635,914,112
1,3,3.5,2195000,1922,2168,1012,125
1,4,2,1798000,1951,2050,877,131
1,2,2,849000,1978,1555,546,139
1,3,2,1159000,1977,1731,670,143
1,3,2.5,995000,1976,1959,508,163
1,4,3,1388000,1968,2275,610,163
1,5,3.5,2250000,1962,3729,603,174
1,3,2,1080000,1989,1524,709,185
1,3,2.5,1095000,1968,1868,586,187
1,2,1,599000,1972,990,605,189
1,2,1,915000,1954,1251,731,24
1,2,1,915000,1954,1251,731,24
1,3,2,725000,1975,1474,492,34
1,3,2.5,1588000,2015,2001,794,43
1,2,1,795000,1941,1256,633,63
1,2,1,795000,1941,1256,633,63
1,4,2,848000,1949,1646,515,69
1,1,1,439000,2002,667,658,80
1,3,2,849900,1958,1310,649,118
1,2,1,599000,1941,1254,478,123
1,3,2.5,1539514,2014,2024,761,136
1,3,2.5,1339000,2015,2133,628,143
1,3,2.5,1294000,2015,2133,607,143
1,3,2.5,1611000,2015,2001,805,153
1,2,2,1495000,1913,1174,1273,35
1,1,1,699000,1908,750,932,36
1,2,2.5,3495000,1900,1968,1776,76
1,4,2,699000,1949,1550,451,11
1,2,1,699000,1949,1050,666,64
1,3,3,888000,1975,1555,571,79
1,1,1,599000,1945,631,949,84
1,3,3,758000,1989,2157,351,90
1,2,2,1698000,2008,1620,1048,1
1,2,2,1698000,2008,1620,1048,1
1,1,1,849000,2012,886,958,2
1,2,2,1675000,2012,1562,1072,2
1,2,2,1695000,2007,1610,1053,2
1,3,2,2219000,2012,1921,1155,13
1,1,1,788000,2004,903,873,4
1,2,2,1950000,1995,1930,1010,4
1,0,1,539000,2000,709,760,5
1,2,2,849000,1982,1030,824,24
1,2,2.5,2495000,1940,1809,1379,48
1,4,4,3760000,1894,3085,1219,49
1,3,2,1799000,1926,1800,999,66
1,5,2.5,1800000,1890,3073,586,76
1,2,1,695000,1923,1045,665,106
1,3,2,1650000,1922,1483,1113,106
1,1,1,649000,1983,850,764,163
1,3,2,995000,1956,1305,762,216
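
The file above starts with two `#` comment lines before the header row, which is what the "skip" hint in exercise 1.1 of Labo7.ipynb appears to refer to. The following is a minimal sketch, assuming pandas. The inlined `sample` holds four rows copied from the file, with the licence comment shortened, and `features` and `target` are illustrative names. With the real file, the call would be `pd.read_csv("housing-ny-sf.csv", skiprows=2)`.

```python
import io

import pandas as pd

# First lines of housing-ny-sf.csv, inlined so the sketch runs without the
# file. The licence comment is shortened; in the real file it is one long line.
sample = """# This dataset was collected for A Visual Introduction to Machine Learning (http://www.r2d3.us).
#
in_sf,beds,bath,price,year_built,sqft,price_per_sqft,elevation
0,2,1,999000,1960,1000,999,10
0,2,2,2750000,2006,1418,1939,0
1,1,1,550000,1982,724,760,24
1,2,2,849000,1982,1030,824,24
"""

# skiprows=2 skips the two leading '#' lines, so the third line of the
# file becomes the header row. The CSV itself stays unchanged.
df = pd.read_csv(io.StringIO(sample), skiprows=2)

features = df.drop(columns="in_sf")  # the seven feature columns
target = df["in_sf"]                 # 1 = San Francisco, 0 = New York

# Per-city mean of one feature, as a quick first look at the data.
print(df.groupby("in_sf")["elevation"].mean())
```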
304 1,3,2,1389000,1908,1546,898,55
305 1,4,2,1099000,1962,1267,867,69
306 1,3,3.5,1799000,1900,2449,735,75
307 1,3,2,1250000,1924,1450,862,83
308 1,3,2,998000,1907,1464,682,84
309 1,3,2.5,1300000,1975,1642,792,4
310 1,1,1,539000,2000,709,760,5
311 1,1,1,735000,1983,779,944,12
312 1,3,2,688000,1942,1441,477,34
313 1,2,1,859000,1890,1100,781,44
314 1,3,2,699000,1964,1250,559,48
315 1,2,1,750000,1924,980,765,48
316 1,3,2,549000,1915,1972,278,51
317 1,6,3,1600000,1926,2567,623,52
318 1,2,2,875000,1909,1700,515,54
319 1,4,3,900000,2003,1870,481,54
320 1,3,2,748000,1930,1522,491,54
321 1,2,1.5,749000,1949,1348,556,56
322 1,2,1,599000,1942,707,847,61
323 1,3,1,648800,1940,1325,490,66
324 1,2,2,649000,1924,1320,492,72
325 1,2,1,798888,1929,1025,779,74
326 1,5,2,899000,1972,1940,463,81
327 1,3,1.5,699000,1910,1625,430,94
328 1,3,2,749000,1907,1513,495,95
329 1,3,2,535000,1967,1824,293,102
330 1,2,1,699000,1913,900,777,108
331 1,2,1,699000,1913,900,777,108
332 1,2,1,500000,1958,1166,429,136
333 1,3,2.5,879000,1981,2111,416,139
334 1,3,2.5,879000,1981,2111,416,139
335 1,4,2.5,699900,1982,1752,399,140
336 1,2,1,688000,1950,1145,601,143
337 1,3,2,1195000,1907,1396,856,33
338 1,3,2,1195000,1907,1396,856,33
339 1,1,1,697000,1922,694,1004,50
340 1,5,3.5,3995000,1905,3350,1193,59
341 1,1,1,650000,1955,600,1083,74
342 1,1,1,650000,1955,600,1083,74
343 1,1,1,775000,1955,600,1292,74
344 1,1,1,825000,1955,600,1375,74
345 1,2,1,699000,1907,915,764,89
346 1,3,2,1495000,1927,1520,984,105
347 1,4,4.5,5200000,1952,4813,1080,238
348 1,2,2,989000,1984,988,1001,46
349 1,3,1,1095000,1895,1465,747,68
350 1,1,1,725000,1900,811,894,70
351 1,1,1,725000,1900,811,894,70
352 1,4,4,2650000,1900,3816,694,75
353 1,2,2,1680000,1962,1850,908,89
354 1,1,1,1199000,1906,1139,1053,91
355 1,3,2.5,799000,1947,1400,571,35
356 1,3,2,795000,1954,1350,589,36
357 1,2,1,699000,1944,995,703,41
358 1,3,2,848000,1940,1500,565,42
359 1,3,2,899000,1948,1665,540,52
360 1,3,2,899000,1948,1665,540,52
361 1,6,4,1198000,1937,1965,610,79
362 1,3,3,1100000,1925,2633,418,91
363 1,6,3.5,949000,1918,2473,384,97
364 1,5,5,2990000,2014,5000,598,108
365 1,4,3,1338800,1940,2330,575,119
366 1,3,2.5,1388000,1928,1905,729,160
367 1,3,2,989000,1940,1603,617,227
368 1,3,1,1295000,1890,1772,731,65
369 1,6,6,6895000,1902,7800,884,67
370 1,2,2,1725000,1922,1415,1219,86
371 1,3,3,1995000,1922,1915,1042,88
372 1,0,1,499000,1900,510,978,91
373 1,0,1,499000,1900,510,978,91
374 1,1,1,599000,1900,624,960,91
375 1,1,1,599000,1900,624,960,91
376 1,3,2,1200000,1929,1284,935,121
377 1,3,2,1395000,1909,1877,743,57
378 1,3,3,1785000,1925,1970,906,58
379 1,3,2,1099000,1905,1457,754,67
380 1,5,5.5,3495000,1921,4310,811,73
381 1,3,2,2395000,1929,2323,1031,73
382 1,4,2.5,2549000,1907,2746,928,75
383 1,3,2,995000,2000,1393,714,75
384 1,5,3.5,6495000,1906,4609,1409,89
385 1,4,3,1698000,1890,1789,949,102
386 1,5,2,6298000,1914,3585,1757,26
387 1,3,3,1139000,2008,1532,743,44
388 1,3,2,1080000,1914,1954,553,49
389 1,4,1,1050000,1932,1767,594,55
390 1,1,1,739900,1941,875,846,70
391 1,4,4,1595000,1925,2750,580,81
392 1,2,1,699000,1907,1200,583,14
393 1,2,1,799000,1938,1150,695,36
394 1,4,2.5,995000,1915,2180,456,62
395 1,2,1,829000,1925,1145,724,71
396 1,2,1,875000,1908,1158,756,84
397 1,5,3,950000,1939,1846,515,97
398 1,2,1,749000,1936,1450,517,110
399 1,2,1,749000,1936,1450,517,110
400 1,6,6.5,9500000,1937,5420,1753,3
401 1,4,3,3595000,1931,3017,1192,9
402 1,2,1.5,1425000,1925,1360,1048,14
403 1,1,1,865000,1993,960,901,17
404 1,2,2.5,2495000,1940,1809,1379,48
405 1,2,2,2000000,1925,1518,1318,50
406 1,4,3.5,9895000,2008,6024,1643,62
407 1,3,2,358000,1989,1325,270,5
408 1,3,2.5,899000,2015,1391,646,5
409 1,3,2.5,929000,2015,1391,668,5
410 1,4,4,850000,1928,2470,344,8
411 1,2,1,648000,1921,1125,576,11
412 1,2,1,480000,1915,680,706,13
413 1,1,1,499000,1900,1076,464,19
414 1,4,2,689000,1951,1473,468,21
415 1,2,2,669000,1986,1317,508,23
416 1,2,1.5,729000,1942,1012,720,27
417 1,2,2,767000,1916,1380,556,31
418 1,3,1,660000,1900,1520,434,35
419 1,2,1,995000,1908,600,1658,36
420 1,2,1,759900,1941,1175,647,43
421 1,6,3.5,995000,2001,3080,323,55
422 1,2,1,725000,1945,1040,697,60
423 1,4,3,3420000,1926,5113,669,98
424 1,3,2,1650000,1922,2025,815,106
425 1,3,3.5,2250000,1928,3258,691,127
426 1,3,1,1319000,1925,1752,753,141
427 1,5,3.5,1698000,1966,2769,613,176
428 1,3,2,1049000,1947,1626,645,179
429 1,2,2,599000,1990,862,695,181
430 1,5,3.5,2995000,1947,3890,770,181
431 1,3,2,995000,1956,1305,762,216
432 1,1,1,350000,1908,600,583,43
433 1,2,1,550000,1908,800,688,43
434 1,4,4,3760000,1900,3085,1219,49
435 1,3,2,1050000,1922,1266,829,52
436 1,2,2,1895000,1907,1756,1079,54
437 1,1,1,599000,1961,680,881,56
438 1,4,3,1895000,2001,2041,928,61
439 1,3,2,1799000,1926,1800,999,66
440 1,2,1,600000,1908,1350,444,92
441 1,2,1,1495000,1908,1700,879,98
442 1,3,2,1595000,1961,1515,1053,103
443 1,3,2,849000,1947,1622,523,106
444 1,4,3.5,1995000,1992,3312,602,108
445 1,3,2,1495000,1937,1635,914,112
446 1,3,3.5,2195000,1922,2168,1012,125
447 1,4,2,1798000,1951,2050,877,131
448 1,2,2,849000,1978,1555,546,139
449 1,3,2,1159000,1977,1731,670,143
450 1,3,2.5,995000,1976,1959,508,163
451 1,4,3,1388000,1968,2275,610,163
452 1,5,3.5,2250000,1962,3729,603,174
453 1,3,2,1080000,1989,1524,709,185
454 1,3,2.5,1095000,1968,1868,586,187
455 1,2,1,599000,1972,990,605,189
456 1,2,1,915000,1954,1251,731,24
457 1,2,1,915000,1954,1251,731,24
458 1,3,2,725000,1975,1474,492,34
459 1,3,2.5,1588000,2015,2001,794,43
460 1,2,1,795000,1941,1256,633,63
461 1,2,1,795000,1941,1256,633,63
462 1,4,2,848000,1949,1646,515,69
463 1,1,1,439000,2002,667,658,80
464 1,3,2,849900,1958,1310,649,118
465 1,2,1,599000,1941,1254,478,123
466 1,3,2.5,1539514,2014,2024,761,136
467 1,3,2.5,1339000,2015,2133,628,143
468 1,3,2.5,1294000,2015,2133,607,143
469 1,3,2.5,1611000,2015,2001,805,153
470 1,2,2,1495000,1913,1174,1273,35
471 1,1,1,699000,1908,750,932,36
472 1,2,2.5,3495000,1900,1968,1776,76
473 1,4,2,699000,1949,1550,451,11
474 1,2,1,699000,1949,1050,666,64
475 1,3,3,888000,1975,1555,571,79
476 1,1,1,599000,1945,631,949,84
477 1,3,3,758000,1989,2157,351,90
478 1,2,2,1698000,2008,1620,1048,1
479 1,2,2,1698000,2008,1620,1048,1
480 1,1,1,849000,2012,886,958,2
481 1,2,2,1675000,2012,1562,1072,2
482 1,2,2,1695000,2007,1610,1053,2
483 1,3,2,2219000,2012,1921,1155,13
484 1,1,1,788000,2004,903,873,4
485 1,2,2,1950000,1995,1930,1010,4
486 1,0,1,539000,2000,709,760,5
487 1,2,2,849000,1982,1030,824,24
488 1,2,2.5,2495000,1940,1809,1379,48
489 1,4,4,3760000,1894,3085,1219,49
490 1,3,2,1799000,1926,1800,999,66
491 1,5,2.5,1800000,1890,3073,586,76
492 1,2,1,695000,1923,1045,665,106
493 1,3,2,1650000,1922,1483,1113,106
494 1,1,1,649000,1983,850,764,163
495 1,3,2,995000,1956,1305,762,216

File diff suppressed because one or more lines are too long
