diff --git a/README.md b/README.md
index 23623f7..03beeb6 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,9 @@
-# unnamed_chatgpt_project
\ No newline at end of file
+# unnamed_chatgpt_project
+
+
+## names
+
+I had to get a bunch of names for this to work: I didn't want to generate them, I wanted them to be the input for generating the other attributes.
+For the names I wanted a diverse mix of countries of origin. Initial Google results were mostly from US statistics, but I soon found this [Stack Exchange answer](https://opendata.stackexchange.com/a/5003) and thus used this [dataset](ftp://ftp.heise.de/pub/ct/listings/0717-182.zip).
+
+While looking through this dataset I found that, apart from country and popularity statistics, it also has information about the "possible" gender of each name, which I could also use as part of the input when generating the attributes. Genders are coded as M (male), 1M (male if first part of the name, else mostly male), ?M (mostly male), F (female), 1F (see 1M), ?F (mostly female), ? (unisex).