Reputation: 783
I have a training dataset with 87k data points. It looks like this:
index street_name street_name_categorical
0 Nyhavn 1
1 Nyhavn 1
2 Gotgade 2
3 Botgade 3
4 Marmorgade 4
...
87k rows in total.
I also have a new dataset that I would like to test on, and I want to add the categorical values from the training data to it.
New Dataset
index street_name
0 Nyhavn
1 Marmorgade
2 Gotgade
3 Totgade
4 Gotgade
...
1.4k rows in total.
The desired output in the new dataset would be:
index street_name street_name_categorical
0 Nyhavn 1
1 Marmorgade 4
2 Gotgade 2
3 Totgade NA
4 Gotgade 2
...
It should return 1.4k rows.
I tried the following lines of code, but it does not return the desired output.
street_name_cat_from_train = train[["street_name", "street_name_categorical"]]
merged_df = new_data.merge(street_name_cat_from_train, on=["street_name"])
merged_df
This returns 197k rows instead of the expected 1.4k.
Upvotes: 0
Views: 34
Reputation: 634
Since each street name has a single category value, create a dictionary that maps names to category values from the training data (df here):
dt = df.groupby('street_name').first()['street_name_categorical'].to_dict()
Then map the street names in the new dataset (df2 here) using dt:
df2['street_name_categorical'] = df2['street_name'].map(dt)
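For completeness, here is a small self-contained sketch of the idea, using toy frames built from the rows shown in the question (the real data has 87k and 1.4k rows). The names lookup, mapped, unique_pairs and merged are only illustrative. The original merge most likely blows up to 197k rows because street names repeat in the training data, so every new row is matched against every training row with the same name, and the default inner join also drops unmatched names such as Totgade. The second option below shows a merge that avoids both problems, assuming each street name really has only one category.

```python
import pandas as pd

# Toy stand-ins for the training set and the new dataset from the question.
train = pd.DataFrame({
    "street_name": ["Nyhavn", "Nyhavn", "Gotgade", "Botgade", "Marmorgade"],
    "street_name_categorical": [1, 1, 2, 3, 4],
})
new_data = pd.DataFrame({
    "street_name": ["Nyhavn", "Marmorgade", "Gotgade", "Totgade", "Gotgade"],
})

# Option 1: build a name -> category dictionary and map it onto the new data.
lookup = train.groupby("street_name")["street_name_categorical"].first().to_dict()
mapped = new_data.copy()
mapped["street_name_categorical"] = mapped["street_name"].map(lookup)

# Option 2: merge against one row per street name, keeping every new row.
unique_pairs = (
    train[["street_name", "street_name_categorical"]]
    .drop_duplicates(subset="street_name")
)
merged = new_data.merge(unique_pairs, on="street_name", how="left")

# Both results have one row per row of new_data; unseen names become NaN.
print(mapped)
print(merged)
```

Because Totgade has no match, the new column contains NaN and pandas stores it as float rather than int.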
Upvotes: 1