How to match two dataframes, and add value from one dataframe to the other?

Question

I have a training dataset with 87k data points. It looks like this:

index street_name    street_name_categorical
0     Nyhavn           1
1     Nyhavn           1
2     Gotgade          2
3     Botgade          3
4     Marmorgade       4
...

87k rows in total.

I also have a new dataset that I would like to test on, and I want to add the categorical values from the training data to it.

New Dataset

index street_name
0     Nyhavn
1     Marmorgade
2     Gotgade
3     Totgade
4     Gotgade
...

1.4k rows in total.

The desired output in the new dataset would be:

index street_name    street_name_categorical
0     Nyhavn         1
1     Marmorgade     4
2     Gotgade        2
3     Totgade        NA
4     Gotgade        2
...

Should return 1.4k rows.

I tried the following lines of code, but it does not return the desired output.

street_name_cat_from_train = train[["street_name","street_name_cat"]]
merged_df = new_data.merge(street_name_cat_from_train, on = ['street_name'])
merged_df

Returns 197k rows.

Jeff · Accepted Answer

Since each street name has exactly one category value, create a dictionary that maps names to category values (here df is the training data):

dt = df.groupby('street_name').first()['street_name_categorical'].to_dict()

Then map the street names in the new dataset (df2) through dt. Names that are missing from the dictionary become NaN:

df2['street_name_categorical'] = df2['street_name'].map(dt)
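
For reference, here is a minimal runnable sketch of this approach using the sample rows from the question. The names train, new_data, lookup, and lookup_df are illustrative assumptions, not taken from the original code. The merge in the question most likely returns 197k rows because street names repeat in the training data, so each row of the new dataset matches every duplicate. The sketch also shows a left merge against a de-duplicated lookup table as an alternative. That is a different technique from the dictionary map, but on this data it should give the same result.

```python
import pandas as pd

# Toy stand-ins for the two datasets in the question (assumed names).
train = pd.DataFrame({
    "street_name": ["Nyhavn", "Nyhavn", "Gotgade", "Botgade", "Marmorgade"],
    "street_name_categorical": [1, 1, 2, 3, 4],
})
new_data = pd.DataFrame({
    "street_name": ["Nyhavn", "Marmorgade", "Gotgade", "Totgade", "Gotgade"],
})

# Approach 1: dictionary lookup, one entry per unique street name.
lookup = train.groupby("street_name")["street_name_categorical"].first().to_dict()
mapped = new_data.copy()
mapped["street_name_categorical"] = mapped["street_name"].map(lookup)

# Approach 2: left merge against a de-duplicated lookup table.
# how="left" keeps every row of new_data, matched or not;
# drop_duplicates stops repeated street names from multiplying rows.
lookup_df = train[["street_name", "street_name_categorical"]].drop_duplicates("street_name")
merged = new_data.merge(lookup_df, on="street_name", how="left")

print(mapped)
print(len(mapped), len(merged))
```

Both results keep the same number of rows as new_data, and unmatched names such as Totgade get NaN. Because NaN is a float, the category column is stored as float in both results. If integer categories are needed, one option is to cast the column to pandas' nullable "Int64" dtype.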
