Reputation: 1984
The dataframe smaller_df looks like this:
> smaller_df.head()
MSA Code Line RPP
0 10180 1.0 91.2
1 10180 2.0 97.4
2 10180 3.0 78.7
3 10180 4.0 93.5
4 10420 1.0 90.4
...
smaller_df.dtypes results in
MSA Code int64
Line float64
RPP float64
Wages object
dtype: object
wage_keys.head() gives:
MSA Code Average Wage
0 11260 94490.000000
1 21820 72080.000000
2 10180 71128.571429
3 13820 87338.396624
4 10420 76620.000000
...
wage_keys.dtypes is:
MSA Code int64
Average Wage float64
dtype: object
Note that the same 'MSA Code" can appear multiple times in smaller_df, but exactly once in wage_keys.
I want the new column 'Wages' in smaller_df to be set to the corresponding value in wage_keys.
So the new table should look like this:
MSA Code Line RPP Wages
0 10180 1.0 91.2 71128.571429
1 10180 2.0 97.4 71128.571429
2 10180 3.0 78.7 71128.571429
3 10180 4.0 93.5 71128.571429
4 10420 1.0 90.4 76620.000000
...
I have the following code to do the mapping, by making a dictionary of wages:
wages = wage_keys.set_index('MSA Code').to_dict()
smaller_df['Wages'] = smaller_df['MSA Code'].map(wages)
The problem is this results in:
MSA Code Line RPP Wages
0 10180 1.0 91.2 NaN
1 10180 2.0 97.4 NaN
2 10180 3.0 78.7 NaN
3 10180 4.0 93.5 NaN
4 10420 1.0 90.4 NaN
Obviously, I'm missing something. How do get the values for the 'Wages' column to set to the correct corresponding value in the wages dictionary (or wage_key dataframe)?
Upvotes: 1
Views: 68
Reputation: 402253
Your mistake was in the conversion to dictionary. You did,
df2.set_index('MSA Code').to_dict()
# {
# "Average Wage": {
# "10180": 71128.571429,
# "10420": 76620.0,
# "11260": 94490.0,
# "13820": 87338.396624,
# "21820": 72080.0
# }
# }
This results in a dict-of-dicts. What you should've done was,
df2.set_index('MSA Code')['Average Wage'].to_dict()
# {11260: 94490.0, 21820: 72080.0, 10180: 71128.571429, 13820: 87338.396624, 10420: 76620.0}
Or,
df2.set_index('MSA Code')['Average Wage']
MSA Code
11260 94490.000000
21820 72080.000000
10180 71128.571429
13820 87338.396624
10420 76620.000000
Name: Average Wage, dtype: float64
Both of which results in a mapping format that map
understandings. Now your map
call produces the expected output:
df['Wages'] = df['MSA Code'].map(df2.set_index('MSA Code')['Average Wage'])
df
MSA Code Line RPP Wages
0 10180 1.0 91.2 71128.571429
1 10180 2.0 97.4 71128.571429
2 10180 3.0 78.7 71128.571429
3 10180 4.0 93.5 71128.571429
4 10420 1.0 90.4 76620.000000
Upvotes: 1