elbillaf
elbillaf

Reputation: 1984

How to correctly use pandas Series.map() with a dictionary of mappings?

The dataframe smaller_df looks like this:

> smaller_df.head()
   MSA Code  Line   RPP
0     10180   1.0  91.2
1     10180   2.0  97.4
2     10180   3.0  78.7
3     10180   4.0  93.5
4     10420   1.0  90.4
...

smaller_df.dtypes results in

MSA Code      int64
Line        float64
RPP         float64
Wages        object
dtype: object

wage_keys.head() gives:

   MSA Code  Average Wage
0     11260  94490.000000
1     21820  72080.000000
2     10180  71128.571429
3     13820  87338.396624
4     10420  76620.000000
...

wage_keys.dtypes is:

MSA Code          int64
Average Wage    float64
dtype: object

Note that the same 'MSA Code" can appear multiple times in smaller_df, but exactly once in wage_keys.

I want the new column 'Wages' in smaller_df to be set to the corresponding value in wage_keys.

So the new table should look like this:

   MSA Code  Line   RPP Wages
0     10180   1.0  91.2   71128.571429
1     10180   2.0  97.4   71128.571429
2     10180   3.0  78.7   71128.571429
3     10180   4.0  93.5   71128.571429
4     10420   1.0  90.4   76620.000000
...

I have the following code to do the mapping, by making a dictionary of wages:

wages = wage_keys.set_index('MSA Code').to_dict()
smaller_df['Wages'] = smaller_df['MSA Code'].map(wages)

The problem is this results in:

   MSA Code  Line   RPP Wages
0     10180   1.0  91.2   NaN
1     10180   2.0  97.4   NaN
2     10180   3.0  78.7   NaN
3     10180   4.0  93.5   NaN
4     10420   1.0  90.4   NaN

Obviously, I'm missing something. How do get the values for the 'Wages' column to set to the correct corresponding value in the wages dictionary (or wage_key dataframe)?

Upvotes: 1

Views: 68

Answers (1)

cs95
cs95

Reputation: 402253

Your mistake was in the conversion to dictionary. You did,

df2.set_index('MSA Code').to_dict()
# {
#     "Average Wage": {
#         "10180": 71128.571429,
#         "10420": 76620.0,
#         "11260": 94490.0,
#         "13820": 87338.396624,
#         "21820": 72080.0
#     }
# }

This results in a dict-of-dicts. What you should've done was,

df2.set_index('MSA Code')['Average Wage'].to_dict()
# {11260: 94490.0, 21820: 72080.0, 10180: 71128.571429, 13820: 87338.396624, 10420: 76620.0}

Or,

df2.set_index('MSA Code')['Average Wage']

MSA Code
11260    94490.000000
21820    72080.000000
10180    71128.571429
13820    87338.396624
10420    76620.000000
Name: Average Wage, dtype: float64

Both of which results in a mapping format that map understandings. Now your map call produces the expected output:

df['Wages'] = df['MSA Code'].map(df2.set_index('MSA Code')['Average Wage'])
df

   MSA Code  Line   RPP         Wages
0     10180   1.0  91.2  71128.571429
1     10180   2.0  97.4  71128.571429
2     10180   3.0  78.7  71128.571429
3     10180   4.0  93.5  71128.571429
4     10420   1.0  90.4  76620.000000

Upvotes: 1

Related Questions