Reputation: 12509
I have two dataframes below, df_purchase(1) and df_login(2)
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
| | age | gender | ttp | count | sum(amount) | region |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
| 49427 | 63 | M | 824.731412 | 2 | 25.00 | Omaha, Nebraska |
| 28433 | 49 | M | 1.166250 | 2 | 41.94 | Catasauqua, Pennsylvania |
| 4162 | 29 | M | 5.620949 | 2 | 51.78 | Eagle Center, Iowa |
| 18747 | 43 | M | 153.502072 | 2 | 23.84 | Pacific, Washington |
| 45173 | 59 | M | 0.027257 | 2 | 13.98 | De Soto, Missouri |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
| | age | gender | count | region | | |
| 671766 | 84 | M | 13900 | New York, New York | | |
| 671166 | 84 | F | 7619 | New York, New York | | |
| 672209 | 85 | F | 6483 | New York, New York | | |
| 672671 | 85 | M | 5808 | New York, New York | | |
| 195201 | 34 | M | 3817 | New York, New York | | |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
I am trying to join df_logins to df_purchase on age, gender and region with the following pandas code:
df = pd.merge(df_purchase, df_login[['count']],
how='left', on=['age', 'gender', 'region'])
However, I keep getting this error: KeyError: 'age'
Any thoughts?
Upvotes: 11
Views: 18215
Reputation: 111
In order to explain more on @EdChum's answer, you should change from "on" to "left_on" and "right_on" for better parsings as stated in the official document from Pandas:
Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
Upvotes: 0
Reputation: 394031
The KeyError arises from this:
df = pd.merge(df_purchase, df_login[['count']] <- this selects just count column,
how='left', on=['age', 'gender', 'region'])
You've specifically selected just a single column from df_login
, you need this:
df = pd.merge(df_purchase, df_login,
how='left', on=['age', 'gender', 'region'])
I'm assuming that this is not your complete data as you have no common values in the age and region column in df_login
.
Upvotes: 11