metersk
metersk

Reputation: 12509

Key error on pandas merge (left join)

I have two dataframes below, df_purchase(1) and df_login(2)

+--------+-----+--------+------------+--------------------+-------------+--------------------------+
|        | age | gender |    ttp     |       count        | sum(amount) |          region          |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
|  49427 | 63  | M      | 824.731412 | 2                  | 25.00       | Omaha, Nebraska          |
|  28433 | 49  | M      | 1.166250   | 2                  | 41.94       | Catasauqua, Pennsylvania |
|   4162 | 29  | M      | 5.620949   | 2                  | 51.78       | Eagle Center, Iowa       |
|  18747 | 43  | M      | 153.502072 | 2                  | 23.84       | Pacific, Washington      |
|  45173 | 59  | M      | 0.027257   | 2                  | 13.98       | De Soto, Missouri        |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+

+--------+-----+--------+------------+--------------------+-------------+--------------------------+
|        | age | gender | count      | region             |             |                          |
| 671766 | 84  | M      | 13900      | New York, New York |             |                          |
| 671166 | 84  | F      | 7619       | New York, New York |             |                          |
| 672209 | 85  | F      | 6483       | New York, New York |             |                          |
| 672671 | 85  | M      | 5808       | New York, New York |             |                          |
| 195201 | 34  | M      | 3817       | New York, New York |             |                          |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+

I am trying to join df_logins to df_purchase on age, gender and region with the following pandas code:

df = pd.merge(df_purchase, df_login[['count']],
                       how='left', on=['age', 'gender', 'region'])

However, I keep getting this error: KeyError: 'age' Any thoughts?

Upvotes: 11

Views: 18215

Answers (2)

In order to explain more on @EdChum's answer, you should change from "on" to "left_on" and "right_on" for better parsings as stated in the official document from Pandas:

Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

enter image description here

Upvotes: 0

EdChum
EdChum

Reputation: 394031

The KeyError arises from this:

df = pd.merge(df_purchase, df_login[['count']] <- this selects just count column,
                       how='left', on=['age', 'gender', 'region'])

You've specifically selected just a single column from df_login, you need this:

df = pd.merge(df_purchase, df_login,
                       how='left', on=['age', 'gender', 'region'])

I'm assuming that this is not your complete data as you have no common values in the age and region column in df_login.

Upvotes: 11

Related Questions