Reputation: 3892
I have two data frames that are both multi-indexed on 'Date' and 'name', and want to do a SQL style JOIN
to combine them. I've tried
pd.merge(df1.reset_index(), df2.reset_index(), on=['name', 'Date'], how='inner')
which then results in an empty DataFrame.
If I inspect the data frames I can see that the index of one is represented as '2015-01-01'
and the other is represented as '2015-01-01 00:00:00'
which explains my issues with joining.
Is there a way to 'recast' the index to a specific format within pandas?
I've included the tables to see what data I'm working with.
df1=
+-------------+------+------+------+
| Date | name | col1 | col2 |
+-------------+------+------+------+
| 2015-01-01 | mary | 12 | 123 |
| 2015-01-02 | mary | 23 | 33 |
| 2015-01-03 | mary | 34 | 45 |
| 2015-01-01 | john | 65 | 76 |
| 2015-01-02 | john | 67 | 78 |
| 2015-01-03 | john | 25 | 86 |
+-------------+------+------+------+
df2=
+------------+------+-------+-------+
| Date | name | col3 | col4 |
+------------+------+-------+-------+
| 2015-01-01 | mary | 80809 | 09885 |
| 2015-01-02 | mary | 53879 | 58972 |
| 2015-01-03 | mary | 23887 | 3908 |
| 2015-01-01 | john | 9238 | 2348 |
| 2015-01-02 | john | 234 | 234 |
| 2015-01-03 | john | 5325 | 6436 |
+------------+------+-------+-------+
DESIRED Result:
+-------------+------+------+-------+-------+-------+
| Date | name | col1 | col2 | col3 | col4 |
+-------------+------+------+-------+-------+-------+
| 2015-01-01 | mary | 12 | 123 | 80809 | 09885 |
| 2015-01-02 | mary | 23 | 33 | 53879 | 58972 |
| 2015-01-03 | mary | 34 | 45 | 23887 | 3908 |
| 2015-01-01 | john | 65 | 76 | 9238 | 2348 |
| 2015-01-02 | john | 67 | 78 | 234 | 234 |
| 2015-01-03 | john | 25 | 86 | 5325 | 6436 |
+-------------+------+------+-------+-------+-------+
Upvotes: 1
Views: 324
Reputation: 32214
The reason you cannot join is because you have different dtypes on the indicies. Pandas silently fails if the indicies have different dtypes.
You can easily change your indicies from string representations of time to proper pandas datetimes like this:
df = pd.DataFrame({"data":range(1,30)}, index=['2015-04-{}'.format(d) for d in range(1,30)])
df.index.dtype
dtype('O')
df.index = df.index.to_series().apply(pd.to_datetime)
df.index.dtype
dtype('<M8[ns]')
Now you can merge the dataframes on their index:
pd.merge(left=df, left_index=True,
right=df2, right_index=True)
Assuming you have a df2, which my example is omitting...
Upvotes: 2