Jared
Jared

Reputation: 3892

Python Pandas - Reformat Datetime Index

I have two data frames that are both multi-indexed on 'Date' and 'name', and want to do a SQL style JOIN to combine them. I've tried

pd.merge(df1.reset_index(), df2.reset_index(), on=['name', 'Date'], how='inner')

which then results in an empty DataFrame.

If I inspect the data frames I can see that the index of one is represented as '2015-01-01' and the other is represented as '2015-01-01 00:00:00' which explains my issues with joining.

Is there a way to 'recast' the index to a specific format within pandas?

I've included the tables to see what data I'm working with.

df1=
+-------------+------+------+------+
|    Date     | name | col1 | col2 |
+-------------+------+------+------+
| 2015-01-01  | mary |   12 |  123 |
| 2015-01-02  | mary |   23 |   33 |
| 2015-01-03  | mary |   34 |   45 |
| 2015-01-01  | john |   65 |   76 |
| 2015-01-02  | john |   67 |   78 |
| 2015-01-03  | john |   25 |   86 |
+-------------+------+------+------+

df2= 
+------------+------+-------+-------+
|    Date    | name | col3  | col4  |
+------------+------+-------+-------+
| 2015-01-01 | mary | 80809 | 09885 |
| 2015-01-02 | mary | 53879 | 58972 |
| 2015-01-03 | mary | 23887 |  3908 |
| 2015-01-01 | john |  9238 |  2348 |
| 2015-01-02 | john |   234 |   234 |
| 2015-01-03 | john |  5325 |  6436 |
+------------+------+-------+-------+

DESIRED Result:

+-------------+------+------+-------+-------+-------+
|    Date     | name | col1 | col2  | col3  | col4  |
+-------------+------+------+-------+-------+-------+
| 2015-01-01  | mary |   12 |   123 | 80809 | 09885 |
| 2015-01-02  | mary |   23 |    33 | 53879 | 58972 |
| 2015-01-03  | mary |   34 |    45 | 23887 |  3908 |
| 2015-01-01  | john |   65 |    76 |  9238 |  2348 |
| 2015-01-02  | john |   67 |    78 |   234 |   234 |
| 2015-01-03  | john |   25 |    86 |  5325 |  6436 |
+-------------+------+------+-------+-------+-------+

Upvotes: 1

Views: 324

Answers (1)

firelynx
firelynx

Reputation: 32214

The reason you cannot join is because you have different dtypes on the indicies. Pandas silently fails if the indicies have different dtypes.

You can easily change your indicies from string representations of time to proper pandas datetimes like this:

df = pd.DataFrame({"data":range(1,30)}, index=['2015-04-{}'.format(d) for d in range(1,30)])

df.index.dtype
dtype('O')

df.index = df.index.to_series().apply(pd.to_datetime)

df.index.dtype
dtype('<M8[ns]')

Now you can merge the dataframes on their index:

pd.merge(left=df, left_index=True,
         right=df2, right_index=True)

Assuming you have a df2, which my example is omitting...

Upvotes: 2

Related Questions