Reputation: 2978
I have the following two pandas data frames df1
and df2
. Both of them have datetime column:
import pandas as pd
import numpy as np
import random
np.random.seed(0)
rng = pd.date_range('2015-02-24', periods=5, freq='T')
list=[]
for i in range(len(rng)):
r=random.randint(1,100)
if r not in list: list.append(str(r))
df1 = pd.DataFrame({ 'Date': rng, 'Id': list, 'Val1': np.random.randn(len(rng))})
df1.head()
Date Id Val1
2015-02-24 00:00:00 96 -0.968980
2015-02-24 00:01:00 31 0.591243
2015-02-24 00:02:00 58 -0.782776
2015-02-24 00:03:00 81 -0.444233
2015-02-24 00:04:00 73 -0.345186
df2 = pd.DataFrame({ 'Date': rng, 'Id': list, 'Val2': np.random.randn(len(rng))})
df2.head()
Then I want to merge them as shown below:
df3 = df1[['Id', 'Date']]\
.groupby('Id')\
.agg('max')\
.merge(df2, on=['Id', 'Date'], how='left')
But I get the error:
ValueError: You are trying to merge on datetime64[ns, UTC] and object columns. If you wish to proceed you should use pd.concat
Expected output:
Date Id Val2
2015-02-24 00:00:00 96 2.965560
2015-02-24 00:01:00 31 5.593345
2015-02-24 00:02:00 58 11.78276
2015-02-24 00:03:00 81 -0.444234
2015-02-24 00:04:00 73 -0.322334
Please consider that this is a simplified reproduction of my real code. In my real code I only have merge
operation that fails with the same message. The rest of the code is used to reproduce the needed data formats.
Upvotes: 3
Views: 5499
Reputation: 2015
Try:
df3 = df1[['Id', 'Date']].groupby('Id').agg('max')
Then:
df3 = pd.merge(df3, df2)
Produces:
>>> df3
Date Id Val2
0 2015-02-24 00:03:00 12 -0.103219
1 2015-02-24 00:04:00 24 0.410599
2 2015-02-24 00:02:00 32 -0.151357
3 2015-02-24 00:01:00 41 0.950088
4 2015-02-24 00:00:00 46 -0.977278
Upvotes: 2