Fluxy
Fluxy

Reputation: 2978

ValueError: You are trying to merge on datetime64[ns, UTC] and object columns

I have the following two pandas data frames df1 and df2. Both of them have datetime column:

import pandas as pd
import numpy as np
import random

np.random.seed(0)
rng = pd.date_range('2015-02-24', periods=5, freq='T')

list=[]
for i in range(len(rng)):
    r=random.randint(1,100)
    if r not in list: list.append(str(r))

df1 = pd.DataFrame({ 'Date': rng, 'Id': list, 'Val1': np.random.randn(len(rng))})
df1.head()


                 Date  Id      Val1
  2015-02-24 00:00:00  96 -0.968980
  2015-02-24 00:01:00  31  0.591243
  2015-02-24 00:02:00  58 -0.782776
  2015-02-24 00:03:00  81 -0.444233
  2015-02-24 00:04:00  73 -0.345186

df2 = pd.DataFrame({ 'Date': rng, 'Id': list, 'Val2': np.random.randn(len(rng))})
df2.head()

Then I want to merge them as shown below:

df3 = df1[['Id', 'Date']]\
    .groupby('Id')\
    .agg('max')\
    .merge(df2, on=['Id', 'Date'], how='left')

But I get the error:

ValueError: You are trying to merge on datetime64[ns, UTC] and object columns. If you wish to proceed you should use pd.concat

Expected output:

                 Date  Id      Val2
  2015-02-24 00:00:00  96  2.965560
  2015-02-24 00:01:00  31  5.593345
  2015-02-24 00:02:00  58  11.78276
  2015-02-24 00:03:00  81 -0.444234
  2015-02-24 00:04:00  73 -0.322334

Please consider that this is a simplified reproduction of my real code. In my real code I only have merge operation that fails with the same message. The rest of the code is used to reproduce the needed data formats.

Upvotes: 3

Views: 5499

Answers (1)

Arn
Arn

Reputation: 2015

Try:

df3 = df1[['Id', 'Date']].groupby('Id').agg('max')

Then:

df3 = pd.merge(df3, df2)

Produces:

>>> df3
                 Date  Id      Val2
0 2015-02-24 00:03:00  12 -0.103219
1 2015-02-24 00:04:00  24  0.410599
2 2015-02-24 00:02:00  32 -0.151357
3 2015-02-24 00:01:00  41  0.950088
4 2015-02-24 00:00:00  46 -0.977278

Upvotes: 2

Related Questions