Reputation: 53
I've got two large df's from two xlsx spreadsheets and would like to merge them 'on' time stamp ['Time'] data.
The problem is that one data set has recorded time to decimal seconds and the other to whole seconds. This means that the two data set ['Time'] keys never match up...
df1 Time df2 Time
0 00:07:53.755000 0 00:07:53
1 00:07:54.096000 1 00:07:54
2 00:07:55.097000 2 00:07:55
3 00:07:56.099000 3 00:07:56
4 00:07:57.002000 4 00:07:57
5 00:07:58.012000 5 00:07:58
I've tried modifying time formats in Excel but it always retains the millisecond value because of its fractional 24 hour thing. I need to remove the decimal seconds from df1 (or just get them to match somehow!) to allow matching with row data from df2 and am hoping there is a much simpler way to do this in python?
Thanks to any and all advice!
Upvotes: 4
Views: 12523
Reputation: 164773
In df1
, you can just set microseconds to 0:
df1['Time'] = pd.to_datetime(df1['Time']).apply(lambda x: x.replace(microsecond=0))
Then perform your merge as normal.
Upvotes: 8
Reputation: 1744
I would use pandas strftime
df1['Time'] = pd.to_datetime(df1['Time']).dt.strftime('%H:%M:%S')
df2['Time'] = pd.to_datetime(df2['Time']).dt.strftime('%H:%M:%S')
merged = pd.merge(df1, df2, on='Time')
Upvotes: 3
Reputation: 21739
This is one way to do this.
# convert time to str and remove millisecond
df1['Time'] = df1['Time'].astype(str)
df1['Time'] = df1['Time'].str.replace('\..*','')
# Just to be sure there are no error because of timeformat
df2['Time'] = df2['Time'].astype(str)
# now we can join
df3 = pd.merge(df1, df2, on='Time')
Upvotes: 2