Reputation: 1
I have three dataframes in Pandas, say df1, df2 and df3
. The first column of all dataframes is the Timestamp (DateTime format like 2017-01-01 12:30:00
etc.) Here is an example of each's first column:-
df1 TimeStamp
2016-01-01 12:00:00
2016-01-01 12:10:00
.....
df2 TimeStamp
2016-01-01 12:00:00
2016-01-01 12:10:00
.....
df3 TimeStamp
2016-13-01 12:00:00
2016-13-01 12:30:00
.....
As you can see, for the first two are at 10 minutes intervals, while the third one is at 30 minutes intervals. What I would like to do is to merge all 3 dataframes together, such that for cases where there is not exact match due to non-available data(like 12:10:00
not available for 3rd dataframe ), it would be considered as 12:00:00
(the preceding measurement) for merging purposes. (But of course, the Date should be the same) Note that all the dataframes have different sizes, but I would like to merge them based on Timestamp together for analytical purposes. Thank you!
DESIRED RESULT:
df_final TimeStamp .. Columns of df1 Columns of df2 Columns of df3
2016-13-01 12:00:00
2016-13-01 12:10:00
2016-13-01 12:20:00
.....
MORE DETAILS BASED ON ANSWER SUGGESTED Firstly, as my dataframes (all 3) did not have index as TimeStamps, but had columns as TimeStamps, I set index for each as the TimeStamps:
df1.index = df1.TimeStamp
df2.index = df2.TimeStamp
df3.index = df3.TimeStamp
On using this
u_index = df3.index.union(df2.index.union(df1.index))
I get a weird output strangely which is not at regularly 10 min intervals like needed.
Index(['2016-01-01 00:00:00.000', '2016-01-01 00:00:00.000',
'2016-01-01 00:00:00.000', '2016-01-01 00:00:00.000',
...
'2017-12-31 23:50:00.000', '2017-12-31 23:50:00.000',
'2017-12-31 23:50:00.000', '2017-12-31 23:50:00.000',
dtype='object', name='TimeStamp', length=3199372)
Accordingly, the final df1_n dataframe is at 30 min
intervals and not 10 mins
(as the Union of indices was not properly done). I think that there is something going wrong here and once Step 2 suggested (u_index
) is working properly, everything will be easy to merge the dataframes.
Upvotes: 0
Views: 819
Reputation: 113
So I'm not 100% sure if what you asked for is how to complete the missing values after merging the three dataframes with the next valid observation. if so, that's the quickest way I found to do this (not the most elegant...):
taking a portion of the data:
df1
Out[48]:
val_1
TimeStamp
2016-01-01 12:00:00 11
2016-01-01 12:10:00 12
df2
Out[49]:
val_2
TimeStamp
2016-01-01 12:00:00 21
2016-01-01 12:10:00 22
df3
Out[50]:
val_3
TimeStamp
2016-01-01 12:00:00 31
2016-13-01 12:30:00 32
step NO.1
u_index = df3.index.union(df2.index.union(df1.index))
u_index
Out[38]: Index(['2016-01-01 12:00:00', '2016-01-01 12:10:00', '2016-13-01 12:30:00'], dtype='object', name='TimeStamp')
step NO.2
df3_n = df3.reindex(index=u_index,method='bfill')
df2_n = df2.reindex(index=u_index,method='bfill')
df1_n = df1.reindex(index=u_index,method='bfill')
step NO.3
df1_n.merge(df2_n,on='TimeStamp').merge(df3_n,on='TimeStamp')
Out[47]:
val_1 val_2 val_3
TimeStamp
2016-01-01 12:00:00 11.0 21.0 31
2016-01-01 12:10:00 12.0 22.0 32
2016-13-01 12:30:00 NaN NaN 32
You might need to adjust the last row, since it has no following row to fill values from. but that's it pretty much.
Upvotes: 1