CodeForJoy
CodeForJoy

Reputation: 1

How to merge 3 Pandas Dataframes based on Timestamp?

I have three dataframes in Pandas, say df1, df2 and df3. The first column of all dataframes is the Timestamp (DateTime format like 2017-01-01 12:30:00 etc.) Here is an example of each's first column:-

df1 TimeStamp
    2016-01-01 12:00:00 
    2016-01-01 12:10:00 
      .....
df2 TimeStamp
    2016-01-01 12:00:00 
    2016-01-01 12:10:00 
      .....
df3 TimeStamp
    2016-13-01 12:00:00 
    2016-13-01 12:30:00 
      .....

As you can see, for the first two are at 10 minutes intervals, while the third one is at 30 minutes intervals. What I would like to do is to merge all 3 dataframes together, such that for cases where there is not exact match due to non-available data(like 12:10:00 not available for 3rd dataframe ), it would be considered as 12:00:00 (the preceding measurement) for merging purposes. (But of course, the Date should be the same) Note that all the dataframes have different sizes, but I would like to merge them based on Timestamp together for analytical purposes. Thank you!

DESIRED RESULT:
df_final TimeStamp      ..  Columns of df1 Columns of df2 Columns of df3
    2016-13-01 12:00:00
    2016-13-01 12:10:00 
    2016-13-01 12:20:00 

      .....

MORE DETAILS BASED ON ANSWER SUGGESTED Firstly, as my dataframes (all 3) did not have index as TimeStamps, but had columns as TimeStamps, I set index for each as the TimeStamps:

df1.index = df1.TimeStamp
df2.index = df2.TimeStamp
df3.index = df3.TimeStamp

On using this

u_index = df3.index.union(df2.index.union(df1.index))

I get a weird output strangely which is not at regularly 10 min intervals like needed.

Index(['2016-01-01 00:00:00.000', '2016-01-01 00:00:00.000',
       '2016-01-01 00:00:00.000', '2016-01-01 00:00:00.000',
    ...
       '2017-12-31 23:50:00.000', '2017-12-31 23:50:00.000',
       '2017-12-31 23:50:00.000', '2017-12-31 23:50:00.000',

      dtype='object', name='TimeStamp', length=3199372)

Accordingly, the final df1_n dataframe is at 30 min intervals and not 10 mins (as the Union of indices was not properly done). I think that there is something going wrong here and once Step 2 suggested (u_index) is working properly, everything will be easy to merge the dataframes.

Upvotes: 0

Views: 819

Answers (1)

HadarM
HadarM

Reputation: 113

So I'm not 100% sure if what you asked for is how to complete the missing values after merging the three dataframes with the next valid observation. if so, that's the quickest way I found to do this (not the most elegant...):

  1. create a new index which is the union of the three indexes (will result in timestamp with intervals of 10 minutes in you case).
  2. reindex all three dfs according to the new index while filling in missing values separately.
  3. merge columns of the three dfs (which will be easy since after step NO.2 they will have the same index).

taking a portion of the data:

    df1
Out[48]: 
                     val_1
TimeStamp                 
2016-01-01 12:00:00     11
2016-01-01 12:10:00     12
df2
Out[49]: 
                     val_2
TimeStamp                 
2016-01-01 12:00:00     21
2016-01-01 12:10:00     22
df3
Out[50]: 
                     val_3
TimeStamp                 
2016-01-01 12:00:00     31
2016-13-01 12:30:00     32

step NO.1

u_index = df3.index.union(df2.index.union(df1.index))
u_index
Out[38]: Index(['2016-01-01 12:00:00', '2016-01-01 12:10:00', '2016-13-01 12:30:00'], dtype='object', name='TimeStamp')

step NO.2

df3_n = df3.reindex(index=u_index,method='bfill')
df2_n = df2.reindex(index=u_index,method='bfill')
df1_n = df1.reindex(index=u_index,method='bfill')

step NO.3

df1_n.merge(df2_n,on='TimeStamp').merge(df3_n,on='TimeStamp')
Out[47]: 
                     val_1  val_2  val_3
TimeStamp                               
2016-01-01 12:00:00   11.0   21.0     31
2016-01-01 12:10:00   12.0   22.0     32
2016-13-01 12:30:00    NaN    NaN     32

You might need to adjust the last row, since it has no following row to fill values from. but that's it pretty much.

Upvotes: 1

Related Questions