Job Brüggen
Job Brüggen

Reputation: 303

How to merge two dataframes based on time?

Trying to merge to dataframes. First is big, second is small. They have a datetime set as index. I want the datetime value of the second one (and the row) merged in between the datetime values of the first one, sorted by time.

df1:

df1 = pd.read_csv(left_inputfile_to_read, decimal=".",sep=';', parse_dates = True, low_memory=False)
df1.columns = ['FLIGHT_ID','X', 'Y','MODE_C', 'SPEED', 'HEADING', 'TRK_ROCD', 'TIJD']
df1['datetime'] = pd.to_datetime(df1['TIJD'], infer_datetime_format = True, format="%Y-%M-%D %H:%M:%S")
df1.set_index(['datetime'], inplace=True)
print(df1)

FLIGHT_ID   X   Y   MODE_C  SPEED   HEADING TRK_ROCD    TIJD
datetime                                
2019-01-28 00:26:56 20034026    -13345  -1923   230.0   414 88  NaN 28-1-2019 00:26:56
2019-01-28 00:27:00 20034026    -13275  -1923   230.0   414 88  NaN 28-1-2019 00:27:00
2019-01-28 00:27:05 20034026    -13204  -1923   230.0   414 88  NaN 28-1-2019 00:27:05
2019-01-28 00:27:10 20034026    -13134  -1923   230.0   414 88  NaN 28-1-2019 00:27:10
2019-01-28 00:27:15 20034026    -13064  -1923   230.0   414 88  NaN 28-1-2019 00:27:15
... ... ... ... ... ... ... ... ...
2019-01-29 00:08:32 20035925    13443   -531    230.0   257 85  NaN 29-1-2019 00:08:32
2019-01-29 00:08:37 20035925    13487   -526    230.0   257 85  NaN 29-1-2019 00:08:37
2019-01-29 00:08:42 20035925    13530   -520    230.0   257 85  NaN 29-1-2019 00:08:42
2019-01-29 00:08:46 20035925    13574   -516    230.0   257 85  NaN 29-1-2019 00:08:46
2019-01-29 00:08:51 20035925    13617   -510    230.0   257 85  NaN 29-1-2019 00:08:51
551446 rows × 8 columns

df2:

df2 = pd.read_csv(right_inputfile_to_read, decimal=".",sep=';', parse_dates = True, low_memory=False)
df2['datetime'] = pd.to_datetime(df2['T_START'], infer_datetime_format = True, format="%Y-%M-%D %H:%M:%S" , dayfirst=True)
df2.set_index(['datetime'], inplace=True)
df2.drop(columns=['T_START', 'T_END', 'AIRFIELD'], inplace=True)
print(df2)

                      QNH  MODE_C_CORRECTION
datetime                                    
2019-01-28 02:14:00  1022                235
2019-01-28 02:14:00  1022                235
2019-01-28 02:16:00  1019                155
2019-01-28 02:21:00  1019                155
2019-01-28 02:36:00  1019                155
...                   ...                ...
2019-01-28 21:56:00  1014                 21
2019-01-28 22:56:00  1014                 21
2019-01-28 23:26:00  1014                 21
2019-01-28 23:29:00  1014                 21
2019-01-28 23:52:00  1014                 21

[69 rows x 2 columns]

The idea is that the first row of df2 should be inserted somewhere at 2019-01-28 02:14:00. I have spent hours on Stackoverflow and pandas documentation (merge, join, concat) but cannot find the right solution.

The next step would be to interpolate the values in column 'QNH' to the rows that are in df1, based on that time.

Any help greatly appreciated!

Upvotes: 0

Views: 2191

Answers (1)

voldr
voldr

Reputation: 392

Just concatenate two DataFrames and sort by date:

df = pd.concat([df1,df2]).sort_values(by='datetime')

For the next step you can use pandas.DataFrame.interpolate.

Upvotes: 2

Related Questions