Reputation: 303
Trying to merge to dataframes. First is big, second is small. They have a datetime set as index. I want the datetime value of the second one (and the row) merged in between the datetime values of the first one, sorted by time.
df1:
df1 = pd.read_csv(left_inputfile_to_read, decimal=".",sep=';', parse_dates = True, low_memory=False)
df1.columns = ['FLIGHT_ID','X', 'Y','MODE_C', 'SPEED', 'HEADING', 'TRK_ROCD', 'TIJD']
df1['datetime'] = pd.to_datetime(df1['TIJD'], infer_datetime_format = True, format="%Y-%M-%D %H:%M:%S")
df1.set_index(['datetime'], inplace=True)
print(df1)
FLIGHT_ID X Y MODE_C SPEED HEADING TRK_ROCD TIJD
datetime
2019-01-28 00:26:56 20034026 -13345 -1923 230.0 414 88 NaN 28-1-2019 00:26:56
2019-01-28 00:27:00 20034026 -13275 -1923 230.0 414 88 NaN 28-1-2019 00:27:00
2019-01-28 00:27:05 20034026 -13204 -1923 230.0 414 88 NaN 28-1-2019 00:27:05
2019-01-28 00:27:10 20034026 -13134 -1923 230.0 414 88 NaN 28-1-2019 00:27:10
2019-01-28 00:27:15 20034026 -13064 -1923 230.0 414 88 NaN 28-1-2019 00:27:15
... ... ... ... ... ... ... ... ...
2019-01-29 00:08:32 20035925 13443 -531 230.0 257 85 NaN 29-1-2019 00:08:32
2019-01-29 00:08:37 20035925 13487 -526 230.0 257 85 NaN 29-1-2019 00:08:37
2019-01-29 00:08:42 20035925 13530 -520 230.0 257 85 NaN 29-1-2019 00:08:42
2019-01-29 00:08:46 20035925 13574 -516 230.0 257 85 NaN 29-1-2019 00:08:46
2019-01-29 00:08:51 20035925 13617 -510 230.0 257 85 NaN 29-1-2019 00:08:51
551446 rows × 8 columns
df2:
df2 = pd.read_csv(right_inputfile_to_read, decimal=".",sep=';', parse_dates = True, low_memory=False)
df2['datetime'] = pd.to_datetime(df2['T_START'], infer_datetime_format = True, format="%Y-%M-%D %H:%M:%S" , dayfirst=True)
df2.set_index(['datetime'], inplace=True)
df2.drop(columns=['T_START', 'T_END', 'AIRFIELD'], inplace=True)
print(df2)
QNH MODE_C_CORRECTION
datetime
2019-01-28 02:14:00 1022 235
2019-01-28 02:14:00 1022 235
2019-01-28 02:16:00 1019 155
2019-01-28 02:21:00 1019 155
2019-01-28 02:36:00 1019 155
... ... ...
2019-01-28 21:56:00 1014 21
2019-01-28 22:56:00 1014 21
2019-01-28 23:26:00 1014 21
2019-01-28 23:29:00 1014 21
2019-01-28 23:52:00 1014 21
[69 rows x 2 columns]
The idea is that the first row of df2 should be inserted somewhere at 2019-01-28 02:14:00. I have spent hours on Stackoverflow and pandas documentation (merge, join, concat) but cannot find the right solution.
The next step would be to interpolate the values in column 'QNH' to the rows that are in df1, based on that time.
Any help greatly appreciated!
Upvotes: 0
Views: 2191
Reputation: 392
Just concatenate two DataFrames and sort by date:
df = pd.concat([df1,df2]).sort_values(by='datetime')
For the next step you can use pandas.DataFrame.interpolate.
Upvotes: 2