Reputation: 55
I am trying to combine two pandas DataFrames with different time resolutions. The first DataFrame holds a 1-hour time series and the second holds a 1-minute time series.
1 hour dataframe
     get_time   value
0  1599739200  123.10
1  1599742800  136.24
2  1599750000  224.14
1 minute dataframe
      get_time  value
0   1599739200   2.11
1   1599739260   3.11
2   1599739320   3.12
3   1599742800   4.23
4   1599742860   2.22
5   1599742920   1.11
6   1599746400   7.24
7   1599746460  22.10
8   1599746520   2.13
9   1599750000   5.14
10  1599750060  12.10
11  1599750120  21.30
I want to combine the two DataFrames so that each value from the 1-hour DataFrame is mapped onto the matching rows of the 1-minute DataFrame. If there is no 1-hour value for a row, the mapped value should be NaN.
Desired Result:
      get_time  value  1h mapped value
0   1599739200   2.11           123.10
1   1599739260   3.11           123.10
2   1599739320   3.12           123.10
3   1599742800   4.23           136.24
4   1599742860   2.22           136.24
5   1599742920   1.11           136.24
6   1599746400   7.24              NaN
7   1599746460  22.10              NaN
8   1599746520   2.13              NaN
9   1599750000   5.14           224.14
10  1599750060  12.10           224.14
11  1599750120  21.30           224.14
Basically, I want to combine the DataFrames with this logic:
if (1m_get_time >= 1h_get_time) and (1m_get_time < 1h_get_time + 60 minutes):
    1h mapped value = 1h value
else:
    1h mapped value = nan
Currently I use a recursive method, but it takes a long time on large data. Here are the example DataFrames:
dfhigh_ = pd.DataFrame({
    'get_time': [1599739200, 1599742800, 1599750000],
    'value': [123.1, 136.24, 224.14],
})
dflow_ = pd.DataFrame({
    'get_time': [1599739200, 1599739260, 1599739320, 1599742800, 1599742860, 1599742920, 1599746400, 1599746460, 1599746520, 1599750000, 1599750060, 1599750120],
    'value': [2.11, 3.11, 3.12, 4.23, 2.22, 1.11, 7.24, 22.1, 2.13, 5.14, 12.1, 21.3],
})
Upvotes: 1
Views: 922
Reputation: 158
This should work (for edge cases as well):
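import pandas as pd
from datetime import datetime
import numpy as np

# Rename so the hourly value does not clash with the 1-minute 'value' column
dfhigh_ = dfhigh_.rename(columns={'value': '1h mapped value'})
# Outer merge on the exact timestamp; rows without an exact match get NaN
df_new = pd.merge(dflow_, dfhigh_, how='outer', on=['get_time'])
# Convert epoch seconds to datetimes (local time zone)
df_new.get_time = [datetime.fromtimestamp(x) for x in df_new['get_time']]

# Copy each hourly value to the still-empty rows that share its hour.
# Note: the match is on hour-of-day (0-23) only, so rows in the same
# hour on a different day would be matched as well.
for idx, row in df_new.iterrows():
    if not np.isnan(row['1h mapped value']):
        current_hour, current_1h_mapped_value = row['get_time'].hour, row['1h mapped value']
        for sub_idx, sub_row in df_new.loc[(df_new.get_time.dt.hour == current_hour) & np.isnan(df_new['1h mapped value'])].iterrows():
            df_new.loc[sub_idx, '1h mapped value'] = current_1h_mapped_value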
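The same merge-then-fill idea can probably be done without the row loops, which is what makes it slow on large data. This is only a sketch, assuming the 1-minute rows are sorted by get_time and each hourly timestamp sits exactly on the first minute of its hour (as in the sample data). The name hour_bucket is made up for illustration.

```python
import pandas as pd

dfhigh_ = pd.DataFrame({
    'get_time': [1599739200, 1599742800, 1599750000],
    'value': [123.1, 136.24, 224.14],
})
dflow_ = pd.DataFrame({
    'get_time': [1599739200, 1599739260, 1599739320, 1599742800, 1599742860, 1599742920,
                 1599746400, 1599746460, 1599746520, 1599750000, 1599750060, 1599750120],
    'value': [2.11, 3.11, 3.12, 4.23, 2.22, 1.11, 7.24, 22.1, 2.13, 5.14, 12.1, 21.3],
})

# Left merge keeps only the 1-minute rows; exact timestamp matches
# receive the hourly value, every other row gets NaN for now.
hourly = dfhigh_.rename(columns={'value': '1h mapped value'})
df_new = dflow_.merge(hourly, how='left', on='get_time')

# Bucket rows by epoch hour (not hour-of-day), so rows from different
# days never share a bucket. hour_bucket is an illustrative name.
hour_bucket = df_new['get_time'] // 3600

# Forward-fill inside each bucket; hours with no hourly row stay NaN.
df_new['1h mapped value'] = df_new.groupby(hour_bucket)['1h mapped value'].ffill()

print(df_new)
```

Because the fill is restricted to one epoch-hour bucket at a time, a value never leaks into the following hour or into the same hour of another day.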
Upvotes: 1
Reputation: 71689
Floor the get_time from dflow_ to the start of its hour, then use Series.map to map the values from dfhigh_ to dflow_ based on this floored timestamp:
hr = dflow_['get_time'] // 3600 * 3600
dflow_['mapped_value'] = hr.map(dfhigh_.set_index('get_time')['value'])
      get_time  value  mapped_value
0   1599739200   2.11        123.10
1   1599739260   3.11        123.10
2   1599739320   3.12        123.10
3   1599742800   4.23        136.24
4   1599742860   2.22        136.24
5   1599742920   1.11        136.24
6   1599746400   7.24           NaN
7   1599746460  22.10           NaN
8   1599746520   2.13           NaN
9   1599750000   5.14        224.14
10  1599750060  12.10        224.14
11  1599750120  21.30        224.14
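The floor-and-map approach relies on every hourly get_time falling exactly on an hour boundary. If that cannot be guaranteed, pd.merge_asof with a tolerance might be a closer fit to the condition in the question (1m_get_time >= 1h_get_time and 1m_get_time < 1h_get_time + 60 minutes). The following is a sketch under the assumption that get_time is integer epoch seconds in both frames; the names hourly, minutes and result are illustrative only.

```python
import pandas as pd

dfhigh_ = pd.DataFrame({
    'get_time': [1599739200, 1599742800, 1599750000],
    'value': [123.1, 136.24, 224.14],
})
dflow_ = pd.DataFrame({
    'get_time': [1599739200, 1599739260, 1599739320, 1599742800, 1599742860, 1599742920,
                 1599746400, 1599746460, 1599746520, 1599750000, 1599750060, 1599750120],
    'value': [2.11, 3.11, 3.12, 4.23, 2.22, 1.11, 7.24, 22.1, 2.13, 5.14, 12.1, 21.3],
})

# merge_asof requires both frames to be sorted by the key column.
hourly = dfhigh_.rename(columns={'value': 'mapped_value'}).sort_values('get_time')
minutes = dflow_.sort_values('get_time')

# direction='backward' picks the latest hourly row at or before each
# 1-minute row. tolerance=3599 (seconds) rejects hourly rows that are a
# full hour or more older, so hours without an hourly value stay NaN.
result = pd.merge_asof(
    minutes,
    hourly,
    on='get_time',
    direction='backward',
    tolerance=3599,
)

print(result)
```

On the sample data this should give the same mapped_value column as the Series.map version, while also tolerating hourly timestamps that are offset from the hour boundary.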
Upvotes: 2