I want to resample 1-minute stock data into 1-hour bars.
Here is my code (the column names in my original data are in Chinese):
import pandas as pd
import glob
path = 'C:/Users/Desktop/fut_data_1min_2023'
all_files = glob.glob(path + "/*.csv")
all_files
Here is the result:
['C:/Users/Desktop/fut_data_1min_2023\\a2301.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2303.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2305.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2307.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2309.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2311.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2401.csv',
'C:/Users/Desktop/fut_data_1min_2023\\a2403.csv',
'C:/Users/Desktop/fut_data_1min_2023\\b2312.csv',
...
'C:/Users/Desktop/fut_data_1min_2023\\zn2403.csv',
'C:/Users/Desktop/fut_data_1min_2023\\zn2404.csv',
'C:/Users/Desktop/fut_data_1min_2023\\zn2405.csv',
'C:/Users/Desktop/fut_data_1min_2023\\zn2406.csv',
'C:/Users/Desktop/fut_data_1min_2023\\zn2407.csv']
I created a DataFrame that includes all the CSV files.
li = [pd.read_csv(filename, index_col=None, header=0, encoding='gbk') for filename in all_files]
df = pd.concat(li, axis=0, ignore_index=True)
df['时间'] = pd.to_datetime(df['时间']) #date
df.set_index('时间', inplace=True)
aggregation_rules = {
    '市场代码': 'first',  # market code
    '合约代码': 'first',  # contract code
    '开': 'first',        # open
    '高': 'max',          # high
    '低': 'min',          # low
    '收': 'last',         # close
    '成交量': 'sum',      # volume
    '成交额': 'sum',      # amount
    '持仓量': 'sum'       # open interest
}
# 1-hour resample
df_resampled_1hr = df.resample('H').agg(aggregation_rules)
print(df_resampled_1hr)
**Here is my result:**
市场代码 合约代码 开 高 低 收 \
时间
2023-01-03 09:00:00 DC a2301 5258.0 235100.0 409.08 23375.0
2023-01-03 10:00:00 DC a2301 5240.0 233490.0 409.52 22880.0
2023-01-03 11:00:00 DC a2301 5258.0 232800.0 409.68 22900.0
2023-01-03 12:00:00 None None NaN NaN NaN NaN
2023-01-03 13:00:00 DC a2301 5258.0 232730.0 409.94 22950.0
... ... ... ... ... ... ...
2023-07-31 11:00:00 DC a2309 4931.0 233900.0 455.04 20340.0
2023-07-31 12:00:00 None None NaN NaN NaN NaN
2023-07-31 13:00:00 DC a2309 4928.0 234120.0 454.78 20405.0
2023-07-31 14:00:00 DC a2309 4951.0 234010.0 454.90 20160.0
2023-07-31 15:00:00 DC a2309 4967.0 233540.0 455.00 20160.0
成交量 成交额 持仓量
时间
2023-01-03 09:00:00 9624618.0 4.146605e+11 1848710138
2023-01-03 10:00:00 4184041.0 1.710870e+11 1413715236
2023-01-03 11:00:00 1750278.0 8.170628e+10 976859153
2023-01-03 12:00:00 0.0 0.000000e+00 0
2023-01-03 13:00:00 2119662.0 8.518207e+10 916939846
... ... ... ...
2023-07-31 11:00:00 2645954.0 9.241121e+10 1340286918
2023-07-31 12:00:00 0.0 0.000000e+00 0
2023-07-31 13:00:00 2960147.0 1.042305e+11 1253078394
2023-07-31 14:00:00 4385137.0 1.701434e+11 2590938012
2023-07-31 15:00:00 170406.0 6.481716e+09 42982844
[5023 rows x 9 columns]
The 1-hour resample did not include all the CSV files; the output stops at a2309. I tried different aggregation rules, but that did not help. So I believe something is wrong with my resample call, but I cannot figure out what. Please help!
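A possible explanation, offered as a sketch rather than a confirmed diagnosis: the output above shows an open near 5258 next to a high near 235100 in the same hourly row, which suggests that rows from many contracts are landing in the same time bin. `resample` on the concatenated frame bins by timestamp only, so `'first'` would keep just the first contract code in each hour. The toy example below uses made-up data (two hypothetical contracts with invented prices, and only a subset of the columns) to contrast resampling the whole frame with grouping by `合约代码` first.

```python
import pandas as pd

# Hypothetical toy data: two contracts sharing the same 1-minute timestamps,
# standing in for the concatenated CSV files (all values are invented).
times = pd.date_range("2023-01-03 09:00", periods=120, freq="min")
frames = []
for code, base in [("a2301", 5000.0), ("zn2403", 23000.0)]:
    frames.append(pd.DataFrame({
        "时间": times,        # timestamp
        "合约代码": code,     # contract code
        "开": base,           # open
        "高": base + 5,       # high
        "低": base - 5,       # low
        "收": base + 1,       # close
        "成交量": 10,         # volume
    }))
df = pd.concat(frames, ignore_index=True).set_index("时间")

rules = {"开": "first", "高": "max", "低": "min", "收": "last", "成交量": "sum"}

# Resampling the whole frame mixes both contracts in each hourly bin:
# the open comes from one contract and the high from the other.
mixed = df.resample("60min").agg(rules)

# Grouping by contract code first yields separate hourly bars per contract,
# indexed by (合约代码, 时间).
per_contract = df.groupby("合约代码").resample("60min").agg(rules)

print(mixed)
print(per_contract)
```

If this matches the real data, the grouped result would have one block of hourly rows per contract, and `per_contract.reset_index()` would turn the contract code and timestamp back into ordinary columns.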