Reputation: 4253
The situation is that I have a few files with time-series data for various stocks, each with several fields. Each file contains
time, open, high, low, close, volume
The goal is to get all of that into one dataframe of the form
field     open                               high                               ...
security  hk_1      hk_2      hk_3      ...  hk_1      hk_2      hk_3      ...  ...
time
t_1       open_1_1  open_2_1  open_3_1  ...  high_1_1  high_2_1  high_3_1  ...  ...
t_2       open_1_2  open_2_2  open_3_2  ...  high_1_2  high_2_2  high_3_2  ...  ...
...       ...       ...       ...       ...  ...       ...       ...       ...  ...
I created a MultiIndex
fields = ['time','open','high','low','close','volume','numEvents','value']
midx = pd.MultiIndex.from_product([[security_name], fields], names=['security', 'field'])
and, for a start, tried to apply that MultiIndex to the dataframe I get from reading the data from CSV (by creating a new dataframe and adding the index):
for c in eqty_names_list:
    midx = pd.MultiIndex.from_product([[c], fields], names=['security', 'field'])
    df_temp = pd.read_csv('{}{}.csv'.format(path, c))
    df_temp = pd.DataFrame(df_temp, columns=midx, index=df_temp['time'])
    df_temp.df_name = c
    all_dfs.append(df_temp)
However, the new dataframe only contains NaN:
security  1_HK
field     time  open  high  low  close  volume
time
NaN        NaN   NaN   NaN  NaN    NaN     NaN
Also, it still contains a column for time, although I tried to make that the index (so that I can later join all the other dataframes for other stocks by index to get the aggregated dataframe).
How can I apply the MultiIndex to the dataframe without losing my data, and then later join the dataframes that look like this
security  1_HK
field     time  open  high  low  close  volume
time
to create something like the following (note that the hierarchy of field and security is switched)?
field     time            open            high            ...
security  1_HK  2_HK  ... 1_HK  2_HK  ... 1_HK  2_HK  ... ...
time
Upvotes: 2
Views: 208
Reputation: 863156
I think you can first get all the files into a list, then read all the DataFrames with a list comprehension and concat them by columns (axis=1). If you add the keys parameter, you get a MultiIndex in the columns:
Files:
import pandas as pd
import glob

# collect all CSV paths and read each one into a DataFrame
files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp) for fp in files]

# concat by columns; keys adds the security names as the top column level
eqty_names_list = ['hk1','hk2','hk3']
df = pd.concat(dfs, keys=eqty_names_list, axis=1)
print (df)
  hk1       hk2       hk3
    a  b  c   a  b  c   a  b  c
0   0  1  2   0  9  6   0  7  1
1   1  5  8   1  6  4   1  3  2
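The sample frames above use the default integer index, but with your real files you will probably want the time column as the index, so that concat aligns the securities on their timestamps. A minimal sketch, assuming each CSV really has a time column (the file path pattern and eqty_names_list are placeholders):

import pandas as pd
import glob

files = sorted(glob.glob('files/*.csv'))
eqty_names_list = ['hk1', 'hk2', 'hk3']  # assumed to match the sorted file order

# read each file with the parsed time column as index, then concat by columns
dfs = [pd.read_csv(fp, index_col='time', parse_dates=['time']) for fp in files]
df = pd.concat(dfs, keys=eqty_names_list, axis=1, names=['security', 'field'])

concat aligns the frames on their index, so a security that is missing some timestamps simply gets NaN for those rows.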
Last, you need swaplevel and sort_index:
# move field to the top level and security below it, then sort for a tidy layout
df.columns = df.columns.swaplevel(0,1)
df = df.sort_index(axis=1)
print (df)
    a           b           c
  hk1 hk2 hk3 hk1 hk2 hk3 hk1 hk2 hk3
0   0   0   0   1   9   7   2   6   1
1   1   1   1   5   6   3   8   4   2
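With field as the outer column level, selecting one field returns every security at once, which is handy for the aggregated view you described. For example (a/b/c are just the sample fields from above):

print (df['a'])          # all securities for field 'a'
print (df['a']['hk2'])   # the series of one security for that field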
Upvotes: 2