Reputation: 2146
I have a df like below which
>>df
group sub_group max
0 A 1 30.0
1 B 1 300.0
2 B 2 3.0
3 A 2 2.0
I need to have group and sub_group as atrributes (columns) and max as row So I do
>>> newdf.set_index(['group','sub_group']).T
group A B A
sub_group 1 1 2 2
max 30.0 300.0 3.0 2.0
This gives me my intended formatting
Now I need to merge it to another similar dataframe say
>>df2
group sub_group max
0 C 1 3000.0
1 A 1 4000.0
Such that my merge results in
group A B A C
sub_group 1 1 2 2 1
max 30.0 300.0 3.0 2.0 NaN
max 4000.0 NaN NaN NaN 3000.0
Basically at every new df we are placing values under appropriate heading, if there is a new group or subgroup we add it the larger df. I am not sure if my way of transposing and then trying to merge append is a good approach
Since these df are generated in loop (loop items being dates), I would like to get a way to replace max
printed in 1st column(of expected op) by loop date.
dates=['20170525', '20170623', '20170726']
for date in dates:
df = pd.read_csv()
Upvotes: 2
Views: 4758
Reputation: 862851
I think you can add parameter index_col
to read_csv
first for Multiindex
from first and second column:
dfs = []
for date in dates:
df = pd.read_csv('name', index_col=[0,1])
dfs.append(df)
#another test df was added
print (df3)
max
group sub_group
D 1 3000.0
E 1 4000.0
Then concat
them together with parameter keys
by list
, then reshape by unstack
and transpose:
#dfs = [df,df2,df3]
dates=['20170525', '20170623', '20170726']
df = pd.concat(dfs, keys=dates)['max'].unstack(0).T
print (df)
group A B C D E
sub_group 1 2 1 2 1 1 1
20170525 30.0 2.0 300.0 3.0 NaN NaN NaN
20170623 4000.0 NaN NaN NaN 3000.0 NaN NaN
20170726 NaN NaN NaN NaN NaN 3000.0 4000.0
Upvotes: 1