Append rows to dataframe, add new columns if not exist

Question

I have a df like below which

>>df
  group  sub_group    max
0     A          1   30.0
1     B          1  300.0
2     B          2    3.0
3     A          2    2.0

I need to have group and sub_group as atrributes (columns) and max as row So I do

>>> newdf.set_index(['group','sub_group']).T
group         A      B         A
sub_group     1      1    2    2
max        30.0  300.0  3.0  2.0

This gives me my intended formatting

Now I need to merge it to another similar dataframe say

>>df2
  group  sub_group    max
0     C          1   3000.0
1     A          1   4000.0

Such that my merge results in

group         A      B         A     C
sub_group     1      1    2    2     1
max        30.0   300.0  3.0  2.0    NaN
max      4000.0    NaN   NaN  NaN    3000.0

Basically at every new df we are placing values under appropriate heading, if there is a new group or subgroup we add it the larger df. I am not sure if my way of transposing and then trying to merge append is a good approach

Since these df are generated in loop (loop items being dates), I would like to get a way to replace max printed in 1st column(of expected op) by loop date.

dates=['20170525', '20170623', '20170726']

for date in dates:
     df = pd.read_csv()

jezrael · Accepted Answer

I think you can add parameter index_col to read_csv first for Multiindex from first and second column:

dfs = []
for date in dates:
     df = pd.read_csv('name', index_col=[0,1]) 
     dfs.append(df)

#another test df was added
print (df3)
                    max
group sub_group        
D     1          3000.0
E     1          4000.0

Then concat them together with parameter keys by list, then reshape by unstack and transpose:

#dfs = [df,df2,df3]
dates=['20170525', '20170623', '20170726']
df = pd.concat(dfs, keys=dates)['max'].unstack(0).T
print (df)
group           A           B            C       D       E
sub_group       1    2      1    2       1       1       1
20170525     30.0  2.0  300.0  3.0     NaN     NaN     NaN
20170623   4000.0  NaN    NaN  NaN  3000.0     NaN     NaN
20170726      NaN  NaN    NaN  NaN     NaN  3000.0  4000.0

Append rows to dataframe, add new columns if not exist

Answers (1)

Related Questions