Using groupby() with appending additional rows

Question

With the following csv input file

ID,Name,Metric,Value
0,K1,M1,200
0,K1,M2,5
1,K2,M1,1
1,K2,M2,10
2,K2,M1,500
2,K2,M2,8

This code, groups the rows by the name column, e.g. two groups. Then it appends the values as columns for the same Name.

df = pd.read_csv('test.csv', usecols=['ID','Name','Metric','Value'])
print(df)
my_array = []
for name, df_group in df.groupby('Name'):
    my_array.append( pd.concat(
        [g.reset_index(drop=True) for _, g in df_group.groupby('ID')['Value']],
        axis=1) )
print(my_array)

The output looks like

   ID Name Metric  Value
0   0   K1     M1    200
1   0   K1     M2      5
2   1   K2     M1      1
3   1   K2     M2     10
4   2   K2     M1    500
5   2   K2     M2      8
[   Value
0    200
1      5,    Value  Value
0      1    500
1     10      8]

For example, my_array[1] which is K2 has two rows corresponding to M1 and M2. I would like to keep the IDs as well in the final data frames in my_array. So I want to add a third row and save it (M1, M2 and ID). Therefore, the final my_array should be

[   Value
0    200
1      5
2      0,    Value  Value
0      1    500              <-- For K2, there are two M1 (1 and 500)
1     10      8              <-- For K2, there are two M2 (10 and 8)
2      1      2]             <-- For K2, there are two ID (1 and 2)

How can I modify the code for that purpose?

jezrael · Accepted Answer

You can use DataFrame.pivot for DataFrames pe groups and then append df1.columns in np.vstack:

my_array = []
for name, df_group in df.groupby('Name'):
    df1 = df_group.pivot('Metric','ID','Value')
    my_array.append(pd.DataFrame(np.vstack([df1, df1.columns])))
print (my_array)
[     0
0  200
1    5
2    0,     0    1
0   1  500
1  10    8
2   1    2]

Using groupby() with appending additional rows

Answers (1)

Related Questions