Reputation: 24715
With the following csv input file
ID,Name,Metric,Value
0,K1,M1,200
0,K1,M2,5
1,K2,M1,1
1,K2,M2,10
2,K2,M1,500
2,K2,M2,8
This code, groups the rows by the name column, e.g. two groups. Then it appends the values as columns for the same Name.
df = pd.read_csv('test.csv', usecols=['ID','Name','Metric','Value'])
print(df)
my_array = []
for name, df_group in df.groupby('Name'):
my_array.append( pd.concat(
[g.reset_index(drop=True) for _, g in df_group.groupby('ID')['Value']],
axis=1) )
print(my_array)
The output looks like
ID Name Metric Value
0 0 K1 M1 200
1 0 K1 M2 5
2 1 K2 M1 1
3 1 K2 M2 10
4 2 K2 M1 500
5 2 K2 M2 8
[ Value
0 200
1 5, Value Value
0 1 500
1 10 8]
For example, my_array[1]
which is K2 has two rows corresponding to M1 and M2. I would like to keep the IDs as well in the final data frames in my_array
. So I want to add a third row and save it (M1, M2 and ID). Therefore, the final my_array
should be
[ Value
0 200
1 5
2 0, Value Value
0 1 500 <-- For K2, there are two M1 (1 and 500)
1 10 8 <-- For K2, there are two M2 (10 and 8)
2 1 2] <-- For K2, there are two ID (1 and 2)
How can I modify the code for that purpose?
Upvotes: 1
Views: 43
Reputation: 862701
You can use DataFrame.pivot
for DataFrames pe groups and then append df1.columns
in np.vstack
:
my_array = []
for name, df_group in df.groupby('Name'):
df1 = df_group.pivot('Metric','ID','Value')
my_array.append(pd.DataFrame(np.vstack([df1, df1.columns])))
print (my_array)
[ 0
0 200
1 5
2 0, 0 1
0 1 500
1 10 8
2 1 2]
Upvotes: 3