Reputation: 24685
Consider the following csv file where there is a duplicate name in "Name" column:
ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37
I want to group the rows by name and record I and C columns like this
K1:
0 I 386 28800
1 C 5522 6485
K2:
0 I 1204
1 C 3257
I have written this code which groups the rows by name column and build a dictionary.
data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}
df = pd.read_csv('test.csv', usecols=['Name', 'I', 'C'])
for name, df_group in df.groupby('Name'):
my_dict[name] = pd.DataFrame(df_group)
print(my_dict)
But the output is
{'dummy': Value
C 0
I 1, 'K1': Name I C
0 K1 386 5522
1 K1 28800 6485, 'K2': Name I C
2 K2 1204 3257}
As you can see the I and C are written in columns, so the rows for each key are increased. That is the opposite of what I want. How can I fix that?
Upvotes: 1
Views: 323
Reputation: 862481
I think you need select columns with transpose. I dont use dict comprehension, because in your code are added new DataFrame to existing dict:
data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}
for name, df_group in df.groupby('Name'):
my_dict[name] = df_group[[ 'I', 'C']].T
print(my_dict['K1'])
0 1
I 386 28800
C 5522 6485
If new column is necessary:
data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}
for name, df_group in df.groupby('Name'):
my_dict[name] = df_group[[ 'I', 'C']].T.rename_axis('g').reset_index()
print(my_dict['K1'])
g 0 1
0 I 386 28800
1 C 5522 6485
Upvotes: 1