mahmood

Reputation: 24685

Using groupby and appending values as columns

Consider the following CSV file, where the "Name" column contains a duplicate value (K1):

ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37

I want to group the rows by name and record the I and C columns as rows, with one column per original record, like this:

K1:
     0   I   386  28800
     1   C   5522 6485
K2:
     0   I   1204
     1   C   3257

I have written this code, which groups the rows by the Name column and builds a dictionary:

import pandas as pd

data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}
df = pd.read_csv('test.csv', usecols=['Name', 'I', 'C'])
for name, df_group in df.groupby('Name'):
    my_dict[name] = pd.DataFrame(df_group)
print(my_dict)

But the output is

{'dummy':    Value
C      0
I      1, 'K1':   Name      I     C
0   K1    386  5522
1   K1  28800  6485, 'K2':   Name     I     C
2   K2  1204  3257}

As you can see, I and C end up as columns, so each key gets one row per original record. That is the opposite of what I want: I and C should be the rows. How can I fix that?

Upvotes: 1

Views: 323

Answers (1)

jezrael

Reputation: 862481

I think you need to select the columns and then transpose. I don't use a dict comprehension, because your code adds the new DataFrames to an existing dict:

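import pandas as pd

df = pd.read_csv('test.csv', usecols=['Name', 'I', 'C'])

data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}

for name, df_group in df.groupby('Name'):
    # keep only I and C, then transpose so they become the rows
    my_dict[name] = df_group[['I', 'C']].T
print(my_dict['K1'])

      0      1
I   386  28800
C  5522   6485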
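
To try this without a test.csv on disk, here is a minimal sketch that reads the sample rows from the question through io.StringIO. The names csv_text, build_groups and groups are only made up for illustration; the select-and-transpose step is the same as above.

```python
import io

import pandas as pd

# Sample data copied from the question (normally this would be test.csv).
csv_text = """ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37
"""


def build_groups(df):
    """Hypothetical helper: map each Name to its transposed I/C frame."""
    groups = {}
    for name, df_group in df.groupby('Name'):
        # Select I and C, then transpose so they become the row labels.
        groups[name] = df_group[['I', 'C']].T
    return groups


df = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'I', 'C'])
groups = build_groups(df)
print(groups['K1'])
print(groups['K2'])
```

Note that the transposed columns keep the original row labels, so the single column of the K2 frame is labeled 2, not 0.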

If you need the I/C labels as a regular column instead of the index:

data = {'Value':[0,1]}
kernel_df = pd.DataFrame(data, index=['C','I'])
my_dict = {'dummy':kernel_df}

for name, df_group in df.groupby('Name'):
    # name the index 'g', then move it into an ordinary column
    my_dict[name] = df_group[['I', 'C']].T.rename_axis('g').reset_index()
print(my_dict['K1'])
   g     0      1
0  I   386  28800
1  C  5522   6485
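
If you would still prefer a dict comprehension, one possible sketch is to build the new entries with a comprehension and merge them into the existing dict with update. I am also assuming here that you want the value columns numbered from 0 within each group, as in your expected output for K2, so the row labels are reset before transposing. csv_text is only a stand-in for test.csv.

```python
import io

import pandas as pd

# Sample data copied from the question (stand-in for test.csv).
csv_text = """ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37
"""
df = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'I', 'C'])

# Existing dict that the new frames are added to.
my_dict = {'dummy': pd.DataFrame({'Value': [0, 1]}, index=['C', 'I'])}

my_dict.update({
    name: (df_group[['I', 'C']]
           .reset_index(drop=True)  # renumber rows 0..n-1 within the group
           .T                       # I and C become the rows
           .rename_axis('g')        # name the index so it can become a column
           .reset_index())
    for name, df_group in df.groupby('Name')
})
print(my_dict['K2'])
```

With the extra reset_index(drop=True), the K2 frame has the columns g and 0 rather than g and 2.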

Upvotes: 1
