Sort and group csv with pandas

Question

I import a CSV file with content like this:

Id; PartNrInt; Some; other; stuff;
R1; 1234-5678; x1; y1; z1;
R2; 1234-6789; x2; y2; z2;
R3; 1234-5678; x3; y3; z3;

Then I have a database which has additional data for each PartNrInt. I merge this two dataframes, so I have something like this:

Id; PartNrInt; OrderNr; Manufacturer; Some; other; stuff;
R1; 1234-5678; OrderNr1; Manuf1; x1; y1; z1;
R2; 1234-6789; OrderNr2: Manuf2; x2; y2; z2;
R3; 1234-5678; OrderNr1: Manuf1; x3; y3; z3;

This part works fine, I can easily print the dataframe. For importing the file in our ERP System I have to group the tables by PartNrInt. So I want a table like:

Count; Names; PartNrInt; OrderNr; Manufacturer
2; R1, R3; 1234-5678; OrderNr1; Manuf1
1; R2; 1234-6789; OrderNr1; Manuf1

My problem is, I can groupby the data with df.groupby('PartNrInt')['Id'].apply(list) and count the objects, but I'm unable to get the new data in a new frame for exporting. I'm completely new to pandas and to python so maybe there's a really simple solution.

jezrael · Accepted Answer

You can use groupby with aggregate join and len, last reset_index and reorder columns:

df = df.groupby(['PartNrInt', 'OrderNr','Manufacturer']).Id 
       .agg({"Names": ','.join, 'Count': len}) 
       .reset_index()[['Count','Names','PartNrInt','OrderNr','Manufacturer']]
print (df)

   Count  Names  PartNrInt   OrderNr Manufacturer
0      2  R1,R3  1234-5678  OrderNr1       Manuf1
1      1     R2  1234-6789  OrderNr2       Manuf2

Sort and group csv with pandas

Answers (2)

Related Questions