Reputation: 1300
I import a CSV file with content like this:
Id; PartNrInt; Some; other; stuff;
R1; 1234-5678; x1; y1; z1;
R2; 1234-6789; x2; y2; z2;
R3; 1234-5678; x3; y3; z3;
Then I have a database which has additional data for each PartNrInt
. I merge this two dataframes, so I have something like this:
Id; PartNrInt; OrderNr; Manufacturer; Some; other; stuff;
R1; 1234-5678; OrderNr1; Manuf1; x1; y1; z1;
R2; 1234-6789; OrderNr2: Manuf2; x2; y2; z2;
R3; 1234-5678; OrderNr1: Manuf1; x3; y3; z3;
This part works fine, I can easily print the dataframe. For importing the file in our ERP System I have to group the tables by PartNrInt
.
So I want a table like:
Count; Names; PartNrInt; OrderNr; Manufacturer
2; R1, R3; 1234-5678; OrderNr1; Manuf1
1; R2; 1234-6789; OrderNr1; Manuf1
My problem is, I can groupby the data with df.groupby('PartNrInt')['Id'].apply(list)
and count the objects, but I'm unable to get the new data in a new frame for exporting.
I'm completely new to pandas and to python so maybe there's a really simple solution.
Upvotes: 3
Views: 1143
Reputation: 294488
g = df.groupby('PartNrInt')
g[['Manufacturer', 'OrderNr']].first() \
.join(g.Id.agg({'Names': ', '.join, 'Count': 'count'})) \
.reset_index()
PartNrInt Manufacturer OrderNr Names Count
0 1234-5678 Manuf1 OrderNr1 R1, R3 2
1 1234-6789 Manuf2 OrderNr2 R2 1
Upvotes: 3
Reputation: 863166
You can use groupby
with aggregate
join
and len
, last reset_index
and reorder columns:
df = df.groupby(['PartNrInt', 'OrderNr','Manufacturer']).Id
.agg({"Names": ','.join, 'Count': len})
.reset_index()[['Count','Names','PartNrInt','OrderNr','Manufacturer']]
print (df)
Count Names PartNrInt OrderNr Manufacturer
0 2 R1,R3 1234-5678 OrderNr1 Manuf1
1 1 R2 1234-6789 OrderNr2 Manuf2
Upvotes: 4