Reputation: 21
Part of a CSV file ('data.csv') that I have to process looks like this:
parent_id,parent_name,Type,Companyname,Custsupid,Streetaddress
3,Customer,,,C0010,
3,Customer,A,,,
3,Customer,,ACE SYSTEMS,,
3,Customer,,,,Straat 10
7,Customer,,,Q8484,
7,Customer,B,,,
7,Customer,,XYZ AUTOMAT,,
7,Customer,,,,Laan 99
To import this file into a dataframe I do:
import pandas as pd
df = pd.read_csv('data.csv').fillna('')
This results in:
------------------------------------------------------------------
| |parent_id|parent_name|Type|Companyname|Custsupid|Streetaddress|
------------------------------------------------------------------
|0|3        |Customer   |    |           |C0010    |             |
|1|3        |Customer   |A   |           |         |             |
|2|3        |Customer   |    |ACE SYSTEMS|         |             |
|3|3        |Customer   |    |           |         |Straat 10    |
|4|7        |Customer   |    |           |Q8484    |             |
|5|7        |Customer   |B   |           |         |             |
|6|7        |Customer   |    |XYZ AUTOMAT|         |             |
|7|7        |Customer   |    |           |         |Laan 99      |
------------------------------------------------------------------
However, what I want to end up with is a dataframe that looks like this:
------------------------------------------------------------------
| |parent_id|parent_name|Type|Companyname|Custsupid|Streetaddress|
------------------------------------------------------------------
|0|3        |Customer   |A   |ACE SYSTEMS|C0010    |Straat 10    |
|1|7        |Customer   |B   |XYZ AUTOMAT|Q8484    |Laan 99      |
------------------------------------------------------------------
I have already tried df.groupby and similar approaches, but I can't produce the desired result.
Is there a way to accomplish this with a pandas dataframe?
Upvotes: 2
Views: 399
Reputation: 880757
In [37]: df.groupby(['parent_id', 'parent_name']).sum()
Out[37]:
                       Type  Companyname  Custsupid  Streetaddress
parent_id parent_name
3         Customer        A  ACE SYSTEMS      C0010      Straat 10
7         Customer        B  XYZ AUTOMAT      Q8484        Laan 99
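Here sum concatenates the strings within each group, so this relies on the fact that adding empty strings to a non-empty string returns the non-empty string. That in turn depends on the fillna('') call from the question, which turns the missing cells into empty strings.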
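To get back the flat layout asked for in the question, the group keys can be moved from the index into ordinary columns again. The following is a minimal sketch, assuming the sample rows are inlined through io.StringIO instead of being read from 'data.csv' (done only to keep the example self-contained). It also shows groupby(...).first(), which takes the first non-null value per column in each group and so does not need the fillna('') step; that variant is an alternative suggestion, not part of the approach above.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without 'data.csv'
csv_text = """parent_id,parent_name,Type,Companyname,Custsupid,Streetaddress
3,Customer,,,C0010,
3,Customer,A,,,
3,Customer,,ACE SYSTEMS,,
3,Customer,,,,Straat 10
7,Customer,,,Q8484,
7,Customer,B,,,
7,Customer,,XYZ AUTOMAT,,
7,Customer,,,,Laan 99
"""

keys = ['parent_id', 'parent_name']

# Variant 1: concatenate the strings per group, then turn the group keys
# back into regular columns with reset_index()
df = pd.read_csv(io.StringIO(csv_text)).fillna('')
by_sum = df.groupby(keys).sum().reset_index()

# Variant 2 (alternative): keep the NaNs produced by read_csv and take the
# first non-null value of each column within each group
raw = pd.read_csv(io.StringIO(csv_text))
by_first = raw.groupby(keys).first().reset_index()

print(by_sum)
print(by_first)
```

The two variants differ when a group holds more than one non-empty value in the same column: sum joins them into one string, while first keeps only the first one.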
Upvotes: 2