Reputation: 3
I am trying to find a method in Pandas to automate the cleaning of some source data that is delimited by a comma. There are a range of columns that are delimited by a comma, however there are also columns that are not delimited by anything. I need a method to split the delimited cells that are delimited, and append them to the row below, while also copying the values from the non-delimited cells downwards. Unfortunately the way the data is entered cannot be changed.
Source Data:
Name | Purchase Year | SKU | Quantity | Description | Territory | Order Reference | Contact Name | Contact Email | Contact Phone | Notes |
---|---|---|---|---|---|---|---|---|---|---|
John Smith,John Smith | 7,7 | ABCEF,HIJKL | 1,1 | Territory1,Territory2 | O-1394 | John Smith | [email protected] | 8008383838 | Note123 |
Expected Output:
Name | Purchase Year | SKU | Quantity | Description | Territory | Order Reference | Contact Name | Contact Email | Contact Phone | Notes |
---|---|---|---|---|---|---|---|---|---|---|
John Smith | 7 | ABCEF | 1 | Territory1 | O-1394 | John Smith | [email protected] | 8008383838 | Note123 | |
John Smith | 7 | HIJKL | 1 | Territory2 | O-1394 | John Smith | [email protected] | 8008383838 | Note123 |
I've managed to get this working in Power Query for Excel, however was wondering if it was doable in Pandas to save me the time figuring out how to automate the Power Query component.
I've seen the Explode function in Pandas that can be called, but not sure if this will also work for copying cell values as needed.
Any tips or ideas?
Upvotes: 0
Views: 439
Reputation: 6564
You need to use the Explode function:
df.explode(['Name','Purchase Year','SKU'])
Use the column names that should be splitted
Upvotes: 1