3pitt
3pitt

Reputation: 951

Create Multiple New Columns Based on Pipe-Delimited Column in Pandas

I have a pandas dataframe with a pipe delimited column with an arbitrary number of elements, called Parts. The number of elements in these pipe-strings varies from 0 to over 10. The number of unique elements contained in all pipe-strings is not much smaller than the number of rows (which makes it impossible for me to manually specify all of them while creating new columns).

For each row, I want to create a new column that acts as an indicator variable for each element of the pipe delimited list. For instance, if the row

...'Parts'...

...'12|34|56'

should be transformed to

...'Part_12' 'Part_34' 'Part_56'...

...1 1 1...

Because they are a lot of unique parts, these columns are obviously going to be sparse - mostly zeros since each row only contains a small fraction of unique parts.

I haven't found any approach that doesn't require manually specifying the columns (for instance, Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries). I've also looked at pandas' melt, but I don't think that's the appropriate tool.

The way I know how to solve it would be to pipe the raw CSV to another python script and deal with it on a char-by-char basis, but I need to work within my existing script since I will be processing hundreds of CSVs in this manner.

Here's a better illustration of the data

ID YEAR AMT PARTZ

1202 2007 99.34

9321 1988 1012.99 2031|8942

2342 2012 381.22 1939|8321|Amx3

Upvotes: 1

Views: 2571

Answers (1)

Scott Boston
Scott Boston

Reputation: 153510

You can use get_dummies and add_prefix:

df.Parts.str.get_dummies().add_prefix('Part_')

Output:

   Part_12  Part_34  Part_56
0        1        1        1

Edit for comment and counting duplicates.

df = pd.DataFrame({'Parts':['12|34|56|12']}, index=[0])
pd.get_dummies(df.Parts.str.split('|',expand=True).stack()).sum(level=0).add_prefix('Part_')

Output:

   Part_12  Part_34  Part_56
0        2        1        1

Upvotes: 2

Related Questions