Celia
Celia

Reputation: 45

duplicating rows by splitting comma separated multiple values in another column pandas

I found code that should work from NameError: name 'Series' is not defined

But I get an error "name 'Series' is not defined". It worked fine in the example, but this error did come up for other users as well. Does anyone know how to make it work?

Any help would be appreciated!

original_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a,b,c', 'title': 'title2'},
               {'country': 'd,e,f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a', 'title': 'title2'},
               {'country': 'b', 'title': 'title2'},
               {'country': 'c', 'title': 'title2'},
               {'country': 'd', 'title': 'title3'},
               {'country': 'e', 'title': 'title3'},
               {'country': 'f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

#Code I used:
desired_df = pd.concat(
    [
        Series(row["title"], row["country"].split(","))
        for _, row in original_df.iterrows()
    ]
).reset_index()

Upvotes: 3

Views: 1014

Answers (2)

Ch3steR
Ch3steR

Reputation: 20689

You can use pd.Series.str.split with df.explode here.

df['country'] = df['country'].str.split(',')
df.explode('country').reset_index(drop=True)

  country   title
0       a  title1
1       a  title2
2       b  title2
3       c  title2
4       d  title3
5       e  title3
6       f  title3
7       e  title4

For NameError you can use import this way.

from pandas import DataFrame, Series

Note : Using the above import statement would only bring DataFrame and Series classes into the scope.

Upvotes: 1

ALollz
ALollz

Reputation: 59579

First split the column on commas to get a list and then you can explode that Series of lists. Move 'title' to the index so it gets repeated for each element in 'country'. The last two parts just clean up the names and remove title from the index.

(df.set_index('title')['country']
   .str.split(',')
   .explode()
   .rename('country')
   .reset_index())

    title country
0  title1       a
1  title2       a
2  title2       b
3  title2       c
4  title3       d
5  title3       e
6  title3       f
7  title4       e


Also, your original code is logically fine, but you need to properly create your object. I would recommend importing the module instead of individual classes/methods, so you create a Series with pd.Series not Series

import pandas as pd
                
desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))              
                        for _, row in original_df.iterrows()]).reset_index()

Upvotes: 3

Related Questions