duplicating rows by splitting comma separated multiple values in another column pandas

Question

I found code that should work from NameError: name 'Series' is not defined

But I get an error "name 'Series' is not defined". It worked fine in the example, but this error did come up for other users as well. Does anyone know how to make it work?

Any help would be appreciated!

original_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a,b,c', 'title': 'title2'},
               {'country': 'd,e,f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a', 'title': 'title2'},
               {'country': 'b', 'title': 'title2'},
               {'country': 'c', 'title': 'title2'},
               {'country': 'd', 'title': 'title3'},
               {'country': 'e', 'title': 'title3'},
               {'country': 'f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

#Code I used:
desired_df = pd.concat(
    [
        Series(row["title"], row["country"].split(","))
        for _, row in original_df.iterrows()
    ]
).reset_index()

ALollz · Accepted Answer

First split the column on commas to get a list and then you can explode that Series of lists. Move 'title' to the index so it gets repeated for each element in 'country'. The last two parts just clean up the names and remove title from the index.

(df.set_index('title')['country']
   .str.split(',')
   .explode()
   .rename('country')
   .reset_index())

    title country
0  title1       a
1  title2       a
2  title2       b
3  title2       c
4  title3       d
5  title3       e
6  title3       f
7  title4       e

Also, your original code is logically fine, but you need to properly create your object. I would recommend importing the module instead of individual classes/methods, so you create a Series with pd.Series not Series

import pandas as pd
                
desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))              
                        for _, row in original_df.iterrows()]).reset_index()

duplicating rows by splitting comma separated multiple values in another column pandas

Answers (2)

Related Questions