Reputation: 45
I found code that should work from NameError: name 'Series' is not defined
But I get an error "name 'Series' is not defined". It worked fine in the example, but this error did come up for other users as well. Does anyone know how to make it work?
Any help would be appreciated!
original_df = DataFrame([{'country': 'a', 'title': 'title1'},
{'country': 'a,b,c', 'title': 'title2'},
{'country': 'd,e,f', 'title': 'title3'},
{'country': 'e', 'title': 'title4'}])
desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
{'country': 'a', 'title': 'title2'},
{'country': 'b', 'title': 'title2'},
{'country': 'c', 'title': 'title2'},
{'country': 'd', 'title': 'title3'},
{'country': 'e', 'title': 'title3'},
{'country': 'f', 'title': 'title3'},
{'country': 'e', 'title': 'title4'}])
#Code I used:
desired_df = pd.concat(
[
Series(row["title"], row["country"].split(","))
for _, row in original_df.iterrows()
]
).reset_index()
Upvotes: 3
Views: 1014
Reputation: 20689
You can use pd.Series.str.split
with df.explode
here.
df['country'] = df['country'].str.split(',')
df.explode('country').reset_index(drop=True)
country title
0 a title1
1 a title2
2 b title2
3 c title2
4 d title3
5 e title3
6 f title3
7 e title4
For NameError
you can use import this way.
from pandas import DataFrame, Series
Note : Using the above import statement would only bring DataFrame
and Series
classes into the scope.
Upvotes: 1
Reputation: 59579
First split
the column on commas to get a list and then you can explode
that Series of lists. Move 'title'
to the index so it gets repeated for each element in 'country'
. The last two parts just clean up the names and remove title from the index.
(df.set_index('title')['country']
.str.split(',')
.explode()
.rename('country')
.reset_index())
title country
0 title1 a
1 title2 a
2 title2 b
3 title2 c
4 title3 d
5 title3 e
6 title3 f
7 title4 e
Also, your original code is logically fine, but you need to properly create your object. I would recommend importing the module instead of individual classes/methods, so you create a Series
with pd.Series
not Series
import pandas as pd
desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))
for _, row in original_df.iterrows()]).reset_index()
Upvotes: 3