Reputation: 1
I retrieved my data from TMDB and ended up with a dataframe that contains: id (the TMDB movie id), nameperso (the name of each cast member), knownfor (the movies they participated in) and popularity (for each person).
My issue is that after the explodes, I end up with multiple rows sharing the same id, but I haven't managed to separate the actors so that each one's popularity appears on its own row.
My goal is to have a dataframe with the columns: id, nameperso, knownfor, popularity.
Upvotes: 0
Views: 42
Reputation: 37747
You can use DataFrame.apply with pandas.Series.explode to explode all the columns that hold a list.
Try this:
import pandas as pd

out = df.set_index('id').apply(pd.Series.explode).reset_index()
out.columns = out.columns.str.replace(r"\d+", "", regex=True)  # strip the numeric suffix from the column names
print(out.head())
     id      nameperso known_for popularity
0  1891    Mark Hamill    Acting     32.141
1  1891  Harrison Ford    Acting     26.614
2  1891  Carrie Fisher    Acting      8.532
3  1891    Mark Hamill    Acting     32.141
4  1891  Harrison Ford    Acting     26.614
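For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The sample dataframe is made up for illustration (it is not the asker's actual TMDB data), and it assumes the list columns in each row all have the same length, which the parallel explode relies on.

```python
import pandas as pd

# Hypothetical sample: one row per movie, with list-valued columns
# of equal length within each row (an assumption about the real data).
df = pd.DataFrame({
    "id": [1891, 11],
    "nameperso": [["Mark Hamill", "Harrison Ford"], ["Carrie Fisher"]],
    "known_for": [["Acting", "Acting"], ["Acting"]],
    "popularity": [[32.141, 26.614], [8.532]],
})

# Keep 'id' as the index so it is repeated for every exploded element,
# then explode each list column in parallel.
out = df.set_index("id").apply(pd.Series.explode).reset_index()

# Exploded columns come back with object dtype; convert popularity to numbers.
out["popularity"] = pd.to_numeric(out["popularity"])

print(out)
```

Each person now sits on their own row next to the movie id. On pandas 1.3 or later, df.explode(["nameperso", "known_for", "popularity"]) should give an equivalent result without the set_index/apply step.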
Upvotes: 1