Reputation: 274
we have the following data
Name genres
A Action|Adventure|Science Fiction|Thriller
B Action|Adventure|Science Fiction|Thriller
C Adventure|Science Fiction|Thriller
I want the data such that my data frame is
Name genres
A Action
A Adventure
A Science Fiction
A Thriller
B Action
B Adventure
B Science Fiction
B Thriller
C Adventure
C Science Fiction
C Thriller
here is my code
gen = df1[df1['genres'].str.contains('|')]
gen1 = gen.copy()
gen2 = gen.copy()
gen3 = gen.copy()
gen4 = gen.copy()
gen1['genres'] = gen1['genres'].apply(lambda x: x.split("|")[0])
gen2['genres'] = gen2['genres'].apply(lambda x: x.split("|")[1])
gen3['genres'] = gen3['genres'].apply(lambda x: x.split("|")[2])
gen4['genres'] = gen4['genres'].apply(lambda x: x.split("|")[3])
I am getting the error
IndexError: list index out of range
Upvotes: 3
Views: 3129
Reputation: 862751
Crete list of genres by split
, repeat
values by str.len
and last flatten lists by chain.from_iterable
:
from itertools import chain
genres = df['genres'].str.split('|')
df = pd.DataFrame({
'Name' : df['Name'].values.repeat(genres.str.len()),
'genres' : list(chain.from_iterable(genres.tolist()))
})
print (df)
Name genres
0 A Action
1 A Adventure
2 A Science Fiction
3 A Thriller
4 B Action
5 B Adventure
6 B Science Fiction
7 B Thriller
8 C Adventure
9 C Science Fiction
10 C Thriller
EDIT:
Solution for dynamic number of columns:
print (df)
Name genres col
0 A Action|Adventure|Science Fiction|Thriller 2
1 B Action|Adventure|Science Fiction|Thriller 3
2 C Adventure|Science Fiction|Thriller 5
from itertools import chain
cols = df.columns.difference(['genres'])
genres = df['genres'].str.split('|')
df = (df.loc[df.index.repeat(genres.str.len()), cols]
.assign(genres=list(chain.from_iterable(genres.tolist()))))
print (df)
Name col genres
0 A 2 Action
0 A 2 Adventure
0 A 2 Science Fiction
0 A 2 Thriller
1 B 3 Action
1 B 3 Adventure
1 B 3 Science Fiction
1 B 3 Thriller
2 C 5 Adventure
2 C 5 Science Fiction
2 C 5 Thriller
Upvotes: 5