Reputation: 95
I have a dataframe with a column containing lists. I am trying to iterate over each row and concatenate the value in 'molecule' with each element of that row's list in 'species'. The result I am trying to produce is shown in 'molecule_species' below. Any thoughts on this would be appreciated.
Dataframe:
import pandas as pd
df = pd.DataFrame({'molecule': ['a', 'b', 'c', 'd', 'e'],
                   'species': [['dog'],
                               ['horse', 'pig'],
                               ['cat', 'dog'],
                               ['cat', 'horse', 'pig'],
                               ['chicken', 'pig']]})
The new column I am trying to create, by iterating over rows and list elements and concatenating 'molecule' with each element of the list in 'species':
df['molecule_species'] = [['a dog'],
                          ['b horse', 'b pig'],
                          ['c cat', 'c dog'],
                          ['d cat', 'd horse', 'd pig'],
                          ['e chicken', 'e pig']]
Upvotes: 3
Views: 5301
Reputation: 30920
Pandas >= 0.25.0
Use DataFrame.explode, then join the values of each row with str.join, and return to lists with GroupBy.agg:
df['molecule_species'] = (df.explode('species')
                            .apply(' '.join, axis=1)
                            .groupby(level=0)
                            .agg(list))
print(df)
   molecule            species         molecule_species
0         a              [dog]                  [a dog]
1         b       [horse, pig]         [b horse, b pig]
2         c         [cat, dog]           [c cat, c dog]
3         d  [cat, horse, pig]  [d cat, d horse, d pig]
4         e     [chicken, pig]       [e chicken, e pig]
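To make the chain above easier to follow, here is a minimal sketch that runs the same three steps one at a time on a two-row frame. The intermediate names (exploded, joined, result) are only for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'molecule': ['a', 'b'],
                   'species': [['dog'], ['horse', 'pig']]})

# Step 1: one row per list element; the original index label is repeated.
exploded = df.explode('species')

# Step 2: join the values of each row into a single string.
joined = exploded.apply(' '.join, axis=1)

# Step 3: collect the strings that share an index label back into lists.
result = joined.groupby(level=0).agg(list)
print(result.tolist())
```

Because explode keeps the original index labels, groupby(level=0) is enough to put each string back with the row it came from.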
Pandas < 0.25.0
import numpy as np
df['molecule_species'] = (df.reindex(df.index.repeat(df.species.str.len()))
                            .assign(species=np.concatenate(df.species.values))
                            .apply(' '.join, axis=1)
                            .groupby(level=0)
                            .agg(list))
print(df)
   molecule            species         molecule_species
0         a              [dog]                  [a dog]
1         b       [horse, pig]         [b horse, b pig]
2         c         [cat, dog]           [c cat, c dog]
3         d  [cat, horse, pig]  [d cat, d horse, d pig]
4         e     [chicken, pig]       [e chicken, e pig]
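The reindex/repeat/concatenate combination above is effectively a hand-written explode. A small sketch of what each piece produces, on a two-row frame (the names lengths, repeated, flat and manual are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'molecule': ['a', 'b'],
                   'species': [['dog'], ['horse', 'pig']]})

# Number of elements in each list: 1 and 2.
lengths = df['species'].str.len()

# Repeat each index label once per list element: 0, 1, 1.
repeated = df.index.repeat(lengths)

# Flatten all the lists into one array, in row order.
flat = np.concatenate(df['species'].values)

# Duplicate the rows, then overwrite 'species' with the flattened values.
manual = df.reindex(repeated).assign(species=flat)
print(manual)
```

From here the apply, groupby and agg steps work the same way as in the explode version.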
Another approach is Series.str.cat:
df2 = df.explode('species')
df['molecule_species'] = (df2['molecule'].str.cat(df2['species'], sep=' ')
                                         .groupby(level=0)
                                         .agg(list))
Upvotes: 6
Reputation: 25239
You can use a nested list comprehension. For processing sub-lists and concatenating strings inside pandas cells, a list comprehension is typically much faster than the built-in pandas methods.
df['molecule_species'] = [[mol + ' ' + a_spec for a_spec in specs]
                          for mol, specs in zip(df.molecule, df.species)]
Out[87]:
   molecule            species         molecule_species
0         a              [dog]                  [a dog]
1         b       [horse, pig]         [b horse, b pig]
2         c         [cat, dog]           [c cat, c dog]
3         d  [cat, horse, pig]  [d cat, d horse, d pig]
4         e     [chicken, pig]       [e chicken, e pig]
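If you want to check the speed difference on your own machine, a rough sketch with timeit could look like the following. The frame size (5,000 rows) and the helper names with_comprehension and with_explode are arbitrary choices for this illustration; timings will vary with data size and pandas version, so no numbers are quoted here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'molecule': ['a', 'b', 'c', 'd', 'e'],
                   'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                               ['cat', 'horse', 'pig'], ['chicken', 'pig']]})

# Repeat the sample frame to get something large enough to time.
big = pd.concat([df] * 1000, ignore_index=True)


def with_comprehension(frame):
    # Nested list comprehension over the two columns.
    return [[mol + ' ' + spec for spec in specs]
            for mol, specs in zip(frame['molecule'], frame['species'])]


def with_explode(frame):
    # explode / join / groupby chain, for comparison.
    return (frame.explode('species')
                 .apply(' '.join, axis=1)
                 .groupby(level=0)
                 .agg(list))


# Both approaches should produce the same lists.
assert with_comprehension(big) == with_explode(big).tolist()

for func in (with_comprehension, with_explode):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```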
Upvotes: 4
Reputation: 2643
You can try this:
>>> import pandas as pd
>>> df = pd.DataFrame({'molecule': ['a', 'b', 'c', 'd', 'e'],
...                    'species': [['dog'],
...                                ['horse', 'pig'],
...                                ['cat', 'dog'],
...                                ['cat', 'horse', 'pig'],
...                                ['chicken', 'pig']]})
>>> df['molecule_species'] = df.apply(
...     lambda x: [x['molecule'] + ' ' + m for m in x['species']], axis=1)
>>> df
   molecule            species         molecule_species
0         a              [dog]                  [a dog]
1         b       [horse, pig]         [b horse, b pig]
2         c         [cat, dog]           [c cat, c dog]
3         d  [cat, horse, pig]  [d cat, d horse, d pig]
4         e     [chicken, pig]       [e chicken, e pig]
Upvotes: 5