Reputation: 67
How do I convert a pandas DataFrame column consisting of lists of lists to a string? Here is a snippet of the column 'categories' in a df:
[['Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']]
[['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']]
[['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'], ['Video Games', 'Nintendo DS']]
I tried the following code:
df.loc[:,"categories"]=[item for sublist in df.loc[:,"categories"] for item in sublist]
but it raises an error:
ValueError: Length of values does not match length of index
Is there another way of doing this?
Expected column:
'Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Cables & Interconnects', 'USB Cables','Video Games', 'Sony PSP'
'Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads'
'Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers','Video Games', 'Nintendo DS'
Upvotes: 1
Views: 606
Reputation: 863531
Use a nested generator expression with join:
df["categories"]=[', '.join(item for sublist in x for item in sublist) for x in df["categories"]]
If performance matters on a larger DataFrame, use itertools.chain.from_iterable:
from itertools import chain
df["categories"] = [', '.join(chain.from_iterable(x)) for x in df["categories"]]
print(df)
                                          categories
0  Electronics, Computers & Accessories, Cables &...
1  Video Games, PlayStation 3, Accessories, Contr...
2  Cell Phones & Accessories, Accessories, Charge...
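For reference, here is a minimal self-contained sketch of why the attempt in the question fails and how the per-row join avoids it. The DataFrame is rebuilt by hand from the snippet in the question, and the variable names `flattened_across_rows` and `joined` are only illustrative.

```python
from itertools import chain

import pandas as pd

# Sample data rebuilt from the snippet in the question.
df = pd.DataFrame({
    "categories": [
        [["Electronics", "Computers & Accessories", "Cables & Accessories",
          "Cables & Interconnects", "USB Cables"], ["Video Games", "Sony PSP"]],
        [["Video Games", "PlayStation 3", "Accessories", "Controllers", "Gamepads"]],
        [["Cell Phones & Accessories", "Accessories", "Chargers", "Travel Chargers"],
         ["Video Games", "Nintendo DS"]],
    ]
})

# The comprehension from the question flattens ACROSS rows, so it yields
# 2 + 1 + 2 = 5 sublists for a 3-row column, hence the length mismatch.
flattened_across_rows = [item for sublist in df["categories"] for item in sublist]
print(len(flattened_across_rows), len(df))  # prints: 5 3

# Flattening WITHIN each row keeps exactly one string per row.
joined = [", ".join(chain.from_iterable(row)) for row in df["categories"]]
df["categories"] = joined
print(df["categories"].tolist())
```

The key difference is that the outer loop runs over rows and the flattening happens inside it, so the result always has the same length as the index.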
Timings (these will differ on real data, so it is best to test on your own data first):
df = pd.concat([df] * 10000, ignore_index=True)
In [45]: %timeit df["c1"]=[', '.join(item for sublist in x for item in sublist) for x in df["categories"]]
39 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [46]: %timeit df["c2"]=[', '.join(chain.from_iterable(x)) for x in df["categories"]]
22.1 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [47]: %timeit df['c3'] = df["categories"].apply(lambda x: ', '.join(str(r) for v in x for r in v))
66.7 ms ± 695 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
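The %timeit lines above need IPython. If you want to repeat the comparison in a plain Python script, something like the following sketch should work with the standard timeit module. The sample rows are shortened, and the function names and the `number=10` setting are arbitrary choices, not part of the original timings.

```python
import timeit
from itertools import chain

import pandas as pd

# Shortened sample rows, repeated to get a larger column.
rows = [
    [["Electronics", "USB Cables"], ["Video Games", "Sony PSP"]],
    [["Video Games", "PlayStation 3"]],
]
df = pd.DataFrame({"categories": rows * 1000})


def nested_generator():
    return [", ".join(item for sublist in x for item in sublist)
            for x in df["categories"]]


def chain_join():
    return [", ".join(chain.from_iterable(x)) for x in df["categories"]]


def apply_join():
    return df["categories"].apply(
        lambda x: ", ".join(str(r) for v in x for r in v)
    ).tolist()


# All three approaches should produce the same strings.
assert nested_generator() == chain_join() == apply_join()

for func in (nested_generator, chain_join, apply_join):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds / 10 * 1000:.2f} ms per call")
```

Absolute numbers depend on the machine, the pandas version and the shape of the data, so only the relative ordering is worth reading from such a run.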
Upvotes: 1
Reputation: 8641
You can use apply with a nested generator expression inside join:
df['col'] = df.col.apply(lambda x: ', '.join(str(r) for v in x for r in v))
Output:
                                                 col
0  Electronics, Computers & Accessories, Cables &...
1  Video Games, PlayStation 3, Accessories, Contr...
2  Cell Phones & Accessories, Accessories, Charge...
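One thing the str(r) call buys you: str.join raises a TypeError if any item is not a string, so converting each item first also handles mixed values. A small sketch with made-up data (the numbers in the lists are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical column where some inner items are not strings.
df = pd.DataFrame({"col": [[["Video Games", 3], ["Sony PSP"]], [["Chargers", 2.5]]]})

# Without str(), join fails on the first non-string item.
try:
    ", ".join(["Video Games", 3])
    join_failed = False
except TypeError:
    join_failed = True

# With str(), every item is converted before joining.
df["col"] = df["col"].apply(lambda x: ", ".join(str(r) for v in x for r in v))
print(df["col"].tolist())  # prints: ['Video Games, 3, Sony PSP', 'Chargers, 2.5']
```

If every item is already a string, as in the question, the conversion is not needed and only adds a little overhead.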
Upvotes: 0