Reputation: 1561
I have some sales data that has customer wise data. All purchases made by a customer is all stored in a single row as shown below. I am trying to create a new column wherein all products purchased by each customer is listed in a single row separated by a comma
id, value
101, day:day_1\nproducts:abc\nday:day_2\nproducts:xyz\nday:day_3\nproducts:def
102, day:day_1\nproducts:prod_1\nday:day_4\nproducts:prod_2
103, \n
Trying to convert the above data into the below format:
id,value
101,abc,xyz,def
102,prod_1,prod_2
103,
Upvotes: 0
Views: 71
Reputation: 863741
If possible use Series.str.findall
with Series.str.join
:
df['value'] = df['value'].str.findall('products:(.+)').str.join(',')
print (df)
id value
0 101 abc,xyz,def
1 102 prod_1,prod_2
2 103
Or solution with split
in list comprehension:
L = [','.join(y.split(":")[1] for y in v.split("\n") if ('products' in y))
if pd.notna(v) and (':' in v) and ('\n' in v)
else ''
for v in df['value']]
df1 = df[['id']].join(pd.DataFrame(L, index=df.index, columns=['value']))
print (df1)
id value
0 101 abc,xyz,def
1 102 prod_1,prod_2
2 103
Upvotes: 2