Split a pandas column by comma and for each item split by space and finally have a set of list

Question

I have one column in a pandas DataFrame. Each row holds a list of items as a comma-separated string, as shown below. I want to split each row by comma, then split each item by space and take the first token, and finally remove the duplicates within that row.

Initial dataset (pandas df)

MATERIAL
A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX
B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX
A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,
M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX

Output required

MATERIAL
A, B, C
B, C
A, H, L
M, N, P, L

If a cell holds a single item, str.split().str[0] works and returns the first token after splitting.

But when I tried the following lambda on a column whose cells hold several items, I got an error and could not produce the output above:

productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ','.join([productList['MATERIAL'].str.split().str[0] for n in g]))

Would be great if someone could throw some light on this. Thanks.

Aryerez · Accepted Answer

A one-line lambda that produces a sorted list of the unique first tokens in each row:

df['MATERIAL'] = df['MATERIAL'].map(lambda x: sorted(list(set(val.strip().split(' ')[0] for val in x.split(',')))))
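
If the result should be a comma-joined string in first-seen order, as in the required output (for example "M, N, P, L" rather than a sorted list), a variation along these lines might work. This is only a sketch: the helper name first_tokens is hypothetical, the sample frame is rebuilt from the question's data, and it assumes every cell is a plain string. It also skips the empty item left by a trailing comma, as in the third row.

```python
import pandas as pd

# Sample data copied from the question (note the trailing comma in row 3).
df = pd.DataFrame({
    "MATERIAL": [
        "A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX",
        "B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX",
        "A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,",
        "M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX",
    ]
})

def first_tokens(cell):
    """Hypothetical helper: unique first tokens of a cell, in first-seen order."""
    # Split on commas, ignore empty pieces, keep the first whitespace-separated token.
    firsts = (item.split()[0] for item in cell.split(",") if item.strip())
    # dict.fromkeys drops duplicates while preserving insertion order (Python 3.7+).
    return ", ".join(dict.fromkeys(firsts))

df["MATERIAL"] = df["MATERIAL"].map(first_tokens)
print(df)
```

Using dict.fromkeys instead of set keeps the order in which the tokens first appear, which a plain set does not guarantee.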
