Lilly

Reputation: 988

Split a pandas column by comma, split each item by space, and keep the unique first tokens

I have one column in a pandas DataFrame. Each row holds a comma-separated list of items, as shown below. I want to split each row by comma, split each resulting item by space, take the first token of each item, and finally remove the duplicates within that row.

Initial dataset (pandas df)
MATERIAL
A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX
B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX
A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,
M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX

Output required

MATERIAL
A, B, C
B, C
A, H, L
M, N, P, L

If a row contains a single item, str.split().str[0] works and returns the first token after splitting.

But when a row contains several comma-separated items and I try the following lambda function, I get an error and cannot produce the output above:

productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ','.join([productList['MATERIAL'].str.split().str[0] for n in g]))

It would be great if someone could shed some light on this. Thanks.

Upvotes: 0

Views: 1087

Answers (2)

henrywongkk

Reputation: 1918

If the order of output on each row is not important, use a set to keep unique values.

productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ', '.join(set((n.split()[0] for n in g.split(', ')))))

     MATERIAL
0     B, C, A
1        C, B
2     H, L, A
3  L, N, P, M

If the order of output is important, use an OrderedDict to preserve the order and convert it back to a list.

import collections
productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ', '.join(list(collections.OrderedDict.fromkeys((n.split()[0] for n in g.split(', '))))))

     MATERIAL
0     A, B, C
1        B, C
2     A, H, L
3  M, N, P, L
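For reference, the order-preserving approach above can be sketched as a self-contained script. This is only an illustration under a few assumptions: the helper name unique_first_tokens is hypothetical, the DataFrame is rebuilt from the sample rows in the question, and a plain dict (which keeps insertion order since Python 3.7) stands in for OrderedDict. The sketch also skips empty items, since the third sample row ends with a trailing comma.

```python
import pandas as pd


def unique_first_tokens(cell: str) -> str:
    """Return the first word of each comma-separated item, de-duplicated in order."""
    # Split on commas and trim the whitespace around each item.
    items = (item.strip() for item in cell.split(","))
    # Skip empty items, e.g. the one produced by a trailing comma.
    firsts = (item.split()[0] for item in items if item)
    # dict.fromkeys keeps first-seen order (guaranteed for dict since Python 3.7).
    return ", ".join(dict.fromkeys(firsts))


# Sample rows copied from the question.
productList = pd.DataFrame({
    "MATERIAL": [
        "A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX",
        "B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX",
        "A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,",
        "M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX",
    ]
})

productList["MATERIAL"] = productList["MATERIAL"].apply(unique_first_tokens)
print(productList)
```

Pulling the logic into a named function makes it easier to test on a single string before applying it to the whole column.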

Upvotes: 0

Aryerez

Reputation: 3495

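A one-line lambda that produces a sorted list of the unique first tokens per row (empty items, such as the one left by the trailing comma in the third row, are skipped):

df['MATERIAL'] = df['MATERIAL'].map(lambda x: sorted({val.split()[0] for val in x.split(',') if val.strip()}))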
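If a lambda is not required, the same steps can probably be expressed with pandas string methods alone. The following is a rough sketch rather than a definitive solution: it assumes pandas 0.25 or later (for Series.explode), rebuilds the DataFrame from the sample rows in the question, and keeps tokens in first-seen order instead of sorting them. The names tokens and result are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "MATERIAL": [
        "A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX",
        "B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX",
        "A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,",
        "M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX",
    ]
})

tokens = (
    df["MATERIAL"]
    .str.split(",")   # one list of items per row
    .explode()        # one item per line, original row index repeated
    .str.split()      # split each item on whitespace
    .str[0]           # first token; NaN where the item was empty
    .dropna()         # drop the empty item left by a trailing comma
)

# Regroup by the original row index and join the unique tokens in first-seen order.
result = tokens.groupby(level=0).agg(lambda s: ", ".join(s.unique()))
df["MATERIAL"] = result
print(df)
```

On a large column this avoids calling a Python function per row for the splitting, although the final join still runs once per group.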

Upvotes: 1
