I have one column in a pandas DataFrame. Each row holds a comma-separated string like the ones below. I want to split each string on commas, split each resulting item on whitespace and keep its first word, and finally remove the duplicates in that row (for example with a set).
MATERIAL
A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX
B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX
A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,
M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX
Required output
MATERIAL
A, B, C
B, C
A, H, L
M, N, P, L
If a column holds a single item per row, str.split().str[0] works and gives me the first word after splitting.
But when each row holds several comma-separated items, as here, the following lambda raises an error and I can't get the output above:
productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ','.join([productList['MATERIAL'].str.split().str[0] for n in g]))
Would be great if someone could throw some light on this. Thanks.
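A likely explanation of the failure: apply passes one cell at a time, so g is already a plain string and `for n in g` walks over its characters. The expression inside the comprehension never uses n; it re-evaluates the whole column and returns a Series, so str.join receives a list of Series objects and raises a TypeError. The sketch below reproduces this on a one-row DataFrame rebuilt from the first sample row (the variable names g and piece are just for illustration):

```python
import pandas as pd

# One-row frame rebuilt from the first sample row.
productList = pd.DataFrame({"MATERIAL": ["A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX"]})
g = productList["MATERIAL"].iloc[0]  # what apply hands to the lambda: one string

# Iterating over a string yields single characters, not comma-separated items.
print(list(g)[:4])  # ['A', ' ', '2', 'L']

# The comprehension body ignores n and recomputes over the whole column.
piece = productList["MATERIAL"].str.split().str[0]
print(type(piece).__name__)  # Series

try:
    ",".join([piece for n in g])
    error = None
except TypeError as exc:
    # str.join only accepts strings, so a list of Series objects fails here.
    error = exc
    print("TypeError:", exc)
```

The fix is to split g itself on commas and take the first word of each item, which is what the answers below do.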
If the order of output on each row is not important, use a set to keep unique values.
productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ', '.join(set((n.split()[0] for n in g.split(', ')))))
MATERIAL
0 B, C, A
1 C, B
2 H, L, A
3 L, N, P, M
If the order of the output matters, use an OrderedDict to drop duplicates while preserving first-seen order, then convert its keys back to a list.
import collections
productList['MATERIAL'] = productList['MATERIAL'].apply(lambda g: ', '.join(list(collections.OrderedDict.fromkeys((n.split()[0] for n in g.split(', '))))))
MATERIAL
0 A, B, C
1 B, C
2 A, H, L
3 M, N, P, L
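On Python 3.7 and later a plain dict also preserves insertion order, so dict.fromkeys can stand in for OrderedDict. This is a minimal sketch under a few assumptions: the sample data is rebuilt as a DataFrame named productList, first_words is a hypothetical helper name, and the split is on ',' rather than ', ' so a missing space after a comma does not break it, with the empty item left by a trailing comma skipped:

```python
import pandas as pd

# Sample data rebuilt from the question (note the trailing comma on row 2).
productList = pd.DataFrame({"MATERIAL": [
    "A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX",
    "B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX",
    "A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,",
    "M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX",
]})

def first_words(cell: str) -> str:
    """Hypothetical helper: first word of each comma-separated item, deduplicated."""
    # Skip blank items, such as the one produced by a trailing comma.
    words = (item.split()[0] for item in cell.split(",") if item.strip())
    # dict.fromkeys drops duplicates and keeps first-seen order (Python 3.7+).
    return ", ".join(dict.fromkeys(words))

productList["MATERIAL"] = productList["MATERIAL"].apply(first_words)
print(productList["MATERIAL"].tolist())
# ['A, B, C', 'B, C', 'A, H, L', 'M, N, P, L']
```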
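With a one-line lambda that builds a sorted list for each row (each cell then holds a list rather than a string, and the original order is not kept). The if val.strip() filter skips the empty item left by a trailing comma, as in the third sample row:
df['MATERIAL'] = df['MATERIAL'].map(lambda x: sorted(set(val.strip().split(' ')[0] for val in x.split(',') if val.strip())))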
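A vectorized alternative, not taken from any of the answers, swaps the per-row lambda for pandas string methods plus Series.explode (added in pandas 0.25). This is a sketch assuming a DataFrame named df built from the sample data: each row is split into items, exploded to one item per line (keeping the original row index), reduced to its first word, and then regrouped by the original index:

```python
import pandas as pd

# Sample data rebuilt from the question (note the trailing comma on row 2).
df = pd.DataFrame({"MATERIAL": [
    "A 2L XXX, B 4L XXX, C 6L XXX, A 2L XXX",
    "B 2L XXX, C 4L XXX, C 6L XXX, B 2L XXX",
    "A 2L XXX, H 4L XXX, L 6L XXX, L 6L XXX, A 2L XXX,",
    "M 2L XXX, N 4L XXX, P 6L XXX, L 6L XXX",
]})

first_words = (
    df["MATERIAL"]
    .str.split(",")   # one list of comma-separated items per row
    .explode()        # one item per line; the original row index is repeated
    .str.split()      # split each item on whitespace
    .str[0]           # first word of each item; NaN for the empty trailing item
    .dropna()
)

# unique() keeps first-seen order within each original row.
df["MATERIAL"] = first_words.groupby(level=0).unique().map(", ".join)
print(df["MATERIAL"].tolist())
# ['A, B, C', 'B, C', 'A, H, L', 'M, N, P, L']
```

One caveat of this design: a row with no words at all disappears from the groupby result, so after assignment that cell would become NaN rather than an empty string.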