Reputation: 820
I have a dataframe with a column of lists:
full_list_to_check
0 NaN
1 NaN
2 [1, 2, 3, 4, 5]
3 [6, 6]
4 [11, 11]
I need to create a new column that holds each row's list with duplicates removed; a list that has no duplicates should stay the same.
full_list_to_check new_col
0 NaN NaN
1 NaN NaN
2 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
3 [6, 6] [6]
4 [11, 11] [11]
I have tried this:
df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)))
But I get this error:
TypeError: 'float' object is not iterable
Upvotes: 2
Views: 591
Reputation: 4215
You could use:
df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)) if isinstance(x, list) else x)
The other answers only work if the column contains nothing other than lists or NaN.
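For reference, here is a minimal, self-contained sketch of the isinstance approach on the sample data from the question. The helper name dedupe is just an illustration, and dict.fromkeys is swapped in for set() on the assumption that you want to keep the original element order, which set() does not guarantee.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

def dedupe(value):
    """Hypothetical helper: deduplicate lists, pass anything else through."""
    if isinstance(value, list):
        # dict.fromkeys keeps the first occurrence of each element in order.
        return list(dict.fromkeys(value))
    # NaN (a float) and any other non-list value are returned unchanged.
    return value

df["new_col"] = df["full_list_to_check"].apply(dedupe)
print(df)
```

The original error comes from NaN being a float, so set(x) has nothing to iterate over; the isinstance check simply skips those rows.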
Upvotes: 2
Reputation: 1545
You must check for NaN:
df['full_list_to_check'].apply(lambda x: list(set(x)) if not np.any(pd.isna(x)) else np.nan)
Update:
df['full_list_to_check'].apply(lambda x: list(set(x)) if x is not np.nan else np.nan)
0 NaN
1 NaN
2 [1, 2, 3, 4, 5]
3 [6]
4 [11]
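One caveat about the identity check x is not np.nan: it only recognises the np.nan object itself. A NaN that was created some other way is a different object, so the check would let it through and set(x) would fail again. A small sketch of the difference, where safe_dedupe is a hypothetical helper and not part of the answer above:

```python
import math
import numpy as np

other_nan = float("nan")  # a NaN that is not the np.nan object

# Identity comparison only matches the exact np.nan object.
same_object = other_nan is np.nan
# A value-based test recognises any float NaN.
is_nan_value = math.isnan(other_nan)

def safe_dedupe(value):
    """Hypothetical helper: treat any float NaN as missing."""
    if isinstance(value, float) and math.isnan(value):
        return np.nan
    return list(set(value))

print(same_object, is_nan_value)
print(safe_dedupe(other_nan), safe_dedupe([6, 6]))
```

If your NaN values all come straight from pandas or np.nan this may never bite you, but a value-based check is the safer assumption.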
Upvotes: 4
Reputation: 799
You can try this:
df['new_col'] = df.loc[df['full_list_to_check'].notna(), 'full_list_to_check'].apply(lambda x: list(set(x)))
full_list_to_check new_col
0 NaN NaN
1 NaN NaN
2 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
3 [6, 6] [6]
4 [11, 11] [11]
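This works because pandas aligns on the index when you assign a Series to a column: the filtered result only has the rows that held lists, and the rows that were left out end up as NaN. A step-by-step sketch on the question's sample data (the names mask and deduped are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

# Boolean mask of the rows that actually hold a value.
mask = df["full_list_to_check"].notna()

# apply only runs on the selected rows, so set() never sees a NaN.
deduped = df.loc[mask, "full_list_to_check"].apply(lambda x: list(set(x)))

# Assignment aligns on the index; rows missing from deduped become NaN.
df["new_col"] = deduped
print(df)
```

Note that this still assumes every non-NaN value in the column is iterable.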
Upvotes: 2