Reputation: 2150
I have a data frame like this:
import pandas as pd
import numpy as np
Out[10]:
samples subject trial_num
0 [0, 2, 2] 1 1
1 [3, 3, 0] 1 2
2 [1, 1, 1] 1 3
3 [0, 1, 2] 2 1
4 [4, 5, 6] 2 2
5 [0, 8, 8] 2 3
I want to have the output like this:
samples subject trial_num frequency
0 [0, 2, 2] 1 1 2
1 [3, 3, 0] 1 2 2
2 [1, 1, 1] 1 3 1
3 [0, 1, 2] 2 1 3
4 [4, 5, 6] 2 2 3
5 [0, 8, 8] 2 3 2
The frequency here is the number of unique values in each list per sample. For example, [0, 2, 2] has two unique values (0 and 2).
I can count unique values in pandas when the column does not hold lists, or I could write a for loop that goes through each row and processes each list, but I would like a better pandas way to do it.
Thanks.
Upvotes: 1
Views: 127
Reputation: 14113
import pandas as pd
import ast # import for sample data creation
from io import StringIO # import for sample data creation
# sample data
s = """samples;subject;trial_num
[0, 2, 2];1;1
[3, 3, 0];1;2
[1, 1, 1];1;3
[0, 1, 2];2;1
[4, 5, 6];2;2
[0, 8, 8];2;3"""
df = pd.read_csv(StringIO(s), sep=';')
df['samples'] = df['samples'].apply(ast.literal_eval)
# expand the lists into a new frame (one column per list position),
# count the distinct values in each row, and assign the result to a column
df['frequency'] = pd.DataFrame(df['samples'].tolist()).nunique(axis=1)
samples subject trial_num frequency
0 [0, 2, 2] 1 1 2
1 [3, 3, 0] 1 2 2
2 [1, 1, 1] 1 3 1
3 [0, 1, 2] 2 1 3
4 [4, 5, 6] 2 2 3
5 [0, 8, 8] 2 3 2
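One detail worth knowing about this approach, sketched here on made-up data (the `ragged` frame below is hypothetical and not from the question): if the lists have different lengths, the intermediate frame is padded with NaN, and `nunique` skips NaN by default, so the padding should not inflate the count. The sketch also assumes the frame has a default integer index, since the result is assigned back by index.

```python
import pandas as pd

# hypothetical data: lists of unequal length (not from the question)
ragged = pd.DataFrame({"samples": [[0, 2, 2], [3, 3], [1]]})

# shorter lists are padded with NaN when expanded into columns
expanded = pd.DataFrame(ragged["samples"].tolist())

# count distinct values per row; NaN padding is ignored (dropna=True is the default)
ragged["frequency"] = expanded.nunique(axis=1)
print(ragged)
```

If the lists themselves can contain NaN and you want it counted, passing `dropna=False` would also count the padding, so `len(set(x))` per row may be the safer choice in that case.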
Upvotes: 1
Reputation: 195613
You can use collections.Counter to count the values that occur exactly once in each list:
from collections import Counter
df['frequency'] = df['samples'].apply(lambda x: sum(v==1 for v in Counter(x).values()))
print(df)
Prints:
samples subject trial_num frequency
0 [0, 2, 2] 1 1 1
1 [3, 3, 0] 1 2 1
2 [1, 1, 1] 1 3 0
3 [0, 1, 2] 2 1 3
4 [4, 5, 6] 2 2 3
5 [0, 8, 8] 2 3 1
EDIT: For updated question:
df['frequency'] = df['samples'].apply(lambda x: len(set(x)))
print(df)
Prints:
samples subject trial_num frequency
0 [0, 2, 2] 1 1 2
1 [3, 3, 0] 1 2 2
2 [1, 1, 1] 1 3 1
3 [0, 1, 2] 2 1 3
4 [4, 5, 6] 2 2 3
5 [0, 8, 8] 2 3 2
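If you would rather avoid `apply`, one possible alternative is `explode` followed by a group-wise `nunique`. This is only a sketch: it rebuilds the question's data by hand and assumes pandas 0.25 or later, where `Series.explode` is available. The name `exploded` is just an illustrative helper.

```python
import pandas as pd

# the question's data, rebuilt by hand for a self-contained example
df = pd.DataFrame({
    "samples": [[0, 2, 2], [3, 3, 0], [1, 1, 1], [0, 1, 2], [4, 5, 6], [0, 8, 8]],
    "subject": [1, 1, 1, 2, 2, 2],
    "trial_num": [1, 2, 3, 1, 2, 3],
})

# one row per list element; each element keeps the index of its original row
exploded = df["samples"].explode()

# count distinct elements per original row; assignment aligns on the index
df["frequency"] = exploded.groupby(level=0).nunique()
print(df)
```

For lists this short, `len(set(x))` is likely just as fast and easier to read; the `explode` route mainly helps when you want to keep working on the flattened values afterwards.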
Upvotes: 2