sariii

Reputation: 2150

How to count the number of unique values in pandas when each cell contains a list

I have a data frame like this:

import pandas as pd
import numpy as np

Out[10]: 
   samples  subject  trial_num
0  [0 2 2]        1          1
1  [3 3 0]        1          2
2  [1 1 1]        1          3
3  [0 1 2]        2          1
4  [4 5 6]        2          2
5  [0 8 8]        2          3

I want to have the output like this:

   samples  subject  trial_num  frequency
0  [0 2 2]        1          1          2
1  [3 3 0]        1          2          2
2  [1 1 1]        1          3          1
3  [0 1 2]        2          1          3
4  [4 5 6]        2          2          3
5  [0 8 8]        2          3          2

The frequency here is the number of unique values in each row's list. For example, [0, 2, 2] has two unique values (0 and 2).

I know how to count unique values in pandas when the cells hold plain values rather than lists, and I could write a for loop that visits each row and processes its list, but I would prefer a more idiomatic pandas way to do it.

Thanks.

Upvotes: 1

Views: 127

Answers (2)

It_is_Chris

Reputation: 14113

import pandas as pd
import ast # import for sample data creation
from io import StringIO # import for sample data creation

# sample data
s = """samples;subject;trial_num
[0, 2, 2];1;1
[3, 3, 0];1;2
[1, 1, 1];1;3
[0, 1, 2];2;1
[4, 5, 6];2;2
[0, 8, 8];2;3"""

df = pd.read_csv(StringIO(s), sep=';')
df['samples'] = df['samples'].apply(ast.literal_eval)

# expand each list into one row of a temporary frame (aligned on df's index),
# then count the distinct values across each row
df['frequency'] = pd.DataFrame(df['samples'].tolist(), index=df.index).nunique(axis=1)

print(df)

     samples  subject  trial_num  frequency
0  [0, 2, 2]        1          1          2
1  [3, 3, 0]        1          2          2
2  [1, 1, 1]        1          3          1
3  [0, 1, 2]        2          1          3
4  [4, 5, 6]        2          2          3
5  [0, 8, 8]        2          3          2
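
If the lists can have different lengths or be empty, one possible variant is to explode the column and count distinct values per original row. This is only a sketch, not part of the approach above: the sample data here is made up for illustration, and it assumes a pandas version that provides Series.explode (0.25 or later).

```python
import pandas as pd

# Hypothetical sample data: lists of different lengths, including an empty one
df = pd.DataFrame({
    "samples": [[0, 2, 2], [3, 3, 0, 1], [1], []],
    "subject": [1, 1, 2, 2],
})

# explode() gives every list element its own row and repeats the original
# index label, so grouping on that label (level=0) counts the distinct
# values that belonged to each original row
df["frequency"] = df["samples"].explode().groupby(level=0).nunique()

print(df)
```

An empty list explodes to a single NaN, which nunique() ignores by default, so that row ends up with a count of 0.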

Upvotes: 1

Andrej Kesely

Reputation: 195613

You can use collections.Counter to count the values that occur exactly once in each list:

from collections import Counter

df['frequency'] = df['samples'].apply(lambda x: sum(v==1 for v in Counter(x).values()))

print(df)

Prints:

     samples  subject  trial_num  frequency
0  [0, 2, 2]        1          1          1
1  [3, 3, 0]        1          2          1
2  [1, 1, 1]        1          3          0
3  [0, 1, 2]        2          1          3
4  [4, 5, 6]        2          2          3
5  [0, 8, 8]        2          3          1

EDIT: For the updated question (number of distinct values in each list), len(set(...)) is enough:

df['frequency'] = df['samples'].apply(lambda x: len(set(x)))

print(df)

Prints:

     samples  subject  trial_num  frequency
0  [0, 2, 2]        1          1          2
1  [3, 3, 0]        1          2          2
2  [1, 1, 1]        1          3          1
3  [0, 1, 2]        2          1          3
4  [4, 5, 6]        2          2          3
5  [0, 8, 8]        2          3          2
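
To make the difference between the two snippets in this answer concrete, here is a small self-contained sketch that computes both counts side by side on the question's data. The column names n_distinct and n_once are only illustrative and do not appear in the question.

```python
from collections import Counter

import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "samples": [[0, 2, 2], [3, 3, 0], [1, 1, 1], [0, 1, 2], [4, 5, 6], [0, 8, 8]],
    "subject": [1, 1, 1, 2, 2, 2],
    "trial_num": [1, 2, 3, 1, 2, 3],
})

# Number of distinct values in each list (what the updated question asks for)
df["n_distinct"] = df["samples"].apply(lambda x: len(set(x)))

# Number of values that occur exactly once in each list (the Counter version)
df["n_once"] = df["samples"].apply(
    lambda x: sum(count == 1 for count in Counter(x).values())
)

print(df[["samples", "n_distinct", "n_once"]])
```

For [0, 2, 2] this gives n_distinct = 2 (the values 0 and 2) but n_once = 1 (only 0 appears a single time), which is why the two outputs in this answer differ.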

Upvotes: 2
