Dinosaurius

Reputation: 8628

How to count unique occurrences of a particular value per row?

I have a dataframe as follows:

df =

DATA   TYPE_1  TYPE_2  TYPE_3  EVALUATED
WW     A234    456     456     0
AA     456     123A    567     1
BB     456     123A    456     1

import pandas as pd

df = pd.DataFrame({"DATA": ["WW", "AA", "BB"],
                   "TYPE_1": ["A234", "456", "456"],
                   "TYPE_2": ["456", "123A", "123A"],
                   "TYPE_3": ["456", "567", "456"],
                   "EVALUATED": [0, 1, 1]})

I am counting the number of times each TYPE_* value appeared with EVALUATED equal to 0 and equal to 1. This is the code that I am using:

grouped = (df.set_index('EVALUATED')
        .filter(like='TYPE_')    # keep only the TYPE_* columns
        .stack()                 # long format: one row per (EVALUATED, TYPE) cell
        .to_frame('TYPE')
        .reset_index()
        .pivot_table(index='TYPE', columns='EVALUATED', aggfunc='size', fill_value=0)
    ).reset_index()
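
For reference, here is what this code produces on the sample frame (output reconstructed from the data): every cell is counted, so the 456 that appears twice in the first row is counted twice under EVALUATED equal to 0:

print (grouped)
EVALUATED  TYPE  0  1
0          123A  0  2
1           456  2  3
2           567  0  1
3          A234  1  0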

I need to slightly improve this code so that it counts only unique occurrences of TYPE_* values per row.

For example, 456 appears twice in the first row, but it must be counted only once.
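
As a minimal illustration (using the df built above), de-duplicating a single row with pd.unique:

import pandas as pd

# the first row (WW) holds "456" in both TYPE_2 and TYPE_3;
# after de-duplication it should contribute a single "456" to the EVALUATED = 0 bucket
print(pd.unique(df.loc[0, ['TYPE_1', 'TYPE_2', 'TYPE_3']].values))
# ['A234' '456']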

The expected result is:

grouped = 

TYPE   0   1
------------
A234   1   0
456    1   2
123A   0   2
567    0   1

Upvotes: 3

Views: 308

Answers (2)

Nickil Maveli

Reputation: 29711

Steps:

1) Subset the columns whose names start with the prefix "TYPE_".

2) Construct a new DataFrame by taking their unique values across columns. Let its index be the values of the EVALUATED column.

3) Group by the index axis by specifying level=0. Using apply, ravel each group's values and take the distinct counts with value_counts. Unstack the obtained DataFrame with fill_value=0. Finally, transpose the resulting dataframe.


TYPE_cols = df.filter(like="TYPE_")     # 1) subset the TYPE_* columns

# 2) unique values per row, indexed by that row's EVALUATED value
d = pd.DataFrame([pd.unique(x) for x in TYPE_cols.values], df['EVALUATED'].values)

# 3) distinct counts per EVALUATED group, then reshape and transpose
result = d.groupby(level=0).apply(lambda x: pd.value_counts(x.values.ravel()))
result.unstack(fill_value=0).T.reset_index().rename(columns={"index":"TYPE"})

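Evaluating the last expression on the sample frame should display the same counts as the expected table in the question:

   TYPE  0  1
0  123A  0  2
1   456  1  2
2   567  0  1
3  A234  1  0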

Upvotes: 1

jezrael

Reputation: 862611

My solution is similar to yours; I only added .apply(lambda x: pd.Series(x.unique()), axis=1) to remove duplicates within rows. I also add another possible solution that gets the sizes quickly with groupby and unstack:

grouped = (df.set_index('EVALUATED')
             .filter(like='TYPE_')
             .apply(lambda x: pd.Series(x.unique()), axis=1)  # remove duplicates within each row
             .stack()
             .to_frame('TYPE')
             .reset_index()
             .groupby(['TYPE', 'EVALUATED'])
             .size()
             .unstack(fill_value=0))
print (grouped)
EVALUATED  0  1
TYPE           
123A       0  2
456        1  2
567        0  1
A234       1  0

Your solution:

grouped = (df.set_index('EVALUATED')
        .filter(like='TYPE_')
        .apply(lambda x: pd.Series(x.unique()), axis=1)  # added: keep only unique values per row
        .stack()
        .to_frame('TYPE')
        .reset_index()
        .pivot_table(index='TYPE', columns='EVALUATED', aggfunc='size', fill_value=0)
    ).reset_index()

print (grouped)
EVALUATED  TYPE  0  1
0          123A  0  2
1           456  1  2
2           567  0  1
3          A234  1  0

Upvotes: 2
