Reputation: 8628
I have dataframe as follows:
df =
DATA TYPE_1 TYPE_2 TYPE_3 EVALUATED
WW A234 456 456 0
AA 456 123A 567 1
BB 456 123A 456 1
df = pd.DataFrame({"DATA":["WW","AA","BB"],"TYPE_1":["A234","456","456"],"TYPE_2":["456","123A","123A"],"TYPE_3":["456","567","456"],"EVALUATED":[0,1,1]})
I am calculating the number of times TYPE*
apeared with EVALUATED
equal to 0 and 1. This is the code that I am using:
grouped = (df.set_index('EVALUATED')
.filter(like='TYPE_')
.stack()
.to_frame('TYPE')
.reset_index()
.pivot_table(index='TYPE', columns='EVALUATED', aggfunc='size', fill_value=0)
).reset_index()
I need to slightly improve this code by counting only unique occurences of TYPE*
per row.
For example, 456
appeared two times in row 1, but it must be counted only once.
The result should be this one:
grouped =
TYPE 0 1
------------
A234 1 0
456 1 2
123A 0 2
567 0 1
Upvotes: 3
Views: 308
Reputation: 29711
Steps:
1) Subset columns starting with the char "TYPE_X"
.
2) Construct a new DF
by taking thier unique values across columns. Let it's new index be the contents present in EVALUATED column.
3) Perform Groupby
w.r.t the index axis by specifying level=0
. Using apply
, stack
the frame so that a series gets created. Take it's distinct counts using value_counts
. Unstack
the obtained DF
with providing fill_value=0
. Finally, transpose the resulting dataframe.
TYPE_cols = df.filter(like="TYPE_")
d = pd.DataFrame([pd.unique(x) for x in TYPE_cols.values], df['EVALUATED'].values)
result = d.groupby(level=0).apply(lambda x: pd.value_counts(x.values.ravel()))
result.unstack(fill_value=0).T.reset_index().rename(columns={"index":"TYPE"})
Upvotes: 1
Reputation: 862611
My solution is similar as your, only added .apply(lambda x: pd.Series(x.unique()), axis=1)
for remove duplicates in rows and another possible solution for fast size
with groupby
and unstack
:
grouped = df.set_index('EVALUATED')
.filter(like='TYPE_')
.apply(lambda x: pd.Series(x.unique()), axis=1)
.stack()
.to_frame('TYPE')
.reset_index()
.groupby(['TYPE', 'EVALUATED'])
.size()
.unstack(fill_value=0)
print (grouped)
EVALUATED 0 1
TYPE
123A 0 2
456 1 2
567 0 1
A234 1 0
Your solution:
grouped = (df.set_index('EVALUATED')
.filter(like='TYPE_')
.apply(lambda x: pd.Series(x.unique()), axis=1) #added row for unique rows
.stack()
.to_frame('TYPE')
.reset_index()
.pivot_table(index='TYPE', columns='EVALUATED', aggfunc='size', fill_value=0)
).reset_index()
print (grouped)
EVALUATED TYPE 0 1
0 123A 0 2
1 456 1 2
2 567 0 1
3 A234 1 0
Upvotes: 2