Find column value wise total against another column using pandas

Question

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)

cf = pd.DataFrame({'grade': rng.choice(list('ACD'),size=(8)),
                       'dash': rng.choice(list('PQRS'),size=(8)),
                       'dumeel': rng.choice(list('QWER'),size=(8)),
                       'dumma': rng.choice((1234),size=(8)),
                       'target': rng.choice([0,1],size=(8))
})

I would like to do the following:

a) Find the total and the % of total for each value in the categorical columns against the target column

I tried the below, but it only gets me halfway to the result.

cols = cf.select_dtypes('object')
cf.melt('target',cols).groupby(['variable','value']).size().reset_index(name='cnt of records')

How can I use the above result to compute target met and target not met details using the target column?

I expect my output to look like the one shown below (note that I have shown only two columns, grade and dash, as a sample). The code should follow the same logic for all string columns.

Corralien · Accepted Answer

Select the columns to flatten with melt, then join the target column. Finally, group by the variable and value columns and apply a dict of functions to each group.

funcs = {
  'cnt of records': 'count',
  'target met': lambda x: sum(x),
  'target not met': lambda x: len(x) - sum(x),
  'target met %': lambda x: f"{round(100 * sum(x) / len(x), 2):.2f}%",
  'target not met %': lambda x: f"{round(100 * (len(x) - sum(x)) / len(x), 2):.2f}%"
}

# cf is the DataFrame defined in the question
out = cf.select_dtypes('object').melt(ignore_index=False).join(cf['target']) \
        .groupby(['variable', 'value'])['target'].agg(**funcs).reset_index()
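
To see why the melt call keeps the index, here is a minimal sketch of just the melt and join steps. The frame `toy` and the names `melted` and `long` are made up for illustration and are not the question's data. Because `ignore_index=False` keeps the original row labels, the join can match each melted row back to the target value of the row it came from.

```python
import pandas as pd

# Hypothetical toy frame (an assumption for illustration, not the question's data).
toy = pd.DataFrame({
    "grade": ["A", "C"],
    "dash": ["P", "Q"],
    "target": [1, 0],
})

# ignore_index=False keeps the original row labels,
# so each label appears once per melted column.
melted = toy[["grade", "dash"]].melt(ignore_index=False)

# join aligns on those labels, so every melted row picks up
# the target value of the row it was taken from.
long = melted.join(toy["target"])
print(long)
```

With the default `ignore_index=True`, melt would renumber the rows and the join would no longer line up with the original targets.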

Output:

>>> out
  variable value  cnt of records  target met  target not met target met % target not met %
0     dash     Q               2           0               2        0.00%          100.00%
1     dash     R               2           2               0      100.00%            0.00%
2     dash     S               4           2               2       50.00%           50.00%
3   dumeel     E               3           2               1       66.67%           33.33%
4   dumeel     Q               3           2               1       66.67%           33.33%
5   dumeel     R               1           0               1        0.00%          100.00%
6   dumeel     W               1           0               1        0.00%          100.00%
7    grade     A               2           0               2        0.00%          100.00%
8    grade     C               3           2               1       66.67%           33.33%
9    grade     D               3           2               1       66.67%           33.33%
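
If you would rather keep the percentages as numbers (for sorting or further arithmetic) instead of formatted strings, one possible variation is sketched below. It is not the code that produced the output above: the frame `toy` and the column names `cnt`, `met`, `not_met`, `met_pct` and `not_met_pct` are hypothetical, chosen only so the result can be checked by hand.

```python
import pandas as pd

# Hypothetical toy data (an assumption, not the question's random frame).
toy = pd.DataFrame({
    "grade": ["A", "A", "C", "C"],
    "dash": ["P", "Q", "P", "P"],
    "target": [1, 0, 1, 1],
})

# Flatten the categorical columns, keeping the row labels,
# then attach each row's target by joining on those labels.
long = toy[["grade", "dash"]].melt(ignore_index=False).join(toy["target"])

# Named aggregation: record count and number of rows where target == 1.
summary = (
    long.groupby(["variable", "value"])["target"]
        .agg(cnt="count", met="sum")
        .reset_index()
)

# Derive the remaining columns with vectorized arithmetic; percentages stay numeric.
summary["not_met"] = summary["cnt"] - summary["met"]
summary["met_pct"] = (100 * summary["met"] / summary["cnt"]).round(2)
summary["not_met_pct"] = (100 * summary["not_met"] / summary["cnt"]).round(2)
print(summary)
```

The numeric columns can still be formatted with a % sign at display time, which avoids string columns that sort lexically.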
