Compare columnA to columnB and get percentage of specific column

Question

I have a dataframe that looks like this, and I would like to calculate the percentage of values in columnB that match columnA. In this example, columnB has 3 values that are identical to the corresponding values in columnA:

   columnA   columnB 
0  A         None    
1  H         H           <---
2  A         A           <---
3  H         H           <---
4  A         H

expected result:

   columnB 
0  75%

stay healthy!

EDIT: I just noticed that in my use case, I want to ignore rows that contain the 'None' value. I want the result to be 75 or 75%.

Henry Ecker · Accepted Answer

To get the exact output in that format use:

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

Assuming the None values are already Python None or NaN, and not the string 'None', use:

new_df = df.dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

result:

  columnB
0     75%

If just the numeric value will do, use:

new_df = df.replace({'None': None}).dropna()
result = new_df['columnB'].eq(new_df['columnA']).mean() * 100

75.0
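
If the comparison is needed in more than one place, the scalar version above can be wrapped in a small helper. This is only a sketch: the function name match_percentage and its parameters are made up for illustration, and it assumes the missing values are stored as the string 'None', as in the question.

```python
import numpy as np
import pandas as pd


def match_percentage(df, left="columnA", right="columnB", missing="None"):
    """Percentage of rows where df[right] equals df[left], ignoring missing rows.

    Hypothetical helper, not part of pandas.
    """
    # Turn the placeholder string into a real missing value, then drop those rows.
    cleaned = df[[left, right]].replace(missing, np.nan).dropna()
    # eq() returns a boolean Series; its mean is the fraction of matching rows.
    return cleaned[right].eq(cleaned[left]).mean() * 100


df = pd.DataFrame({"columnA": ["A", "H", "A", "H", "A"],
                   "columnB": ["None", "H", "A", "H", "H"]})

print(match_percentage(df))  # 75.0
```

Row 0 is dropped because columnB is missing there, so 3 of the remaining 4 rows match.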

Complete Working Example:

import pandas as pd

df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

print(result)
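
A note on the formatting step: as far as I know, DataFrame.applymap is deprecated in newer pandas releases (2.1 and later) in favour of DataFrame.map, so the chain above may emit a warning there. One way to sidestep both methods, sketched here with the same sample data, is to compute the scalar first and build the one-cell frame directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"columnA": ["A", "H", "A", "H", "A"],
                   "columnB": ["None", "H", "A", "H", "H"]})

# Replace the 'None' placeholder with a real missing value and drop those rows.
new_df = df.replace("None", np.nan).dropna()

# Fraction of rows where columnB equals columnA, as a percentage.
pct = new_df["columnB"].eq(new_df["columnA"]).mean() * 100

# Format the scalar and wrap it in a one-row DataFrame.
result = pd.DataFrame({"columnB": [f"{pct:.0f}%"]})

print(result)
```

This should give the same one-row frame with 75% in columnB, without relying on an element-wise formatting method.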
