slb20

Reputation: 139

Compare columnA to columnB and get percentage of specific column

I have a DataFrame that looks like this, and I would like to calculate the percentage of values in columnB that match columnA. In this example, columnB has 3 values that are identical to the corresponding values in columnA:

   columnA   columnB 
0  A         None    
1  H         H           <---
2  A         A           <---
3  H         H           <---
4  A         H 

Expected result:

   columnB 
0  75%          


EDIT: I just noticed that in my use case I want to ignore rows that contain the 'None' value, so the result should be 75 (or 75%): 3 matching rows out of the 4 rows that remain.

Upvotes: 1

Views: 117

Answers (2)

Henry Ecker

Reputation: 35646

To get the exact output in that format (treating the string 'None' as missing), use:

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

Assuming the None values are already Python None or NaN, and not the string 'None', use:

new_df = df.dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

result:

  columnB
0     75%
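A minimal sketch of the second variant, assuming the missing cell is a real Python None rather than the string 'None'. It formats the percentage with Series.map instead of applymap, because pandas 2.1 and later deprecate DataFrame.applymap in favour of DataFrame.map:

```python
import pandas as pd

# Assumption: the missing cell is a real None, not the string 'None'
df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': [None, 'H', 'A', 'H', 'H']})

new_df = df.dropna()  # removes row 0, where columnB is missing

# Share of matching rows, scaled to a percentage, as a one-row frame
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T
)

# Format as a whole-number percentage string without applymap
result['columnB'] = result['columnB'].map('{:.0f}%'.format)
print(result)
```

Series.map with a plain formatting function works in both older and newer pandas, so this avoids the deprecation warning without depending on DataFrame.map.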

If just the numeric value is enough, use:

new_df = df.replace({'None': None}).dropna()
result = new_df['columnB'].eq(new_df['columnA']).mean() * 100
print(result)  # 75.0
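To reuse this on other column pairs, here is a minimal sketch that wraps it in a function. The name `match_percentage` is hypothetical, and the function assumes a missing cell may be either the string 'None' or a real missing value (None/NaN) in either column:

```python
import pandas as pd


def match_percentage(df: pd.DataFrame, col_a: str, col_b: str) -> float:
    """Percentage of rows where col_b equals col_a (hypothetical helper).

    Rows where either column holds the string 'None' or a real missing
    value are ignored. Returns NaN if no rows remain.
    """
    pair = df[[col_a, col_b]]
    # Drop rows containing the literal string 'None' in either column
    pair = pair[pair.ne('None').all(axis=1)]
    # Drop rows containing real missing values (None/NaN)
    pair = pair.dropna()
    if pair.empty:
        return float('nan')
    return float(pair[col_b].eq(pair[col_a]).mean() * 100)


df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})
print(match_percentage(df, 'columnA', 'columnB'))  # 75.0
```

Filtering with a boolean mask instead of `replace` keeps the original frame untouched and handles both representations of "missing" in one place.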

Complete Working Example:

import pandas as pd

df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
        .mean().mul(100)
        .to_frame().T.applymap('{:.0f}%'.format)
)

print(result)

Upvotes: 2

Andrej Kesely

Reputation: 195458

To get a percentage over all rows (including the row where columnB is None):

perc = df["columnA"].eq(df["columnB"]).sum() / len(df) * 100
print(perc)

Prints:

60.0
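To match the 75 from the question's EDIT instead, a minimal sketch of the same sum / len idea that leaves out the row whose columnB is missing. It assumes the missing cell is the string 'None' (as in the first answer's example) and also skips real None/NaN values:

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})

# Keep only rows where columnB holds a real value
kept = df[df['columnB'].ne('None') & df['columnB'].notna()]

# Same sum / len idea, but the denominator is now the kept rows only
perc = kept['columnA'].eq(kept['columnB']).sum() / len(kept) * 100
print(perc)  # 75.0
```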

As a dataframe:

df_out = pd.DataFrame(
    {"ColumnB": [df["columnA"].eq(df["columnB"]).sum() / len(df) * 100]}
)
print(df_out)

Prints:

   ColumnB
0     60.0

Upvotes: 2
