Reputation: 139
I have a dataframe that looks like this, and I would like to calculate the percentage of values in columnB that match columnA. In this example, columnB has 3 values that are identical to the corresponding values in columnA:
  columnA columnB
0       A    None
1       H       H  <---
2       A       A  <---
3       H       H  <---
4       A       H
expected result:
columnB
0 75%
stay healthy!
EDIT: I just noticed that in my use case I want to ignore rows that contain the 'None' value. I want the result to be 75 or 75%.
Upvotes: 1
Views: 117
Reputation: 35646
To get the exact output in that format use:
new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
    .mean().mul(100)
    .to_frame().T.applymap('{:.0f}%'.format)
)
Assuming the None values are already the Python None or NaN, and not the string 'None', use:
new_df = df.dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
    .mean().mul(100)
    .to_frame().T.applymap('{:.0f}%'.format)
)
result:
columnB
0 75%
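A side note on the formatting step: as far as I know, DataFrame.applymap is deprecated in pandas 2.1 and later, so the chain above may emit a warning there. A minimal sketch that sidesteps it by formatting the single number with an f-string and building the one-cell frame directly (the name pct is just an illustrative choice, and the sample frame assumes real None values rather than the string 'None'):

```python
import pandas as pd

# Sample data from the question, assuming a real None in columnB
df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': [None, 'H', 'A', 'H', 'H']})

# Drop rows with missing values, then compute the share of matching rows
new_df = df.dropna()
pct = new_df['columnB'].eq(new_df['columnA']).mean() * 100

# Build the one-cell result frame with the value formatted as a percentage
result = pd.DataFrame({'columnB': [f'{pct:.0f}%']})
print(result)
```

This should give the same one-row frame as above without relying on applymap.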
If just the numeric value would do, use:
new_df = df.replace({'None': None}).dropna()
result = new_df['columnB'].eq(new_df['columnA']).mean() * 100
75.0
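In case it is not obvious why .mean() yields the percentage: eq returns a boolean Series, and taking the mean treats True as 1 and False as 0, so the mean is the fraction of matching rows. A small sketch on the four rows that remain after dropping the None row (the Series names a and b are only for illustration):

```python
import pandas as pd

# The four rows left after dropping the row with None
a = pd.Series(['H', 'A', 'H', 'A'])
b = pd.Series(['H', 'A', 'H', 'H'])

matches = a.eq(b)          # element-wise comparison -> boolean Series
fraction = matches.mean()  # True counts as 1, False as 0
percent = fraction * 100

print(matches.tolist())
print(percent)
```

Three of the four comparisons are True, which is where the 75 comes from.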
Complete Working Example:
import pandas as pd

df = pd.DataFrame({'columnA': ['A', 'H', 'A', 'H', 'A'],
                   'columnB': ['None', 'H', 'A', 'H', 'H']})

new_df = df.replace({'None': None}).dropna()
result = (
    new_df[['columnB']].eq(new_df['columnA'], axis=0)
    .mean().mul(100)
    .to_frame().T.applymap('{:.0f}%'.format)
)
print(result)
Upvotes: 2
Reputation: 195458
To get a percentage:
perc = df["columnA"].eq(df["columnB"]).sum() / len(df) * 100
print(perc)
Prints:
60.0
As a dataframe:
df_out = pd.DataFrame(
    {"ColumnB": [df["columnA"].eq(df["columnB"]).sum() / len(df) * 100]}
)
print(df_out)
Prints:
ColumnB
0 60.0
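Note that the 60.0 above counts the None row in the denominator (3 matches out of 5 rows). Given the EDIT in the question, here is a hedged variant that leaves those rows out first. It assumes the missing entries are either a real None/NaN or the literal string 'None'; the names valid and matches are illustrative only:

```python
import pandas as pd

# Sample data from the question, assuming a real None in columnB
df = pd.DataFrame({"columnA": ["A", "H", "A", "H", "A"],
                   "columnB": [None, "H", "A", "H", "H"]})

# Keep rows where columnB is neither missing nor the string 'None'
valid = df["columnB"].notna() & df["columnB"].ne("None")

# Compare only the valid rows and divide by the number of valid rows
matches = df.loc[valid, "columnA"].eq(df.loc[valid, "columnB"])
perc = matches.sum() / valid.sum() * 100
print(perc)
```

With the sample data this compares 4 rows and finds 3 matches, so it should print 75.0 rather than 60.0.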
Upvotes: 2