Reputation: 199
I have a dataframe train, and I have filtered a certain number of rows from it to form the promoted dataframe:
print(train.department.value_counts(),'\n')
promoted=train[train.is_promoted==1]
print(promoted.department.value_counts())
The output of the above code is:
Sales & Marketing 16840
Operations 11348
Technology 7138
Procurement 7138
Analytics 5352
Finance 2536
HR 2418
Legal 1039
R&D 999
Name: department, dtype: int64
Sales & Marketing 1213
Operations 1023
Technology 768
Procurement 688
Analytics 512
Finance 206
HR 136
R&D 69
Legal 53
Name: department, dtype: int64
I want to display what percentage of each category of the department column in train appears in the promoted dataframe, i.e. instead of the numbers 1213, 1023, 768, 688, etc., I should get a percentage such as 1213/16840*100 = 7.2, etc. Please note that I don't want a normalized value.
Upvotes: 1
Views: 13716
Reputation: 617
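Found a better answer at: https://stackoverflow.com/a/50558594/4106458
It suggests using the normalize=True named parameter of the value_counts() method. Note that this gives each department's share of all rows in promoted, not the share of that department's rows in train.
For your scenario, the code would be:
promoted.department.value_counts(normalize=True) * 100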
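A caveat that is easy to check on a small example: normalize=True divides each count by the total number of rows in promoted, so the percentages sum to 100 across departments, whereas the question asks for each count divided by that department's count in train. The sketch below uses a made-up toy frame (the department names and values are assumptions for illustration, not the asker's data) to show the two results side by side.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's train dataframe.
train = pd.DataFrame({
    "department": ["Sales"] * 4 + ["HR"] * 4 + ["Legal"] * 2,
    "is_promoted": [1, 1, 0, 0, 1, 0, 0, 0, 1, 0],
})
promoted = train[train.is_promoted == 1]

# Share of all promoted rows that fall in each department (sums to 100).
share_of_promoted = promoted.department.value_counts(normalize=True) * 100

# Promoted rows as a percentage of each department's rows in train.
# The division aligns the two Series on their department labels.
rate_per_department = (
    promoted.department.value_counts() / train.department.value_counts() * 100
)

print(share_of_promoted)
print(rate_per_department)
```

In this toy frame Legal has one of the four promoted rows but only two rows in train, so the two calculations give different numbers for it.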
Upvotes: 2
Reputation: 413
import pandas as pd
df = pd.read_csv("/home/spaceman/my_work/Most-Recent-Cohorts-Scorecard-Elements.csv")
df = df[['STABBR']]  # each value appears in the dataframe multiple times
# after that, counting the values gave:
CA 717
TX 454
NY 454
FL 417
PA 382
OH 320
IL 280
MI 189
NC 189
.........
.........
print(df['STABBR'].value_counts(normalize=True))  # returns the relative frequency by dividing all values by the sum of values
CA 0.099930
TX 0.063275
NY 0.063275
FL 0.058118
PA 0.053240
OH 0.044599
IL 0.039024
MI 0.026341
NC 0.026341
..............
..............
Upvotes: 0
Reputation: 1284
How about this? The example uses a toy dataset, but the key idea is simply dividing one value count by the other.
import pandas as pd
import numpy as np
data = pd.DataFrame({
'department': list(range(10)) * 100,
'is_promoted': np.random.randint(0, 2, size = 1000)
})
# Slice out promoted data.
data_promoted = data[data['is_promoted'] == 1]
# Calculate share of each department that is present in data_promoted.
data_promoted['department'].value_counts().sort_index() / data['department'].value_counts().sort_index()
Gives:
0 0.50
1 0.52
2 0.45
3 0.54
4 0.41
5 0.50
6 0.45
7 0.52
8 0.60
9 0.52
Name: department, dtype: float64
Upvotes: 1
Reputation: 4233
Try:
promoted.department.value_counts()/train.department.value_counts()*100
It should give you the desired output:
Sales & Marketing 7.2030
Operations 9.0148
Technology 10.7593
..... ...
Name: department, dtype: float64
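A possible alternative, sketched under the assumption that is_promoted only ever holds 0 and 1: the mean of a 0/1 column within each department is the promoted fraction, so a groupby on train gives the same percentages without building promoted first. The toy frame and its department names below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; is_promoted is assumed to contain only 0 and 1.
train = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Sales", "HR", "HR", "Legal", "Legal"],
    "is_promoted": [1, 0, 0, 0, 1, 1, 0, 0],
})

# Mean of a 0/1 column per group = fraction promoted; scale to a percentage.
pct_promoted = (
    train.groupby("department")["is_promoted"]
    .mean()
    .mul(100)
    .round(2)
    .sort_values(ascending=False)
)

print(pct_promoted)
```

One difference in behaviour: a department with no promoted rows shows up as 0 here, while dividing the two value_counts() results gives NaN for it, because that department is missing from the promoted counts.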
Upvotes: 2