Reputation: 469
In R, I have a data.frame like this:
df1 <- data.frame(
grade = rep(LETTERS[1:5], 4),
sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
class = c(rep(1, 10), rep(2, 10))
)
df1
grade sex class
1 A male 1
2 B male 1
3 C male 1
4 D male 1
5 E male 1
6 A female 1
7 B female 1
8 C female 1
9 D female 1
10 E female 1
11 A male 2
12 B male 2
13 C male 2
14 D male 2
15 E female 2
16 A female 2
17 B female 2
18 C female 2
19 D female 2
20 E female 2
I want to count the percentage of sex in each class and make another data.frame like:
Class Male_percent Female_percentage
1 50% 50%
2 40% 60%
Can someone teach me how to do it? This question might have been asked before, but I don't know what's the keyword for this question. I am sorry if I ask the same question again.
Upvotes: 2
Views: 3719
Reputation: 23064
Try tabyl
from the janitor package:
library(janitor)
df1 %>%
tabyl(class, sex) %>%
adorn_percentages()
class female male
1 0.5 0.5
2 0.6 0.4
If you want to format as percentages add adorn_pct_formatting()
:
df1 %>%
tabyl(class, sex) %>%
adorn_percentages() %>%
adorn_pct_formatting()
class female male
1 50.0% 50.0%
2 60.0% 40.0%
Disclaimer: I am the author of these functions.
Upvotes: 1
Reputation: 887971
You can try
prop.table(table(df1[3:2]),1)*100
# sex
#class female male
# 1 50 50
# 2 60 40
Or with data.table
library(data.table)
setDT(df1)[, .N, by = .(class, sex)
][, .(Male_percent = paste0(100 * N[sex == 'male'] / sum(N), '%'),
Female_percent = paste0(100 * N[sex == 'female'] / sum(N), '%')),
by = class]
# class Male_percent Female_percent
#1: 1 50% 50%
#2: 2 40% 60%
Or using dplyr
library(dplyr)
df1 %>%
group_by(class) %>%
summarise(Male_Percent= sprintf('%d%%', 100*sum(sex=='male')/n()),
Female_Percent = sprintf('%d%%', 100*sum(sex=='female')/n()))
# class Male_Percent Female_Percent
#1 1 50% 50%
#2 2 40% 60%
Or
library(sqldf)
res1 <- sqldf('select class,
100*sum(sex=="male")/count(sex) as m,
100*sum(sex=="female")/count(sex) as f,
"%" as p
from df1
group by class')
sqldf("select class,
m||p as Male_Percent,
f||p as Female_Percent
from res1")
# class Male_Percent Female_Percent
#1 1 50% 50%
#2 2 40% 60%
Based on @G.Grothendieck's comments, the sqldf
comments can be simplified to
sqldf("select class,
(100 * avg(sex = 'male')) || '%' as Male_Percent,
(100 * avg(sex = 'female')) || '%' as Female_Percent
from df1 group
by class")
# class Male_Percent Female_Percent
#1 1 50.0% 50.0%
#2 2 40.0% 60.0%
Upvotes: 4
Reputation: 5414
using data.table
package you can do the following
setDT(df)[ , .(
Male_Percent = paste0(( nrow(.SD[sex == "male"]) / .N ) * 100 , "%") ,
Female_Percent = paste0(( nrow(.SD[sex == "female"]) / .N ) * 100 , "%")
) ,
by = class
]
Result
# class Male_Percent Female_Percent
# 1: 1 50% 50%
# 2: 2 40% 60%
another dplyr
solution will be
df %>%
group_by(sex , class) %>%
summarise(n = n()) %>%
group_by(class) %>%
summarise(
Male_Percent = paste0((n[sex == "male"] / sum(n)) * 100 , "%") ,
Female_Percent = paste0((n[sex == "female"] / sum(n) * 100) , "%")
)
# class Male_Percent Female_Percent
# 1 50% 50%
# 2 40% 60%
Upvotes: 0