Reputation: 95
I am analyzing the Males data set from the Ecdat package in R.
I would like to calculate, for each ethnic group (black, hisp and other), the percentage of people who are affiliated with a union.
The structure of the data is:
str(Males)
'data.frame': 4360 obs. of 12 variables:
$ nr : int 13 13 13 13 13 13 13 13 17 17 ...
$ year : int 1980 1981 1982 1983 1984 1985 1986 1987 1980 1981 ...
$ school : int 14 14 14 14 14 14 14 14 13 13 ...
$ exper : int 1 2 3 4 5 6 7 8 4 5 ...
$ union : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
$ ethn : Factor w/ 3 levels "other","black",..: 1 1 1 1 1 1 1 1 1 1 ...
$ maried : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ health : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ wage : num 1.2 1.85 1.34 1.43 1.57 ...
$ industry : Factor w/ 12 levels "Agricultural",..: 7 8 7 7 8 7 7 7 4 4 ...
$ occupation: Factor w/ 9 levels "Professional, Technical_and_kindred",..: 9 9 9 9 5 2 2 2 2 2 ...
$ residence : Factor w/ 4 levels "rural_area","north_east",..: 2 2 2 2 2 2 2 2 2 2 ...
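The following code selects the rows for 1980 and keeps only the union and ethn columns:
library(Ecdat)
library(dplyr)
Males %>%
  filter(year == 1980) %>%   # year is an integer column, so compare with a number
  select(union, ethn)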
union ethn
1 no other
9 no other
17 no other
25 yes other
33 yes hisp
41 no hisp
49 no other
57 no other
65 yes black
... ... ...
The final result should be something like this:
Year: 1980:
union ethn pct
no other 0.25
no black 0.25
no hisp ...
yes other ...
yes black ...
yes hisp ...
Year: 1981:
union ethn pct
no other 0.25
no black 0.25
no hisp ...
yes other ...
yes black ...
yes hisp ...
....
Upvotes: 0
Views: 494
Reputation: 95
In the meantime I found another way to answer this question, using the function pct_routine.
df1980 <- Males %>%
  filter(year == 1980) %>%
  select(union, ethn)
# proportion of each union status within each ethnic group
pct.1980 <- pct_routine(df1980, ethn, union)
pct.1980
The result matches rodolfosveiga's answer, expressed as proportions rather than percentages:
# A tibble: 6 x 3
# Groups: ethn [3]
ethn union pct
<fct> <fct> <dbl>
1 other no 0.778
2 other yes 0.222
3 black no 0.635
4 black yes 0.365
5 hisp no 0.694
6 hisp yes 0.306
Upvotes: 1
Reputation: 1261
You can solve it using group_by() and summarize(), as follows:
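library(Ecdat)
library(dplyr)
Males %>%
  filter(year == 1980) %>%
  select(union, ethn) %>%
  group_by(ethn) %>%
  # percentage of union members and non-members within each ethnic group
  summarize(yes = sum(union == 'yes') * 100 / n(),
            no = sum(union == 'no') * 100 / n())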
Here is the output:
# A tibble: 3 x 3
ethn yes no
<fct> <dbl> <dbl>
1 other 22.2 77.8
2 black 36.5 63.5
3 hisp 30.6 69.4
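Since the question asks for every year and not only 1980, one way to extend this is to drop the filter() step and group by year as well. The sketch below uses a small made-up data frame (toy) in place of Males so that it runs on its own, and it assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Made-up stand-in for Males with two years and two ethnic groups
# (illustration only); with the real data, replace toy with Males.
toy <- data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
  ethn  = factor(c("other", "other", "black", "black",
                   "other", "other", "black", "black")),
  union = factor(c("no", "yes", "yes", "yes",
                   "no", "no", "no", "yes"),
                 levels = c("no", "yes"))
)

pct_by_year <- toy %>%
  group_by(year, ethn) %>%                             # one group per year and ethnic group
  summarize(yes = sum(union == "yes") * 100 / n(),     # % union members in the group
            no  = sum(union == "no")  * 100 / n(),     # % non-members in the group
            .groups = "drop")

print(pct_by_year)
```

Each row then holds the percentages for one year and one ethnic group, so yes and no always add up to 100.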
Upvotes: 1