Alambek

Reputation: 95

Calculate the percentage of Ethnicity in R

I am analyzing the Males data set from the Ecdat package in R.

I would like to calculate, for each ethnic group (black, hisp and other), the percentage of people who are affiliated with a union.

The structure of the data is:


 str(Males)
 'data.frame':  4360 obs. of  12 variables:
 $ nr        : int  13 13 13 13 13 13 13 13 17 17 ...
 $ year      : int  1980 1981 1982 1983 1984 1985 1986 1987 1980 1981 ...
 $ school    : int  14 14 14 14 14 14 14 14 13 13 ...
 $ exper     : int  1 2 3 4 5 6 7 8 4 5 ...
 $ union     : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
 $ ethn      : Factor w/ 3 levels "other","black",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ maried    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ health    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ wage      : num  1.2 1.85 1.34 1.43 1.57 ...
 $ industry  : Factor w/ 12 levels "Agricultural",..: 7 8 7 7 8 7 7 7 4 4 ...
 $ occupation: Factor w/ 9 levels "Professional, Technical_and_kindred",..: 9 9 9 9 5 2 2 2 2 2 ...
 $ residence : Factor w/ 4 levels "rural_area","north_east",..: 2 2 2 2 2 2 2 2 2 2 ...

The following code selects the observations for 1980:

library(dplyr)
data(Males, package = "Ecdat")

Males %>%
  filter(year == 1980) %>%   # year is an integer column
  select(union, ethn)
        union  ethn
1       no    other
9       no    other
17      no    other
25     yes    other
33     yes    hisp
41      no    hisp
49      no    other
57      no    other
65     yes    black
...    ...    ...

The final result should be something like this:


Year: 1980:

union ethn    pct
no    other   0.25
no    black   0.25
no    hisp    ...
yes   other   ...
yes   black   ...
yes   hisp    ...

Year: 1981:

union ethn    pct
no    other   0.25
no    black   0.25
no    hisp    ...
yes   other   ...
yes   black   ...
yes   hisp    ...


....

Upvotes: 0

Views: 494

Answers (2)

Alambek

Reputation: 95

Meanwhile, I found another way to answer this question, using the function pct_routine().

  # pct_routine() is not part of dplyr; load the package that provides it first
  df1980 <- Males %>%
    filter(year == 1980) %>%
    select(union, ethn)

  pct.1980 <- pct_routine(df1980, ethn, union)
  pct.1980

The result matches what rodolfoksveiga suggested, expressed as proportions rather than percentages:

  # A tibble: 6 x 3
  # Groups:   ethn [3]
    ethn  union   pct
    <fct> <fct> <dbl>
  1 other no    0.778
  2 other yes   0.222
  3 black no    0.635
  4 black yes   0.365
  5 hisp  no    0.694
  6 hisp  yes   0.306
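
As a cross-check that does not depend on pct_routine(), base R's table() and prop.table() should give the same row-wise proportions. This is only a minimal sketch: the `toy` data frame is made up for illustration and merely reuses the column names from Males.

```r
# Toy data (made up for illustration) with the same column names as Males
toy <- data.frame(
  ethn  = factor(c("other", "other", "other", "black", "black", "hisp"),
                 levels = c("other", "black", "hisp")),
  union = factor(c("no", "no", "yes", "no", "yes", "yes"),
                 levels = c("no", "yes"))
)

# Cross-tabulate ethnicity (rows) by union status (columns)
counts <- table(ethn = toy$ethn, union = toy$union)

# margin = 1 divides each cell by its row total, so every row sums to 1
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))
```

To apply this to the real data, replace `toy` with something like `subset(Males, year == 1980)`.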

Upvotes: 1

rodolfoksveiga

Reputation: 1261

You can solve it using group_by() and summarize(), as follows:

Males %>%
  filter(year == 1980) %>%
  select(union, ethn) %>%
  group_by(ethn) %>%
  # percentage of union members and non-members within each ethnic group
  summarize(yes = sum(union == 'yes') * 100 / n(),
            no  = sum(union == 'no') * 100 / n())

Here is the output:

  # A tibble: 3 x 3
    ethn    yes    no
    <fct> <dbl> <dbl>
  1 other  22.2  77.8
  2 black  36.5  63.5
  3 hisp   30.6  69.4
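
The question asks for every year, not only 1980. One possible way to extend the approach above is to group by year as well and return the result in long format (one row per year, ethn and union combination). The sketch below is an assumption about what is wanted, not a definitive solution: `union_share_by_year` is a hypothetical helper name, and the `toy` data frame is made up so the example runs without Ecdat.

```r
library(dplyr)

# Hypothetical helper: percentage of each union status within year and ethn
union_share_by_year <- function(data) {
  data %>%
    count(year, ethn, union, name = "n") %>%  # rows per year/ethn/union cell
    group_by(year, ethn) %>%
    mutate(pct = 100 * n / sum(n)) %>%        # share within each year and ethn
    ungroup() %>%
    select(-n)
}

# Toy data (made up for illustration) with the same column names as Males
toy <- data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981),
  ethn  = factor(c("other", "other", "black", "black", "other", "other")),
  union = factor(c("no", "yes", "no", "no", "yes", "yes"))
)

result <- union_share_by_year(toy)
print(result)
```

With the real data, `union_share_by_year(Males)` should produce one block of rows per year. Combinations that never occur in the data are omitted rather than reported as 0.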

Upvotes: 1
