Quan Mai

Reputation: 11

R function to combine four variables?

I have five variables: races, asian_news, black_news, nhpi_news, and latino_news.

'races' is a factor with 6 levels: White, Asians, NHPI, Black, Latino, Multiracial.

'asian_news', 'black_news', 'nhpi_news', and 'latino_news' are a series of survey questions, each with 4 outcomes: [1] ethnic, [2] mainstream, [3] both, and [4] DK.

These questions ask respondents whether they primarily get their news through ethnic sources or through U.S. mainstream media. These survey questions operate as follows:

The replication data can be downloaded here:


As of now, the cross-tab between races and asian_news looks like this:

> with(pre,table(races,asian_news,useNA="always"))
races                           ethnic mainstream both  DK <NA>
  3. WHITES                          0          0    0   0  500
  1. ASIAN AMERICANS               770        863  294  41  142
  2. PACIFIC ISLANDERS               0          0    0   0  410
  4.BLACKS OR AFRICAN AMERICANS      0          0    0   0  520
  6. latinos                         0          0    0   0  514
  9. MULTIRACIAL AMERICANS           0          0    0   0    0
  <NA>                               0          0    0   0    0

Similarly, the cross-tab between races and black_news looks like this:

> with(pre,table(races,black_news,useNA="always"))
races                           ethnic mainstream both   DK <NA>
  3. WHITES                          0          0    0    0  500
  1. ASIAN AMERICANS                 0          0    0    0 2110
  2. PACIFIC ISLANDERS               0          0    0    0  410
  4.BLACKS OR AFRICAN AMERICANS     53        366   67   12   22
  6. latinos                         0          0    0    0  514
  9. MULTIRACIAL AMERICANS           0          0    0    0    0
  <NA>                               0          0    0    0    0

Similar cross-tabs can be generated with the following code:

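One way to produce all four cross-tabs in a single pass is to loop over the news columns with lapply(). This is a sketch, not necessarily the asker's own code; the toy `pre` data frame and its race labels are hypothetical stand-ins for the replication data.

```r
# Hypothetical toy stand-in for `pre`: made-up race labels, and each
# respondent answers only the news question for their own group
pre <- data.frame(
  races       = factor(c("Asian", "Asian", "Black", "NHPI", "Latino", "White")),
  asian_news  = c("ethnic", "both", NA, NA, NA, NA),
  black_news  = c(NA, NA, "mainstream", NA, NA, NA),
  nhpi_news   = c(NA, NA, NA, "DK", NA, NA),
  latino_news = c(NA, NA, NA, NA, "ethnic", NA),
  stringsAsFactors = FALSE
)

news_cols <- c("asian_news", "black_news", "nhpi_news", "latino_news")

# One cross-tab of races against each news question, kept in a named list
tabs <- lapply(news_cols, function(col) {
  table(races = pre$races, news = pre[[col]], useNA = "always")
})
names(tabs) <- news_cols

tabs$black_news
```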

I want to combine these four survey questions into one unified variable. Ideally, the cross-tab between races and the desired variable would look like this:

> with(pre,table(races,desired_variable,useNA="always"))
races                           ethnic mainstream both   DK <NA>
  3. WHITES                          0        500    0    0    0
  1. ASIAN AMERICANS               770        863  294   41  142
  2. PACIFIC ISLANDERS              22        332   24   13   19
  4.BLACKS OR AFRICAN AMERICANS     53        366   67   12   22
  6. latinos                       142        302   47    1   22 
  9. MULTIRACIAL AMERICANS           0          0    0    0    0
  <NA>                               0          0    0    0    0

How do I generate the "desired_variable" variable? Thanks so much in advance.
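A minimal base-R sketch of one way to build desired_variable, assuming each respondent answered at most one of the four news questions (which is what the cross-tabs above suggest): take the first non-NA answer across the four columns. The toy `pre` data and the helper name `first_non_na` are hypothetical.

```r
# Hypothetical toy stand-in for `pre`
pre <- data.frame(
  races       = factor(c("Asian", "Asian", "Black", "NHPI", "Latino", "White")),
  asian_news  = c("ethnic", "both", NA, NA, NA, NA),
  black_news  = c(NA, NA, "mainstream", NA, NA, NA),
  nhpi_news   = c(NA, NA, NA, "DK", NA, NA),
  latino_news = c(NA, NA, NA, NA, "ethnic", NA),
  stringsAsFactors = FALSE
)

news_cols <- c("asian_news", "black_news", "nhpi_news", "latino_news")

# Hypothetical helper: Reduce() folds the columns left to right; wherever
# the running result is still NA, it is filled from the next column
first_non_na <- function(cols) {
  Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), cols)
}

# as.character() also handles the case where the news columns are factors
pre$desired_variable <- factor(
  first_non_na(lapply(pre[news_cols], as.character)),
  levels = c("ethnic", "mainstream", "both", "DK")
)

table(pre$races, pre$desired_variable, useNA = "always")
```

If a respondent did answer more than one question, this keeps the first answer in `news_cols` order. In the cross-tabs shown above, the Whites row is entirely NA for asian_news and black_news, so unless nhpi_news or latino_news carries answers for them, the 500 "mainstream" entries in the desired table would need a separate, explicit rule, for example `pre$desired_variable[pre$races == "3. WHITES"] <- "mainstream"` (the level label is copied from the printed table and may differ in the data).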

Upvotes: 1

Views: 130

Answers (3)

Zhiqiang Wang

Reputation: 6769


This is my attempt, although the code may be a little lengthy. My logic: 1) replace NA with an empty string, 2) paste the four variables into one variable, n_cat. Please note that since you have edited the question, the output values look different from those in the original post and from those of @akrun.

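# Select the four news questions by name rather than by column position
news_cols <- c("asian_news", "black_news", "nhpi_news", "latino_news")
# 1) Replace NA with an empty string so paste0() skips unanswered questions
pre[news_cols] <- lapply(pre[news_cols], function(x) stringr::str_replace_na(x, replacement = ""))
# 2) Paste the four columns into one; the first, unnamed column of the table counts respondents who answered none
pre$n_cat <- paste0(pre$asian_news, pre$nhpi_news, pre$latino_news, pre$black_news)
table(pre$races, pre$n_cat)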
#                                      both   DK ethnic mainstream
#  1. ASIAN AMERICANS              184  324   53    825       1401
#  2. PACIFIC ISLANDERS             19   24   13     22        332
#  3. WHITES                       501    0    0      0          0
#  4. BLACKS OR AFRICAN AMERICANS    8   36    5     24        163
#  5. BLACKS OR AFRICAN AMERICANS   14   31    7     29        203
#  6. latinos                       22   47    1    142        302
#  9. MULTIRACIAL AMERICANS         55    0    0      0          0
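The paste0() approach only gives clean labels if no respondent answered more than one of the four questions; otherwise you would get glued values such as "ethnicmainstream". A sketch of a quick check, to run before replacing NA with "", on hypothetical toy data:

```r
# Hypothetical toy stand-in for `pre`, before the NA replacement step
pre <- data.frame(
  races       = factor(c("Asian", "Asian", "Black", "NHPI", "Latino", "White")),
  asian_news  = c("ethnic", "both", NA, NA, NA, NA),
  black_news  = c(NA, NA, "mainstream", NA, NA, NA),
  nhpi_news   = c(NA, NA, NA, "DK", NA, NA),
  latino_news = c(NA, NA, NA, NA, "ethnic", NA),
  stringsAsFactors = FALSE
)

news_cols <- c("asian_news", "black_news", "nhpi_news", "latino_news")

# Number of news questions each respondent actually answered
n_answered <- rowSums(!is.na(pre[news_cols]))
table(n_answered)

# Stop early if any respondent answered two or more questions
stopifnot(all(n_answered <= 1))
```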

Upvotes: 1

Ronak Shah

Reputation: 389215

Using dplyr and tidyr, we can reshape the data to long format, count the number of observations for each combination of races and value across the different columns, and then cast the data back to wide format.


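library(dplyr)
library(tidyr)
# pivot_longer(cols = -races) stacks every column except races
pre %>%
  pivot_longer(cols = -races) %>%                   # one row per (respondent, question)
  count(races, value) %>%                           # tally each races/answer pair
  pivot_wider(names_from = value, values_from = n)  # back to one row per race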

#  races                           both    DK ethnic mainstream  `NA`
#  <fct>                          <int> <int>  <int>      <int> <int>
#1 1. ASIAN AMERICANS               324    53    825       1401  8545
#2 2. PACIFIC ISLANDERS              24    13     22        332  1249
#3 3. WHITES                         NA    NA     NA         NA  2004
#4 4. BLACKS OR AFRICAN AMERICANS    36     5     24        163   716
#5 5. BLACKS OR AFRICAN AMERICANS    31     7     29        203   866
#6 6. latinos                        47     1    142        302  1564
#7 9. MULTIRACIAL AMERICANS          NA    NA     NA         NA   220
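For readers without the tidyverse, here is a rough base-R analogue of the same long-then-wide idea, swapping in stack() and table() for pivot_longer(), count() and pivot_wider(). It is a sketch on hypothetical toy data, not Ronak Shah's code.

```r
# Hypothetical toy stand-in for `pre`
pre <- data.frame(
  races       = factor(c("Asian", "Asian", "Black", "NHPI", "Latino", "White")),
  asian_news  = c("ethnic", "both", NA, NA, NA, NA),
  black_news  = c(NA, NA, "mainstream", NA, NA, NA),
  nhpi_news   = c(NA, NA, NA, "DK", NA, NA),
  latino_news = c(NA, NA, NA, NA, "ethnic", NA),
  stringsAsFactors = FALSE
)

news_cols <- c("asian_news", "black_news", "nhpi_news", "latino_news")

# Long format: stack() ignores factor columns, so convert to character first
long <- stack(lapply(pre[news_cols], as.character))
# stack() concatenates column by column, so repeat races once per column
long$races <- rep(pre$races, times = length(news_cols))

# Wide format again: one row per race, one column per answer (NA included)
counts <- table(races = long$races, answer = long$values, useNA = "ifany")
counts
```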

Upvotes: 0


Reputation: 887711

We can replicate the 'races' column four times (once per news column), unlist the columns of interest so they line up with it, and then tabulate:

table(rep(pre$races, 4), unlist(pre[3:6]), useNA = "always")
#                           both   DK ethnic mainstream 1. Pacific Islander or Asian American more <NA>
#  1. ASIAN AMERICANS             294   41    770        863                                          0 6472
#  2. PACIFIC ISLANDERS            24   13      0        332                                         22 1249
#  3. WHITES                        0    0      0          0                                          0 2000
#  4.BLACKS OR AFRICAN AMERICANS   67   12     53        366                                          0 1582
#  6. latinos                      47    1    142        302                                          0 1564
#  <NA>                             0    0      0          0                                          0    0
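Because this table counts (respondent, question) pairs rather than respondents, its <NA> column is inflated, e.g. 2000 for the 500 Whites (500 × 4). A sketch of a sanity check, assuming each respondent answered at most one question: on hypothetical toy data, the non-NA counts from the rep()/unlist() table equal a per-respondent cross-tab of a combined variable built with Reduce().

```r
# Hypothetical toy stand-in for `pre`
pre <- data.frame(
  races       = factor(c("Asian", "Asian", "Black", "NHPI", "Latino", "White")),
  asian_news  = c("ethnic", "both", NA, NA, NA, NA),
  black_news  = c(NA, NA, "mainstream", NA, NA, NA),
  nhpi_news   = c(NA, NA, NA, "DK", NA, NA),
  latino_news = c(NA, NA, NA, NA, "ethnic", NA),
  stringsAsFactors = FALSE
)

news_cols <- c("asian_news", "black_news", "nhpi_news", "latino_news")
news <- lapply(pre[news_cols], as.character)

# rep()/unlist() style: one entry per (respondent, question) pair, NAs dropped
stacked <- table(races  = rep(pre$races, times = length(news)),
                 answer = unlist(news, use.names = FALSE))

# Per-respondent style: first non-NA answer across the four questions
combined   <- Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), news)
per_person <- table(races = pre$races, answer = combined)

# With at most one answer per respondent, the two tables agree
all.equal(stacked, per_person)
```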

Upvotes: 1
