GregRousell

Reputation: 1087

Recode Multiple Columns to Single Variable

I have some qualitative data that I have coded into various categories, and I want to provide summaries for subgroups. The RQDA package is great for coding interviews, but I've struggled to create summaries for open-ended survey responses. I've managed to export the coded file to HTML and copy/paste it into Excel. I now have 500 rows with the categories in separate columns; however, the same code may appear in different columns. For example, some data:

a <- c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA")
b <- c("ResponseD", "ResponseC", "NA", "NA","NA")
c <- c("ResponseB", "ResponseA", "ResponseE", "NA", "NA")
d <- c("ResponseC", "ResponseB", "ResponseA", "NA", "NA")
df <- data.frame (a,b,c,d)

I'd like to be able to run something like

df$ResponseA <- recode (df$a | df$b | df$c, "
                        'ResponseA' = '1'; 
                         else='0' ")
df$ResponseB <- recode (df$a | df$b | df$c, "
                        'ResponseB' = '1'; 
                         else='0' ")

In short, I'd like to scan 9 columns and recode them into a single binary variable.

Upvotes: 0

Views: 404

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

If I understand the question correctly, perhaps you can try something like this:

## Convert your data into a long format first
dfL <- cbind(id = sequence(nrow(df)), stack(lapply(df, as.character)))

## The next three lines are mostly cleanup
dfL$id <- factor(dfL$id, sequence(nrow(df)))
dfL$values[dfL$values == "NA"] <- NA
dfL <- dfL[complete.cases(dfL), ]

## `table` is the real workhorse here
cbind(df, (table(dfL[1:2]) > 0) * 1)
#           a         b         c         d ResponseA ResponseB ResponseC ResponseD ResponseE
# 1 ResponseA ResponseD ResponseB ResponseC         1         1         1         1         0
# 2 ResponseB ResponseC ResponseA ResponseB         1         1         1         0         0
# 3 ResponseC        NA ResponseE ResponseA         1         0         1         0         1
# 4 ResponseD        NA        NA        NA         0         0         0         1         0
# 5        NA        NA        NA        NA         0         0         0         0         0
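To see why the `> 0` step is needed, it may help to look at the raw counts before they are collapsed. The following is a minimal sketch, assuming the sample data from the question with character columns; the names `long`, `counts`, and `presence` are illustrative and not part of the original answer.

```r
# Sample data from the question; missing values are the literal string "NA"
df <- data.frame(
  a = c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA"),
  b = c("ResponseD", "ResponseC", "NA", "NA", "NA"),
  c = c("ResponseB", "ResponseA", "ResponseE", "NA", "NA"),
  d = c("ResponseC", "ResponseB", "ResponseA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# One row per (respondent, value) pair. Making `id` a factor with all row
# numbers as levels keeps respondents whose values are all "NA" in the table.
long <- data.frame(
  id    = factor(rep(seq_len(nrow(df)), times = ncol(df)),
                 levels = seq_len(nrow(df))),
  value = unlist(df, use.names = FALSE)
)
long <- long[long$value != "NA", ]

# Raw counts: how many times each code appears for each respondent
counts <- table(long$id, long$value)

# Collapse counts to 0/1 presence indicators
presence <- (counts > 0) * 1
```

Here row 2 contains `ResponseB` in both `a` and `d`, so `counts["2", "ResponseB"]` is 2 while `presence["2", "ResponseB"]` is 1. Without the comparison you would get frequencies rather than binary indicators.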

You can also try the following, which is shorter but keeps a column for the literal "NA" strings:

(table(rep(1:nrow(df), ncol(df)), unlist(df)) > 0) * 1L
#    
#     NA ResponseA ResponseB ResponseC ResponseD ResponseE
#   1  0         1         1         1         1         0
#   2  0         1         1         1         0         0
#   3  1         1         0         1         0         1
#   4  1         0         0         0         1         0
#   5  1         0         0         0         0         0
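If you only need indicators for a few specific codes, closer to the `recode` idea in the question, a row-wise check may be enough. This is only a sketch under the same assumptions about the sample data; `has_code` is a hypothetical helper written for illustration, not a function from `car` or any other package.

```r
# Sample data from the question; missing values are the literal string "NA"
df <- data.frame(
  a = c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA"),
  b = c("ResponseD", "ResponseC", "NA", "NA", "NA"),
  c = c("ResponseB", "ResponseA", "ResponseE", "NA", "NA"),
  d = c("ResponseC", "ResponseB", "ResponseA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if `code` appears in any of the columns `cols`, else 0.
# `data[cols] == code` gives a logical matrix; rowSums counts matches per row.
has_code <- function(data, code, cols = names(data)) {
  as.integer(rowSums(data[cols] == code, na.rm = TRUE) > 0)
}

# Build one indicator column per code of interest
codes <- c("ResponseA", "ResponseB")
flags <- sapply(codes, function(code) has_code(df, code))
result <- cbind(df, flags)
```

The `cols` argument lets you restrict the scan to a subset of columns, for example `has_code(df, "ResponseA", cols = c("a", "b", "c"))`, which mirrors the `df$a | df$b | df$c` intent in the question.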

Upvotes: 1
