Seehyun Park
Seehyun Park

Reputation: 11

How to match row to column when 'Yes' is entered in multiple columns

I have data like this below. I want to convert data format as Want.

> data
   ID  X1  X2  X3  X4  X5 X6  X7
1   A Yes  No Yes  No  Yes No No
2   B Yes  No  No  No  No No  No
3   C  No  No  No Yes Yes No  No
4   D  No  No  No  No  No No Yes
5   E  No Yes  No  No  No No  No
6   F  No  No  No  No  No No  No
7   G  No  No  No  No  No No  No
8   H  No  No Yes  No  No No  No
9   I  No  No  No  No  No No  No
10  J Yes  No  No Yes  No No  No

Want :

ID  CODE
A   X1
A   X3
A   X5
B   X1
C   X4
C   X5
D   X7
E   X2
F   NA
G   NA
H   X3
I   NA
J   X1
J   X4

I've tried many ways but couldn't solve it. Thanks for help

Upvotes: 0

Views: 89

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101247

Here is a base R solution using apply()

code <- apply(data, 1, function(x) {ifelse(sum(x =="Yes") == 0,r <- NA,r <- names(df)[which(x=="Yes")]);r})
id <- rep(data$ID,lengths(code))
dfout <- data.frame(ID = id, CODE = unlist(code))

such that

> dfout
   ID CODE
1   A   X1
2   A   X3
3   A   X5
4   B   X1
5   C   X4
6   C   X5
7   D   X7
8   E   X2
9   F <NA>
10  G <NA>
11  H   X3
12  I <NA>
13  J   X1
14  J   X4

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388907

One way in tidyverse is to get the data in long format, select rows with value == 'Yes' and add missing levels with complete.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -ID) %>%
  filter(value == "Yes") %>%
  complete(ID = levels(ID)) %>%
  select(-value)

# A tibble: 14 x 2
#   ID    name 
#   <chr> <chr>
# 1 A     X1   
# 2 A     X3   
# 3 A     X5   
# 4 B     X1   
# 5 C     X4   
# 6 C     X5   
# 7 D     X7   
# 8 E     X2   
# 9 F     NA   
#10 G     NA   
#11 H     X3   
#12 I     NA   
#13 J     X1   
#14 J     X4   

In base R, we can use which with merge

mat <- which(df == "Yes", arr.ind = TRUE)
temp <- data.frame(ID = df$ID[mat[, 1]], Code = names(df)[mat[, 2]])
merge(data.frame(ID = levels(df$ID)), temp, all.x = TRUE)

Make sure that df1$ID is of factor class.

Upvotes: 4

Sotos
Sotos

Reputation: 51582

An option via base R can be to use stack, i.e.

na.omit(cbind.data.frame(df[1], stack(replace(df, df == 'No', NA)[-1])))

which gives,

   ID values ind
1   A    Yes  X1
2   B    Yes  X1
10  J    Yes  X1
15  E    Yes  X2
21  A    Yes  X3
28  H    Yes  X3
33  C    Yes  X4
40  J    Yes  X4
41  A    Yes  X5
43  C    Yes  X5
64  D    Yes  X7

Upvotes: 2

Related Questions