Reputation: 11
I have data like this below. I want to convert data format as Want.
> data
ID X1 X2 X3 X4 X5 X6 X7
1 A Yes No Yes No Yes No No
2 B Yes No No No No No No
3 C No No No Yes Yes No No
4 D No No No No No No Yes
5 E No Yes No No No No No
6 F No No No No No No No
7 G No No No No No No No
8 H No No Yes No No No No
9 I No No No No No No No
10 J Yes No No Yes No No No
Want :
ID CODE
A X1
A X3
A X5
B X1
C X4
C X5
D X7
E X2
F NA
G NA
H X3
I NA
J X1
J X4
I've tried many ways but couldn't solve it. Thanks for help
Upvotes: 0
Views: 89
Reputation: 101247
Here is a base R solution using apply()
code <- apply(data, 1, function(x) {ifelse(sum(x =="Yes") == 0,r <- NA,r <- names(df)[which(x=="Yes")]);r})
id <- rep(data$ID,lengths(code))
dfout <- data.frame(ID = id, CODE = unlist(code))
such that
> dfout
ID CODE
1 A X1
2 A X3
3 A X5
4 B X1
5 C X4
6 C X5
7 D X7
8 E X2
9 F <NA>
10 G <NA>
11 H X3
12 I <NA>
13 J X1
14 J X4
Upvotes: 1
Reputation: 388907
One way in tidyverse
is to get the data in long format, select rows with value == 'Yes'
and add missing levels with complete
.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -ID) %>%
filter(value == "Yes") %>%
complete(ID = levels(ID)) %>%
select(-value)
# A tibble: 14 x 2
# ID name
# <chr> <chr>
# 1 A X1
# 2 A X3
# 3 A X5
# 4 B X1
# 5 C X4
# 6 C X5
# 7 D X7
# 8 E X2
# 9 F NA
#10 G NA
#11 H X3
#12 I NA
#13 J X1
#14 J X4
In base R, we can use which
with merge
mat <- which(df == "Yes", arr.ind = TRUE)
temp <- data.frame(ID = df$ID[mat[, 1]], Code = names(df)[mat[, 2]])
merge(data.frame(ID = levels(df$ID)), temp, all.x = TRUE)
Make sure that df1$ID
is of factor class.
Upvotes: 4
Reputation: 51582
An option via base R can be to use stack
, i.e.
na.omit(cbind.data.frame(df[1], stack(replace(df, df == 'No', NA)[-1])))
which gives,
ID values ind 1 A Yes X1 2 B Yes X1 10 J Yes X1 15 E Yes X2 21 A Yes X3 28 H Yes X3 33 C Yes X4 40 J Yes X4 41 A Yes X5 43 C Yes X5 64 D Yes X7
Upvotes: 2