drastega
drastega

Reputation: 1773

How to select from data frame using column orders stored in two another data frames in R?

I have three data frames with equal number of rows. The snippet is

df1 <- data.frame(Id = c(12345, 12346, 12347), X1 = c('3', '2', '1'), X2 = c('1,2', '1,3', '1'))
df1
     Id X1  X2
1 12345  3 1,2
2 12346  2 1,3
3 12347  1   1

df2 <- data.frame(Id = c(12345, 12346, 12347), X1_1 = c(3, 2, 1), X1_2 = c(1, 1, 2), X1_3 = c(2, 3, 3), X2_1 = c(1, 1, 1), X2_2 = c(2, 3, 3), X2_3 = c(3, 2, 2))
df2
     Id X1_1 X1_2 X1_3 X2_1 X2_2 X2_3
1 12345    3    1    2    1    2    3
2 12346    2    1    3    1    3    2
3 12347    1    2    3    1    3    2

df3 <- data.frame(Id = c(12345, 12346, 12347), X1 = c(1, 2, 1), X2 = c(2, 1, 2))
df3
     Id X1 X2
1 12345  1  2
2 12346  2  1
3 12347  1  2

df1 stores column numbers of df2 from which I need to take elements. df1$X1 is df2$X1_-th column in the subset X1_... of df2. df1$X2 is df2$X2_-th columns in the subset X2_... of df2 and so on. Taking the first row in my example: df$X1 = 3 so I need to take element from df2$X1_3 (3d column). This element is 2. Then df1$X2 = 1,2 and I need to take two elements, first from df2$X2_1 and second from df2$X2_2. They are 1 and 2. All obtained elements for the first row I need to store as single vector in the first element of desired list and so on for every row.

It's like question with two data frames but now df3 stores the final order of elements, so I need to get a list

[[1]]
[1] 2 1 2

[[2]]
[1] 1 2 1

[[3]]
[1] 1 1

What is the elegant way to create this list of elements in R?

UPD: All three data frames have NA. After replacing NA's by zeroes the first rows of real data frames could look like

1)

    df1[1, ]
         Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
1 123450744  1  5  1  3  2   0   0   0   3  6 1,2,4,6,7  1  0   0   0   0   5

    df2[1, ]
         Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7
1 123450744    1    2    4    2    5    7    3    6    1    8    9    1    7   10    6   11    5    8    9    4     3     2    12   11    1    6    4    2    5    8
  X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5
1   10    7     3     9    12    4    2    5    1    3    6    1    2    3    4    5    6    7    1    2    3    1    2    0    0    0     0     0     0     0     0
  X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10
1     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0      0     0     0     0     0     0     0     0     0     0      0
  X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2
1      0      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0     0     0
  X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
1     3     5     4     2     6     1     7     1     2     3     4     5

    df3[1, ]
         Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
1 123450744  1  2  3  4  5  6  7  8  0   0   0   0   0   0   0   0   0   9  10

or

2)

df1  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
 1 123450744  1  1  4  3  1   2   1   2   4 1,2,4,5,6,7,8  2 1,2,3,6,7 1,2,3   7 1,3,4 4,5,7,8,9   1")
 df2  <- read.table(header=TRUE, as.is = TRUE, text="Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7 X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5 X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10 X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2 X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
 1 123450744  1    2    2    7    8    3    1    5    6    4    9    5    6    8   10    7    3    1    9    2    11     4    12    5    8    1    3    6    9    4    7    2    10    11    12    3    5    4    1    2    6    1    2    3    4    5    6    7    1    2    3    1    2    1    2    3     3     4     2     8     5    1     7     6    10     3     6    12     7     9     8     4     5      1      2     11     13     6     7    10     9     4     3     5     2     1     11    8     12     1     2     3     4     5     1     2     3     4     5     7     4     2     6     3     5     1    10     8      9     11     12     1     2   5     4     1     2     6     3     7     1     2     3     4     5")
 df3  <- read.table(header=TRUE, as.is = TRUE, text="d X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
 1 123450744  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19")
 df1[,-1]  <- as.character(df1[,-1])

Upvotes: 0

Views: 102

Answers (1)

GKi
GKi

Reputation: 39687

Based on the Answer to your previous question this could be done like:

df1[,-1] <- lapply(df1[,-1], as.character)
df3[df3==0]  <- NA

lapply(setNames(df1$Id, df1$Id), function(Id) {
  x  <- unlist(df3[df3$Id==Id,-1])
  x <- names(sort(x[!is.na(x)])) #When NA is indicating that this column should not be used
  x  <- x[x %in% colnames(df1)] #This is only needed when df3 has colnames which are not in df1
  x  <- unlist(sapply(seq_along(x), function(j) {paste0(x[j], "_", strsplit(df1[df1$Id==Id, x[j]], ",")[[1]])}))
  df2[df2$Id==Id, x]
})
#For the original Question
#$`12345`
#  X1_3 X2_1 X2_2
#1    2    1    2
#
#$`12346`
#  X2_1 X2_3 X1_2
#2    1    2    1
#
#$`12347`
#  X1_1 X2_1
#3    1    1
#
#For UPD 1)
#$`123450744`
#  X1_1 X2_6 X3_1 X3_2 X3_4 X3_6 X3_7 X4_1 X5_5 X6_1 X7_3 X8_2 X18_5 X19_3
#1    1    6    1    7    6    5    8   11    3    1    3    2     6     3
#
#For UPD 2)
#$`123450744`
#  X1_1 X2_1 X2_2 X2_4 X2_5 X2_6 X2_7 X2_8 X3_2 X4_1 X4_2 X4_3 X4_6 X4_7 X5_1 X6_4 X7_3 X8_1 X9_1 X9_2 X9_3 X11_7 X12_1 X12_3 X12_4 X13_2 X14_1 X15_4 X15_5 X15_7 X15_8 X15_9 X16_2 X18_1 X19_4
#1    1    2    7    3    1    5    6    4    6    5    8    1    9    4    3    4    3    1    1    2    3     8     6    10     9     2     1     6     3     1    10     8     2     5     4

Data:

#Original
df1 <- data.frame(Id = c(12345, 12346, 12347), X1 = c('3', '2', '1'), X2 = c('1,2', '1,3', '1'))
df2 <- data.frame(Id = c(12345, 12346, 12347), X1_1 = c(3, 2, 1), X1_2 = c(1, 1, 2), X1_3 = c(2, 3, 3), X2_1 = c(1, 1, 1), X2_2 = c(2, 3, 3), X2_3 = c(3, 2, 2))
df3 <- data.frame(Id = c(12345, 12346, 12347), X1 = c(1, 2, 1), X2 = c(2, 1, 2))

#UPD 1)
df1  <- read.table(header=TRUE, , as.is = TRUE, text="Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
1 123450744  1  5  1  3  2   0   0   0   3  6 1,2,4,6,7  1  0   0   0   0   5")
df2  <- read.table(header=TRUE, , as.is = TRUE, text="Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7 X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5 X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10 X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2 X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
1 123450744    1    2    4    2    5    7    3    6    1    8    9    1    7   10    6   11    5    8    9    4     3     2    12   11    1    6    4    2    5    8   10    7     3     9    12    4    2    5    1    3    6    1    2    3    4    5    6    7    1    2    3    1    2    0    0    0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0      0     0     0     0     0     0     0     0     0     0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0      0      0      0     0     0     3     5     4     2     6     1     7     1     2     3     4     5")
df3  <- read.table(header=TRUE, , as.is = TRUE, text="Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
1 123450744  1  2  3  4  5  6  7  8  0   0   0   0   0   0   0   0   0   9  10")

#UPD 2)
df1  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X5 X6 X7 X8 X13 X14 X16 X19 X2        X3 X4 X9 X11 X12 X15 X18
 1 123450744  1  1  4  3  1   2   1   2   4 1,2,4,5,6,7,8  2 1,2,3,6,7 1,2,3   7 1,3,4 4,5,7,8,9   1")
df2  <- read.table(header=TRUE, as.is = TRUE, text="Id X1_1 X1_2 X2_1 X2_2 X2_3 X2_4 X2_5 X2_6 X2_7 X2_8 X2_9 X3_1 X3_2 X3_3 X3_4 X3_5 X3_6 X3_7 X3_8 X3_9 X3_10 X3_11 X3_12 X4_1 X4_2 X4_3 X4_4 X4_5 X4_6 X4_7 X4_8 X4_9 X4_10 X4_11 X4_12 X5_1 X5_2 X5_3 X5_4 X5_5 X5_6 X6_1 X6_2 X6_3 X6_4 X6_5 X6_6 X6_7 X7_1 X7_2 X7_3 X8_1 X8_2 X9_1 X9_2 X9_3 X10_1 X10_2 X10_3 X10_4 X10_5 X10_6 X10_7 X10_8 X11_1 X11_2 X11_3 X11_4 X11_5 X11_6 X11_7 X11_8 X11_9 X11_10 X11_11 X11_12 X11_13 X12_1 X12_2 X12_3 X12_4 X12_5 X12_6 X12_7 X12_8 X12_9 X12_10 X12_11 X12_12 X13_1 X13_2 X13_3 X13_4 X13_5 X14_1 X14_2 X14_3 X14_4 X14_5 X15_1 X15_2 X15_3 X15_4 X15_5 X15_6 X15_7 X15_8 X15_9 X15_10 X15_11 X15_12 X16_1 X16_2 X18_1 X18_2 X18_3 X18_4 X18_5 X18_6 X18_7 X19_1 X19_2 X19_3 X19_4 X19_5
 1 123450744  1    2    2    7    8    3    1    5    6    4    9    5    6    8   10    7    3    1    9    2    11     4    12    5    8    1    3    6    9    4    7    2    10    11    12    3    5    4    1    2    6    1    2    3    4    5    6    7    1    2    3    1    2    1    2    3     3     4     2     8     5    1     7     6    10     3     6    12     7     9     8     4     5      1      2     11     13     6     7    10     9     4     3     5     2     1     11    8     12     1     2     3     4     5     1     2     3     4     5     7     4     2     6     3     5     1    10     8      9     11     12     1     2   5     4     1     2     6     3     7     1     2     3     4     5")
df3  <- read.table(header=TRUE, as.is = TRUE, text="Id X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
 1 123450744  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19")

Upvotes: 1

Related Questions