Lotte Victor
Lotte Victor

Reputation: 89

Add new column with mutate() from a second dataframe with index given in the first dataframe

I have one dataframe, that contains my results and another dataframe, that contains e.g. only values. Now I want to add a new column to the first dataframe that has the data from the second dataframe. However, the second dataframe does not have a tidy format or the same rows as the first one. However, the position of the value I want to get from the second dataframe is given in two columns of the first dataframe.

library(tidyverse)

df1<-data.frame(Row_no=c(1,2,3,4, 1,2,3,4), Col_no=c(1,1,2,2,3,3,4,4), > Size=c(sample(200:300, 8)))
>df1
  Row_no Col_no Size
1      1      1  226
2      2      1  208
3      3      2  297
4      4      2  211
5      1      3  209
6      2      3  296
7      3      4  273
8      4      4  261

df2=cbind(rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8))


> df2
           [,1]       [,2]       [,3]        [,4]        [,5]       [,6]        [,7]       [,8]
[1,]  *1.4568994* -0.3324945 *-0.2885171* -0.79393545 -0.02439371  1.4216918  0.07288639 -0.2441228
[2,]  *0.3648703*  0.7494033 *-0.9974556* -0.33820023 -0.30235757  1.5094486 -0.10982881  1.9349127
[3,]  0.5044991  *1.2208453* -0.8748034 *-0.86325341*  0.10462120 -0.3674390 -0.04107733  1.1815123
[4,] -1.2792906  *0.7408320* -0.2711479 *-0.07350530* -0.92132461 -0.7753123  0.99841815  1.5802167
[5,] -0.8801507  0.2580448  0.3099108  0.66716720 -0.01144132 -0.9353671  0.44608715 -0.6729589
[6,]  0.4809844  0.6349390  1.9900160  0.62358533  0.35075449  2.4124712 -1.45171943  0.4409148
[7,] -0.5146914  0.9115070 -0.3971806 -0.06477066  0.46028331  0.7067722 -0.44562194  1.9545829
[8,] -0.4299626  1.8211741  0.3272991  0.06177976  1.25383361 -0.7770162 -0.49841279  0.5098795

The desired result would be something like the following (I put asteriks around the values in df2, to show which I wanted):

  Row_no Col_no Size Value
1      1      1  226 1.4568994
2      2      1  208 0.3648703
3      3      2  297 1.2208453
4      4      2  211 0.7408320
5      1      3  209 -0.2885171
6      2      3  296 -0.9974556
7      3      4  273 -0.86325341
8      4      4  261 -0.07350530

However, when I try to run the code

df1%>%
 mutate(value=df2[Row_no, Col_no])

I get the error message,

`Fehler: Column `value` must be length 8 (the number of rows) or one, not 64

Which would be expected. However, when I try to index the columns themselves I get

df1%>%
 mutate(value=df2[Row_no[1], Col_no[1]])

Row_no Col_no Size    value
1      1      1  226 1.456899
2      2      1  208 1.456899
3      3      2  297 1.456899
4      4      2  211 1.456899
5      1      3  209 1.456899
6      2      3  296 1.456899
7      3      4  273 1.456899
8      4      4  261 1.456899
> df1%>%
+   mutate(value[1]=df2[Row_no[1], Col_no[1]])
Error: Unexpected '=' in:
"df1%>%
  mutate(value[1]="

So how would I get my desired result? I would prefer to have a tidy solution. Also, the given example is just a minimum reproducible example, my real files are really large, that's why I need a clear solution... Thanks!

Upvotes: 0

Views: 720

Answers (4)

Florian
Florian

Reputation: 1258

Using purrr::pmap:

df1$Value <- unlist(pmap(list(df1$Row_no, df1$Col_no, list(df2)), ~ ..3[..1,..2]))

and with piping:

df1 %>% 
  mutate(Value = pmap(list(Row_no, Col_no, list(df2)), ~ ..3[..1,..2]))

Upvotes: 1

Lotte Victor
Lotte Victor

Reputation: 89

Thanks to @Yuriy Barvinchenko, I was able to figure out a solution:

df1%>%
  mutate(value=df2[cbind(Row_no, Col_no)])

> df1%>%
+   mutate(value=df2[cbind(Row_no, Col_no)])
  Row_no Col_no Size      value
1      1      1  226  1.4568994
2      2      1  208  0.3648703
3      3      2  297  1.2208453
4      4      2  211  0.7408320
5      1      3  209 -0.2885171
6      2      3  296 -0.9974556
7      3      4  273 -0.8632534
8      4      4  261 -0.0735053

The important part was the cbind() in the indexing brackets.

Upvotes: 3

Allan Cameron
Allan Cameron

Reputation: 173793

The problem is that when you try mutate(value=df2[Row_no, Col_no]), you are actually generating a square matrix of length(Row_no) * length(Col_no) elements, equivalent to df2[df1$Col_no, df1$Row_no]. When you think about it, this is a stack of the 8 "correct" rows, where the correct columns are numbered 1 to 8. The correct elements can therefore be found at [1, 1], [2, 2], [3, 3]...[n, n], i.e. the diagonal of the matrix. The tidiest way to get these into a single column is to multiply it by the identity matrix and take the row sums.

I have replicated your random data here to give a complete solution that matches your example.

library(tidyverse)

df1 <- data.frame(Row_no = rep(1:4, 2),
                  Col_no = rep(1:4, each = 2),
                  Size   = c(sample(200:300, 8)))


df2 <- cbind(c( 1.4568994,  -0.3324945, -0.2885171,  -0.79393545,
               -0.02439371,  1.4216918,  0.07288639, -0.2441228),
             c( 0.3648703,   0.7494033, -0.9974556,  -0.33820023,
               -0.30235757,  1.5094486, -0.10982881,  1.9349127),
             c( 0.5044991,   1.2208453, -0.8748034,  -0.86325341,
                0.10462120, -0.3674390, -0.04107733,  1.1815123),
             c(-1.2792906,   0.7408320, -0.2711479,  -0.07350530,
               -0.92132461, -0.7753123,  0.99841815,  1.5802167),
             c(-0.8801507,   0.2580448,  0.3099108,   0.66716720,
               -0.01144132, -0.9353671,  0.44608715, -0.6729589),
             c( 0.4809844,   0.6349390,  1.9900160,   0.62358533,
                0.35075449,  2.4124712, -1.45171943,  0.4409148),
             c(-0.5146914,   0.9115070, -0.3971806,  -0.06477066,
                0.46028331,  0.7067722, -0.44562194,  1.9545829),
             c(-0.4299626,   1.8211741,  0.3272991,   0.06177976,
                1.25383361, -0.7770162, -0.49841279,  0.5098795))

df1 %>% mutate(value = rowSums(df2[Col_no, Row_no] * diag(8))) %>% print

#   Row_no Col_no Size      value
# 1      1      1  267  1.4568994
# 2      2      1  283  0.3648703
# 3      3      2  259  1.2208453
# 4      4      2  235  0.7408320
# 5      1      3  212 -0.2885171
# 6      2      3  263 -0.9974556
# 7      3      4  251 -0.8632534
# 8      4      4  200 -0.0735053

Upvotes: 0

Yuriy Barvinchenko
Yuriy Barvinchenko

Reputation: 1595

based on answer here

df1$value <- with( df1, df2[ cbind(Row_no, Col_no) ] )

Upvotes: 1

Related Questions