Reputation: 89
I have one dataframe, that contains my results and another dataframe, that contains e.g. only values. Now I want to add a new column to the first dataframe that has the data from the second dataframe. However, the second dataframe does not have a tidy format or the same rows as the first one. However, the position of the value I want to get from the second dataframe is given in two columns of the first dataframe.
library(tidyverse)
df1<-data.frame(Row_no=c(1,2,3,4, 1,2,3,4), Col_no=c(1,1,2,2,3,3,4,4), > Size=c(sample(200:300, 8)))
>df1
Row_no Col_no Size
1 1 1 226
2 2 1 208
3 3 2 297
4 4 2 211
5 1 3 209
6 2 3 296
7 3 4 273
8 4 4 261
df2=cbind(rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8), rnorm(8))
> df2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] *1.4568994* -0.3324945 *-0.2885171* -0.79393545 -0.02439371 1.4216918 0.07288639 -0.2441228
[2,] *0.3648703* 0.7494033 *-0.9974556* -0.33820023 -0.30235757 1.5094486 -0.10982881 1.9349127
[3,] 0.5044991 *1.2208453* -0.8748034 *-0.86325341* 0.10462120 -0.3674390 -0.04107733 1.1815123
[4,] -1.2792906 *0.7408320* -0.2711479 *-0.07350530* -0.92132461 -0.7753123 0.99841815 1.5802167
[5,] -0.8801507 0.2580448 0.3099108 0.66716720 -0.01144132 -0.9353671 0.44608715 -0.6729589
[6,] 0.4809844 0.6349390 1.9900160 0.62358533 0.35075449 2.4124712 -1.45171943 0.4409148
[7,] -0.5146914 0.9115070 -0.3971806 -0.06477066 0.46028331 0.7067722 -0.44562194 1.9545829
[8,] -0.4299626 1.8211741 0.3272991 0.06177976 1.25383361 -0.7770162 -0.49841279 0.5098795
The desired result would be something like the following (I put asteriks around the values in df2, to show which I wanted):
Row_no Col_no Size Value
1 1 1 226 1.4568994
2 2 1 208 0.3648703
3 3 2 297 1.2208453
4 4 2 211 0.7408320
5 1 3 209 -0.2885171
6 2 3 296 -0.9974556
7 3 4 273 -0.86325341
8 4 4 261 -0.07350530
However, when I try to run the code
df1%>%
mutate(value=df2[Row_no, Col_no])
I get the error message,
`Fehler: Column `value` must be length 8 (the number of rows) or one, not 64
Which would be expected. However, when I try to index the columns themselves I get
df1%>%
mutate(value=df2[Row_no[1], Col_no[1]])
Row_no Col_no Size value
1 1 1 226 1.456899
2 2 1 208 1.456899
3 3 2 297 1.456899
4 4 2 211 1.456899
5 1 3 209 1.456899
6 2 3 296 1.456899
7 3 4 273 1.456899
8 4 4 261 1.456899
> df1%>%
+ mutate(value[1]=df2[Row_no[1], Col_no[1]])
Error: Unexpected '=' in:
"df1%>%
mutate(value[1]="
So how would I get my desired result? I would prefer to have a tidy solution. Also, the given example is just a minimum reproducible example, my real files are really large, that's why I need a clear solution... Thanks!
Upvotes: 0
Views: 720
Reputation: 1258
Using purrr::pmap:
df1$Value <- unlist(pmap(list(df1$Row_no, df1$Col_no, list(df2)), ~ ..3[..1,..2]))
and with piping:
df1 %>%
mutate(Value = pmap(list(Row_no, Col_no, list(df2)), ~ ..3[..1,..2]))
Upvotes: 1
Reputation: 89
Thanks to @Yuriy Barvinchenko, I was able to figure out a solution:
df1%>%
mutate(value=df2[cbind(Row_no, Col_no)])
> df1%>%
+ mutate(value=df2[cbind(Row_no, Col_no)])
Row_no Col_no Size value
1 1 1 226 1.4568994
2 2 1 208 0.3648703
3 3 2 297 1.2208453
4 4 2 211 0.7408320
5 1 3 209 -0.2885171
6 2 3 296 -0.9974556
7 3 4 273 -0.8632534
8 4 4 261 -0.0735053
The important part was the cbind() in the indexing brackets.
Upvotes: 3
Reputation: 173793
The problem is that when you try mutate(value=df2[Row_no, Col_no])
, you are actually generating a square matrix of length(Row_no) * length(Col_no)
elements, equivalent to df2[df1$Col_no, df1$Row_no]
. When you think about it, this is a stack of the 8 "correct" rows, where the correct columns are numbered 1 to 8. The correct elements can therefore be found at [1, 1], [2, 2], [3, 3]...[n, n], i.e. the diagonal of the matrix. The tidiest way to get these into a single column is to multiply it by the identity matrix and take the row sums.
I have replicated your random data here to give a complete solution that matches your example.
library(tidyverse)
df1 <- data.frame(Row_no = rep(1:4, 2),
Col_no = rep(1:4, each = 2),
Size = c(sample(200:300, 8)))
df2 <- cbind(c( 1.4568994, -0.3324945, -0.2885171, -0.79393545,
-0.02439371, 1.4216918, 0.07288639, -0.2441228),
c( 0.3648703, 0.7494033, -0.9974556, -0.33820023,
-0.30235757, 1.5094486, -0.10982881, 1.9349127),
c( 0.5044991, 1.2208453, -0.8748034, -0.86325341,
0.10462120, -0.3674390, -0.04107733, 1.1815123),
c(-1.2792906, 0.7408320, -0.2711479, -0.07350530,
-0.92132461, -0.7753123, 0.99841815, 1.5802167),
c(-0.8801507, 0.2580448, 0.3099108, 0.66716720,
-0.01144132, -0.9353671, 0.44608715, -0.6729589),
c( 0.4809844, 0.6349390, 1.9900160, 0.62358533,
0.35075449, 2.4124712, -1.45171943, 0.4409148),
c(-0.5146914, 0.9115070, -0.3971806, -0.06477066,
0.46028331, 0.7067722, -0.44562194, 1.9545829),
c(-0.4299626, 1.8211741, 0.3272991, 0.06177976,
1.25383361, -0.7770162, -0.49841279, 0.5098795))
df1 %>% mutate(value = rowSums(df2[Col_no, Row_no] * diag(8))) %>% print
# Row_no Col_no Size value
# 1 1 1 267 1.4568994
# 2 2 1 283 0.3648703
# 3 3 2 259 1.2208453
# 4 4 2 235 0.7408320
# 5 1 3 212 -0.2885171
# 6 2 3 263 -0.9974556
# 7 3 4 251 -0.8632534
# 8 4 4 200 -0.0735053
Upvotes: 0
Reputation: 1595
based on answer here
df1$value <- with( df1, df2[ cbind(Row_no, Col_no) ] )
Upvotes: 1