Reputation: 101
hello everyone I have two dataframes and I'd like to join information from one df to another one in a specific way. I'm gonna explain better. Here is my first df where i'd like to add 6 columns (general col named col1, col2 and so on..):
res1 res4 aa1234
1 AAAAAA 1 4 IVGG
2 AAAAAA 8 11 RPRQ
3 AAAAAA 10 13 RQFP
4 AAAAAA 12 15 FPFL
5 AAAAAA 20 23 NQGR
6 AAAAAA 32 35 HARF
here is the 2nd df:
res1 dist
1 3.711846
1 3.698985
2 4.180874
2 3.112819
3 3.559737
3 3.722107
4 3.842375
4 3.914970
5 3.361647
5 2.982788
6 3.245118
6 3.224230
7 3.538315
7 3.602273
8 3.185184
8 2.771583
9 4.276871
9 3.157737
10 3.933783
10 2.956738
Considering "res1" I'd like to add to the 1st df in my new 6 columns the first 6th values contained in "dist" of second df corresponding to res1 = 1. After, in the 1st df I have res1 = 8, so I'd like to add in the new 6 columns the 6 values from res1 = 8 contained in "dist" of 2nd df.
I'd like to obtain something like this
res1 res4 aa1234 col1 col2 col3 col4 col5 col6
1 4 IVGG 3.71 3.79 4.18 3.11 3.55 3.72
8 11 RPRQ 3.18 2.77 4.27 3.15 3.93 2.95
10 13 RQFP
12 15 FPFL
20 23 NQGR
32 35 HARF
Please consider that I have to do it on a large dataset and for 1000 and more files... thanks!
Upvotes: 2
Views: 95
Reputation: 388862
You could create a sequence from res1
to res4
and then join the data with pdb
.
library(tidyverse)
turn %>%
mutate(res = map2(res1, res4, seq)) %>%
unnest(res) %>%
left_join(pdb, by = c('res' = 'res1')) %>%
group_by(res1 = as.character(res1)) %>%
mutate(col = paste0('col', row_number())) %>%
select(-res4, -res, -eleno) %>%
pivot_wider(names_from = col, values_from = dist)
Upvotes: 1
Reputation: 887008
We can use rowid
from data.table
library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
df2 %>%
mutate(col = str_c("col", rowid(res1))) %>%
pivot_wider(names_from = col, values_from = dist) %>%
right_join(df1, by = 'res1')
-output
# A tibble: 6 x 4
# res1 col1 col2 res4
# <int> <dbl> <dbl> <int>
#1 1 3.71 3.70 4
#2 8 3.19 2.77 11
#3 10 3.93 2.96 13
#4 12 NA NA 15
#5 20 NA NA 23
#6 32 NA NA 35
df1 <- structure(list(res1 = c(1L, 8L, 10L, 12L, 20L, 32L), res4 = c(4L,
11L, 13L, 15L, 23L, 35L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
df2 <- structure(list(res1 = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), dist = c(3.711846,
3.698985, 4.180874, 3.112819, 3.559737, 3.722107, 3.842375, 3.91497,
3.361647, 2.982788, 3.245118, 3.22423, 3.538315, 3.602273, 3.185184,
2.771583, 4.276871, 3.157737, 3.933783, 2.956738)), class = "data.frame",
row.names = c(NA,
-20L))
Upvotes: 1