Suppose I have the following data frames (the two coordinate sets they produce have an unequal number of rows):
library(dplyr)
library(purrr)
library(stringr)

set.seed(1999)
dfA <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
set.seed(24)
dfB <- data.frame(a = rpois(10,2), b = rpois(10,2), c = rpois(10,2), d = rpois(10,2), e = rpois(10,2))
set.seed(10)
Dx <- sample.int(5)
set.seed(6)
Dy <- sample.int(5)
Dx <- as.data.frame(Dx)
Dx <- as.data.frame(t(Dx))  # 1 row x 5 columns (V1 ... V5)
Dy <- as.data.frame(Dy)
Dy <- as.data.frame(t(Dy))  # 1 row x 5 columns (V1 ... V5)
# Paste matching columns into "x,y" strings and name the columns C1 ... C5
dfAB <- map2_df(dfA, dfB, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))
dfXY <- map2_df(Dx, Dy, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))
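Note that str_c() turns every cell of dfAB and dfXY into a character string such as "3,1", so the numbers have to be pulled back out before any arithmetic is possible. A minimal base R sketch, assuming every cell has the form "x,y"; parse_xy() is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper (not from any package): turn "x,y" strings into a
# numeric matrix with one row per point and columns named x and y
parse_xy <- function(s) {
  parts <- strsplit(s, ",", fixed = TRUE)         # list of c("x", "y") pairs
  m <- do.call(rbind, lapply(parts, as.numeric))  # one row per point
  colnames(m) <- c("x", "y")
  m
}

pts <- parse_xy(c("3,1", "2,1", "2,3"))
pts
```

Applied to a whole column, e.g. parse_xy(dfAB$C1), this should give a 10 x 2 matrix.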
Now I have two data sets of coordinates: dfAB has 5 variables with 10 observations each, and dfXY has 5 variables with 1 observation each.
What I would like to do is find the distance between the single observation of variable 1 in dfXY and every individual observation of variable 1 in dfAB, then the distance between the observation of variable 2 in dfXY and every observation of variable 2 in dfAB, and so on.
dfAB                 dfXY
C1    C2    ...      C1    C2    C3    C4    C5
3,1   3,2   ...      3,5   1,2   2,1   5,4   4,3
2,1   3,1
2,3   1,2
...   ...
That is, the distances between a) 3,5 and 3,1, b) 3,5 and 2,1, c) 3,5 and 2,3, etc.;
then the distances between a) 1,2 and 3,2, b) 1,2 and 3,1, c) 1,2 and 1,2, etc.;
and so on for the remaining variables.
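Because R recycles a length-one vector against a longer one, a single point can be compared with every point in a column without an explicit loop. A minimal sketch using the numbers from the illustration above; the names ab_x, ab_y, xy_x and xy_y are made up for the x and y parts of column 1:

```r
# Column 1 of dfAB from the illustration: points (3,1), (2,1), (2,3)
ab_x <- c(3, 2, 2)
ab_y <- c(1, 1, 3)

# Column 1 of dfXY: the single point (3,5)
xy_x <- 3
xy_y <- 5

# xy_x and xy_y are recycled against every row, giving one distance per point
d <- sqrt((ab_x - xy_x)^2 + (ab_y - xy_y)^2)
d  # 4, sqrt(17), sqrt(5)
```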
If the data sets had equal numbers of observations, I could use:
distances <- map2_df(
dfAB,
dfXY,
~ sqrt((.x$x - .y$x)^2 + (.x$y - .y$y)^2)
)
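Written out, the quantity wanted for row i of column j is the ordinary Euclidean distance between the dfAB point in that cell and the single dfXY point of the same column (the superscript notation is introduced here for illustration):

```latex
d_{ij} = \sqrt{\left(x^{AB}_{ij} - x^{XY}_{j}\right)^2 + \left(y^{AB}_{ij} - y^{XY}_{j}\right)^2},
\qquad i = 1,\dots,10,\; j = 1,\dots,5
```

The dfXY terms carry no row index i, so the same point is reused for all 10 rows of a column.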
But since dfXY only has 1 observation (which has to be compared against every row), this does not work. I think I need something like a for (i in seq_along(dfXY)) loop, but I do not know how to incorporate ~ sqrt((.x$x - .y$x)^2 + (.x$y - .y$y)^2) into it:
distance <- for (i in seq_along(dfXY)) {
  dfAB[, i] <- dfAB[, i] [WHAT TO PUT HERE]
}
Any help is much appreciated.
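One way to fill in the loop, sketched under the assumption that both data frames hold "x,y" strings in matching columns (as dfAB and dfXY do after the str_c() step). The toy data reuses the values from the illustration, and get_x(), get_y() and dist_df are hypothetical names:

```r
# Toy versions of dfAB (3 points per column) and dfXY (1 point per column)
dfAB <- data.frame(C1 = c("3,1", "2,1", "2,3"),
                   C2 = c("3,2", "3,1", "1,2"))
dfXY <- data.frame(C1 = "3,5", C2 = "1,2")

# Hypothetical helpers: the number before / after the comma
get_x <- function(s) as.numeric(sub(",.*$", "", s))
get_y <- function(s) as.numeric(sub("^.*,", "", s))

# Result has the same shape and column names as dfAB
dist_df <- as.data.frame(matrix(NA_real_, nrow = nrow(dfAB), ncol = ncol(dfAB),
                                dimnames = list(NULL, names(dfAB))))

for (i in seq_along(dfXY)) {
  # dfXY[[i]] has length 1, so its coordinates are recycled over all rows
  dist_df[[i]] <- sqrt((get_x(dfAB[[i]]) - get_x(dfXY[[i]]))^2 +
                       (get_y(dfAB[[i]]) - get_y(dfXY[[i]]))^2)
}
dist_df
```

With the real dfAB and dfXY the same loop should produce a 10 x 5 data frame of distances.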
I'm having a bit of a hard time following what you're trying to do here, but I think you may be making things needlessly complicated for yourself.
For example, instead of nesting a map2() call inside an lapply() call, I think you can achieve pretty much the same result without iteration using bind_cols():
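library(dplyr)
library(purrr)

dfA <- tibble(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
dfB <- tibble(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
# bind_cols() in dplyr >= 1.0 repairs duplicate names as x...1, x...6, etc.,
# so give dfB's columns an explicit "1" suffix first
df_abt <- dfA %>%
  bind_cols(rename_with(dfB, ~ paste0(.x, "1"))) %>%
  select(x, x1, y, y1, z, z1, q, q1, t, t1)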
For data frames C and D, you can use iteration with map() to avoid having to transpose them:
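# Name the columns explicitly: bind_cols() on an unnamed list no longer yields V1, V2, ...
dfC <- map(1:5, ~ .x) %>% set_names(paste0("V", 1:5)) %>% bind_cols()
dfD <- map(11:15, ~ .x) %>% set_names(paste0("V", 1:5, "1")) %>% bind_cols()
df_cdt <- dfC %>%
  bind_cols(dfD) %>%
  select(V1, V11, V2, V21, V3, V31, V4, V41, V5, V51)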
(Actually, why not just store df_cdt as a vector? Is there a reason it needs to be a data frame?)
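As for distances, I reckon this should work; since each column of df_cdt holds a single value, map2_df() recycles it against every row of the matching df_abt column: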
df_dist <- map2_df(df_abt, df_cdt, ~ sqrt((.x - .y)^2))
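Note that sqrt((.x - .y)^2) is just the absolute difference of one coordinate, so df_dist holds separate x and y differences rather than point-to-point distances. If Euclidean distances are wanted, the two coordinate columns of each point can be combined. A minimal base R sketch, assuming the interleaved layout above (each point's first coordinate followed by its second); the toy values and the names n_pts and df_euclid are hypothetical:

```r
# Toy data in the same interleaved layout as df_abt and df_cdt:
# columns 1-2 are point 1's coordinates, columns 3-4 point 2's, and so on
df_abt <- data.frame(x = c(3, 2, 2), x1 = c(1, 1, 3),
                     y = c(3, 3, 1), y1 = c(2, 1, 2))
df_cdt <- data.frame(V1 = 3, V11 = 5, V2 = 1, V21 = 2)

n_pts <- ncol(df_abt) / 2

# For point k, combine both coordinate differences into one distance;
# the single row of df_cdt is recycled against every row of df_abt
df_euclid <- as.data.frame(setNames(lapply(seq_len(n_pts), function(k) {
  i <- 2 * k - 1  # column index of point k's first coordinate
  sqrt((df_abt[[i]] - df_cdt[[i]])^2 + (df_abt[[i + 1]] - df_cdt[[i + 1]])^2)
}), paste0("D", seq_len(n_pts))))
df_euclid
```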
If you have an unequal number of rows in df_abt, why not just pad out the missing rows with NAs? I mean, R won't let you build a data frame with columns of different lengths anyway.
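Padding can be done in base R by assigning a longer length to a vector, which fills the new positions with NA. A small sketch with made-up values:

```r
short_col <- c(3, 1, 4)
length(short_col) <- 5  # extends to 5 elements; positions 4 and 5 become NA

# Both columns now have length 5, so the data frame can be built
padded_df <- data.frame(a = 1:5, b = short_col)
padded_df
```

Any distance computed from a padded NA coordinate is itself NA, so the padding stays visible in the result.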