Reputation: 402
I have two vectors of unequal length. One is a list of data frames and the other is made up of unique values. How can I use map() to iterate over both vectors with a custom function? The dummy data is as follows:
x2 <- letters
y2 <- list(df1, df2, df3, .....)
# where:
df1 <- data.frame(
  alphabet = c(letters[1:2]),
  latitude = c(7.302888, 7.302717),
  longitude = c(5.143116, 5.143184)
)
df2 <- data.frame(
  alphabet = c(letters[3:4]),
  latitude = c(4.948605, 4.948610),
  longitude = c(8.352012, 8.352157)
)
df3 <- data.frame(
  alphabet = c(letters[5:6]),
  latitude = c(7.681907, 7.681867),
  longitude = c(6.414152, 6.414239)
)
dist_func <- function(i, j) {
  mat = j[j$alphabet==i, c("latitude","longitude")]
  mat
}
map(x2, ~ map(y2, function(j) dist_func(.i, j)))
This throws an error.
The expected output is a list of data frames, where each data frame looks like this:
alphabet latitude longitude
1 a 7.302888 5.143116
Any help will be appreciated.
Upvotes: 1
Views: 81
Reputation: 2132
Here's a tidyverse take with the new dplyr::group_split:
library(tidyverse)
my_output <- list_flatten(map(y2, \(y) group_split(y, alphabet)))
Output:
> my_output
[[1]]
# A tibble: 1 × 3
alphabet latitude longitude
<chr> <dbl> <dbl>
1 a 7.30 5.14
[[2]]
# A tibble: 1 × 3
alphabet latitude longitude
<chr> <dbl> <dbl>
1 b 7.30 5.14
[[3]]
# A tibble: 1 × 3
alphabet latitude longitude
<chr> <dbl> <dbl>
1 c 4.95 8.35
[[4]]
# A tibble: 1 × 3
alphabet latitude longitude
<chr> <dbl> <dbl>
1 d 4.95 8.35
[[5]]
# A tibble: 1 × 3
alphabet latitude longitude
<chr> <dbl> <dbl>
1 e 7.68 6.41
[[6]]
# A tibble: 1 × 3
alphabet latitude longitude
<chr> <dbl> <dbl>
1 f 7.68 6.41
That's it!
PS. About map and your attempted solution:
If x and y had the same length, you could use map2_. Since that's not the case, you can use one map_ to iterate over x and another map_ to iterate over y.
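As a minimal sketch of that difference, assuming purrr is installed and R is at least 4.1 (for the `\(x)` lambda syntax): the vectors `x`, `y_same` and `y_diff` below are made up for illustration and are not part of the question's data.

```r
library(purrr)

# Hypothetical inputs, for illustration only.
x <- c("a", "b", "c")
y_same <- c(1, 2, 3)  # same length as x
y_diff <- c(10, 20)   # different length from x

# Equal lengths: map2() walks both inputs in parallel, one pair at a time.
paired <- map2(x, y_same, \(xi, yi) paste(xi, yi))
length(paired)    # one element per pair

# Unequal lengths: nest two map() calls to visit every combination.
crossed <- map(x, \(xi) map(y_diff, \(yi) paste(xi, yi)))
length(crossed)   # one element per value of x
lengths(crossed)  # each of them holds one element per value of y_diff
```

The nested form returns a list of lists, which is why its output usually needs flattening or filtering afterwards.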
Beware: the result will be a list of the same length as x, whose elements each have the same length as y, even if they are just empty data frames (which can get messy). You therefore need to control how the output is aggregated (by rows or columns, with map_dfc, map_dfr, or with bind_cols, bind_rows afterwards) and what is kept (inside your custom function or after the iteration).
The code below is close to what you tried to do:
library(tidyverse) # at least `dplyr` and `purrr`
my_function <- function(i, j) j[j$alphabet == i, ] # c("latitude", "longitude") removed
my_output <- map(x2,\(x) map(y2,\(y) my_function(x, y)))
The unpleasant output:
> my_output[6:7] # `6` is the last non-empty and `7` the first empty
[[1]]
[[1]][[1]] # No "f" in `df1`
[1] alphabet latitude longitude
<0 rows> (or 0-length row.names)
[[1]][[2]] # No "f" in `df2`
[1] alphabet latitude longitude
<0 rows> (or 0-length row.names)
[[1]][[3]] # There's "f" in `df3`
alphabet latitude longitude
2 f 7.681867 6.414239
[[2]] # No "g" in `df1` to `df3`
[[2]][[1]]
[1] alphabet latitude longitude
<0 rows> (or 0-length row.names)
[[2]][[2]]
[1] alphabet latitude longitude
<0 rows> (or 0-length row.names)
[[2]][[3]]
[1] alphabet latitude longitude
<0 rows> (or 0-length row.names)
my_output is a list of length 26 (a to z), and each element has length 3 (df1 to df3).
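One possible way to tidy a nested result like this is to flatten it by one level and drop the empty data frames. The sketch below assumes purrr 1.0.0 or later (for list_flatten) and dplyr, and it rebuilds the question's data with only the three data frames shown; the names `nested`, `cleaned` and `combined` are illustrative.

```r
library(purrr)
library(dplyr)

# The question's dummy data, limited to the three data frames it shows.
df1 <- data.frame(alphabet = c("a", "b"),
                  latitude = c(7.302888, 7.302717),
                  longitude = c(5.143116, 5.143184))
df2 <- data.frame(alphabet = c("c", "d"),
                  latitude = c(4.948605, 4.948610),
                  longitude = c(8.352012, 8.352157))
df3 <- data.frame(alphabet = c("e", "f"),
                  latitude = c(7.681907, 7.681867),
                  longitude = c(6.414152, 6.414239))
x2 <- letters
y2 <- list(df1, df2, df3)

# Nested iteration: 26 elements, each a list of 3 (mostly empty) data frames.
nested <- map(x2, \(x) map(y2, \(y) y[y$alphabet == x, ]))

# Remove one level of nesting, then keep only data frames that have rows.
cleaned <- nested |>
  list_flatten() |>
  keep(\(d) nrow(d) > 0)

# Optionally stack the survivors into a single data frame.
combined <- bind_rows(cleaned)
```

Filtering after the iteration keeps the custom function simple; the alternative is to return NULL from the function for non-matches and remove those with compact().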
Upvotes: 0
Reputation: 20435
The idiomatic way to do this in R is to bind the rows into a data frame and then split():
do.call(rbind, y2) |>
split(~alphabet)
# $a
# alphabet latitude longitude
# 1 a 7.302888 5.143116
# $b
# alphabet latitude longitude
# 2 b 7.302717 5.143184
# $c
# alphabet latitude longitude
# 3 c 4.948605 8.352012
# $d
# alphabet latitude longitude
# 4 d 4.94861 8.352157
# $e
# alphabet latitude longitude
# 5 e 7.681907 6.414152
# $f
# alphabet latitude longitude
# 6 f 7.681867 6.414239
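Because split() names each element after its group, the result can also be indexed by the values of interest. This is a self-contained sketch assuming R 4.1 or later (for the formula interface of split() on data frames) and only the three data frames from the question; `wanted` is a hypothetical lookup vector.

```r
# The question's dummy data, limited to the three data frames it shows.
df1 <- data.frame(alphabet = c("a", "b"),
                  latitude = c(7.302888, 7.302717),
                  longitude = c(5.143116, 5.143184))
df2 <- data.frame(alphabet = c("c", "d"),
                  latitude = c(4.948605, 4.948610),
                  longitude = c(8.352012, 8.352157))
df3 <- data.frame(alphabet = c("e", "f"),
                  latitude = c(7.681907, 7.681867),
                  longitude = c(6.414152, 6.414239))
y2 <- list(df1, df2, df3)

# One named list element per distinct value of `alphabet`.
by_letter <- split(do.call(rbind, y2), ~alphabet)

# Hypothetical lookup: keep only the requested letters that actually occur.
wanted <- c("a", "f", "z")
picked <- by_letter[intersect(wanted, names(by_letter))]
names(picked)  # "z" is absent from the data, so it is skipped
```

Using intersect() avoids the NULL entries that indexing by a missing name would otherwise produce.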
Upvotes: 1