Reputation: 401
I have about 100 dataset pairs that need to be merged into single data sets, I have looked at posts that show how to merge many data sets together at once (e.g., here and here), but my issue is unique. My real-world data are stored on my hard drive and are similarly named (e.g., household2010
, household2011
, household2012
and person2010
, person2011
, person2012
. They don't need to be loaded into the global environment.Example below.
Working data:
library(tidyverse)
household2010 <- tribble(
~id, ~var2, ~var3, ~var4, ~var5,
"1", "1", "1", "a", "d",
"2", "2", "2", "b", "e",
"3", "3", "3", "c", "f"
)
person2010 <- tribble(
~id, ~var6, ~var7,
"1", "1", "1",
"2", "2", "2",
"3", "3", "3",
"4", "4", "4"
)
household2011 <- tribble(
~id, ~var8, ~var9, ~var10,
"1", "1", "1", "1",
"2", "2", "2", "2",
"3", "3", "3", "3",
"4", "4", "4", "4"
)
person2011 <- tribble(
~id, ~var11, ~var12, ~var13,
"1", "1", "1", "1",
"2", "2", "2", "2",
"3", "3", "3", "3",
"4", "4", "4", "4",
"5", "5", "5", "5"
)
I need to merge household2010
with person2010
and create a new dataset called hhperson2010
. I need to do this to household2011
and person2011
too. Individually I could do:
hhperson2010 <- left_join(household2010, person2010, by = "id")
hhperson2011 <- left_join(household2011, person2011, by = "id")
This gets clunky when I have over a 100 data pairs. Can I use lapply
to have it go through a list of the data sets and merge? Something like:
dflist1 <- list(household2010, household2011)
dflist2 <- list(person2011, person2011)
lapply(function(x) left_join(dflist, dflist2, by = "id")
Upvotes: 0
Views: 130
Reputation: 438
just an alternate solution:
years <- c("2010", "2011", "2014")
for (x in years){
result <- merge(get(paste0("household", x)), get(paste0("person", x)), "id")
names <- paste0("household", x)
print(names)
print(result)
}
you can make a choice between loop or lapply depending on your further processing if any. In case, you don't have any more to do with your dataset, I think lapply would just solve the purpose.
Upvotes: 0
Reputation: 3055
Here's how I would do it with tidyverse
and list-columns
library(dplyr)
library(tidyr)
library(purrr)
env2listcol <- function(rdata_file)
{
e <- new.env()
load(rdata_file, envir = e)
# since you know that there's only 1 df in each environment
as.list.environment(e)[[1]]
}
# assuming files are stored in `input` folder
dir("input", full.names = T) %>% as_tibble() %>%
# split the path
separate(value, into=c("dir", "file", "ext"), remove=FALSE) %>%
# get the category and the key in separate columns
extract(file, into=c("key", "year"), regex="([a-z]+)(\\d+)") %>%
# file path by category by year, remove unnecessary columns
spread(key, value) %>% select(-dir, -ext) %>%
# extract dataframes from environments, and join them
mutate(household=map(household, env2listcol),
person=map(person, env2listcol),
joined=map2(household, person, left_join)) %>%
# rbind joined tables, although you could pull(joined) or imap over it
unnest(joined)
#> # A tibble: 7 x 14
#> year id var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2010 1 1 1 a d 1 1 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 2010 2 2 2 b e 2 2 <NA> <NA> <NA> <NA> <NA> <NA>
#> 3 2010 3 3 3 c f 3 3 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 2011 1 <NA> <NA> <NA> <NA> <NA> <NA> 1 1 1 1 1 1
#> 5 2011 2 <NA> <NA> <NA> <NA> <NA> <NA> 2 2 2 2 2 2
#> 6 2011 3 <NA> <NA> <NA> <NA> <NA> <NA> 3 3 3 3 3 3
#> 7 2011 4 <NA> <NA> <NA> <NA> <NA> <NA> 4 4 4 4 4 4
You decide what you want to do with it. You may write it back to R objects (please, please, please... use Rds instead). You may write it back into a single table (I believe it is MUCH easier to handle going forward). You can probably even export it as json.
Upvotes: 0
Reputation: 28369
Maybe something like this:
years <- 2010:2011
result <- lapply(years,
function(x) left_join(get(paste0("household", x)),
get(paste0("person", x)),
"id"))
names(result) <- paste0("household", years)
Upvotes: 1