Reputation: 557
I have a large dataset. The example below is a much abbreviated version.
There are two dataframes, df1 and df2. I would like to map to each row of df1, a derived value using conditions from df2 with arguments from df1.
Hope the example below makes more sense
year <- rep(1996:1997, each=3)
age_group <- rep(c("20-24","25-29","30-34"),2)
df1 <- as.data.frame(cbind(year,age_group))
df1 is a database with all permutations of year and age group.
df2 <- as.data.frame(rbind(c(111,1997,"20-24"),c(222,1997,"30-34")))
names(df2) <- c("id","year","age.group")
df2 is a database where each row represents an individual at a particular year
I would like to use arguments from df1 conditional on values from df2 and then to map to df1. The arguments are as follows:
each_yr <- map(df1, function(year,age_group) case_when(
as.character(df1$year) == as.character(df2$year) & as.character(df1$age_group)
== as.character(df2$age.group)~ 0,
TRUE ~ 1))
The output i get is wrong and shown below
structure(list(year = c(1, 1, 1, 1, 1, 0), age_group = c(1, 1,
1, 1, 1, 0)), .Names = c("year", "age_group"))
The output i would ideally like is something like this (dataframe as an example but would be happy as a list)
structure(list(year = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1996",
"1997"), class = "factor"), age_group = structure(c(1L, 2L, 3L,
1L, 2L, 3L), .Label = c("20-24", "25-29", "30-34"), class = "factor"),
v1 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("0",
"1"), class = "factor"), v2 = structure(c(2L, 2L, 2L, 2L,
2L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("year",
"age_group", "v1", "v2"), row.names = c(NA, -6L), class = "data.frame")
I have used map before when 'df1' is a vector but in this scenario it is a dataframe where both columns are used as arguments. Can Map handle this?
In df3 the column v1 is the result of conditions based on df1 and df2 and then mapped to df1 for patient '111'. Likewise column v2 is the outcome for patient '222'.
Thanks in advance
Upvotes: 1
Views: 1204
Reputation: 4520
Whithing tidiverse
you can do it this way:
library(tidyverse)
#library(dplyr)
#library(tidyr)
df2 %>%
mutate(tmp = 0) %>%
spread(id, tmp, fill = 1, sep = "_") %>%
right_join(df1, by = c("year", "age.group" = "age_group")) %>%
mutate_at(vars(-c(1, 2)), coalesce, 1)
# year age.group id_111 id_222
# 1 1996 20-24 1 1
# 2 1996 25-29 1 1
# 3 1996 30-34 1 1
# 4 1997 20-24 0 1
# 5 1997 25-29 1 1
# 6 1997 30-34 1 0
#Warning messages:
# 1: Column `year` joining factors with different levels, coercing to character vector
# 2: Column `age.group`/`age_group` joining factors with different levels, coercing to
# character vector
Upvotes: 0
Reputation: 2210
Looks like some work for pmap instead. And a touch of tidyr to get the suggested result.
purrr::pmap(list(df2$id,as.character(df2$year),as.character(df2$age.group)),
function(id,x,y)
data.frame(df1,
key=paste0("v",id),
value=1-as.integer((x==df1$year)&(y==df1$age_group)),
stringsAsFactors=FALSE
)) %>%
replyr::replyr_bind_rows() %>% tidyr::spread(key,value)
# year age_group v1 v2
#1 1996 20-24 1 1
#2 1996 25-29 1 1
#3 1996 30-34 1 1
#4 1997 20-24 0 1
#5 1997 25-29 1 1
#6 1997 30-34 1 0
Upvotes: 0