Mapping values across a dataframe

Question

I have a large dataset. The example below is a much abbreviated version.

There are two dataframes, df1 and df2. I would like to map to each row of df1, a derived value using conditions from df2 with arguments from df1.

Hope the example below makes more sense

year <- rep(1996:1997, each=3)
age_group <- rep(c("20-24","25-29","30-34"),2)
df1 <- as.data.frame(cbind(year,age_group))

df1 is a database with all permutations of year and age group.

df2 <- as.data.frame(rbind(c(111,1997,"20-24"),c(222,1997,"30-34")))
names(df2) <- c("id","year","age.group")

df2 is a database where each row represents an individual at a particular year

I would like to use arguments from df1 conditional on values from df2 and then to map to df1. The arguments are as follows:

each_yr <- map(df1, function(year,age_group) case_when(
as.character(df1$year) == as.character(df2$year) & as.character(df1$age_group)    
== as.character(df2$age.group)~ 0, 
TRUE ~ 1))

The output i get is wrong and shown below

structure(list(year = c(1, 1, 1, 1, 1, 0), age_group = c(1, 1, 
1, 1, 1, 0)), .Names = c("year", "age_group"))

The output i would ideally like is something like this (dataframe as an example but would be happy as a list)

structure(list(year = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1996", 
"1997"), class = "factor"), age_group = structure(c(1L, 2L, 3L, 
1L, 2L, 3L), .Label = c("20-24", "25-29", "30-34"), class = "factor"), 
v1 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("0", 
"1"), class = "factor"), v2 = structure(c(2L, 2L, 2L, 2L, 
2L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("year", 
"age_group", "v1", "v2"), row.names = c(NA, -6L), class = "data.frame")

I have used map before when 'df1' is a vector but in this scenario it is a dataframe where both columns are used as arguments. Can Map handle this?

In df3 the column v1 is the result of conditions based on df1 and df2 and then mapped to df1 for patient '111'. Likewise column v2 is the outcome for patient '222'.

Thanks in advance

Nicolas2 · Accepted Answer

Looks like some work for pmap instead. And a touch of tidyr to get the suggested result.

purrr::pmap(list(df2$id,as.character(df2$year),as.character(df2$age.group)),
  function(id,x,y)
   data.frame(df1,
              key=paste0("v",id),
              value=1-as.integer((x==df1$year)&(y==df1$age_group)),
              stringsAsFactors=FALSE
          )) %>%
   replyr::replyr_bind_rows() %>% tidyr::spread(key,value)

#  year age_group v1 v2
#1 1996     20-24  1  1
#2 1996     25-29  1  1
#3 1996     30-34  1  1
#4 1997     20-24  0  1
#5 1997     25-29  1  1
#6 1997     30-34  1  0

Mapping values across a dataframe

Answers (2)

Related Questions