user3919790
user3919790

Reputation: 557

Mapping values across a dataframe

I have a large dataset. The example below is a much abbreviated version.

There are two dataframes, df1 and df2. I would like to map to each row of df1, a derived value using conditions from df2 with arguments from df1.

Hope the example below makes more sense

year <- rep(1996:1997, each=3)
age_group <- rep(c("20-24","25-29","30-34"),2)
df1 <- as.data.frame(cbind(year,age_group))

df1 is a database with all permutations of year and age group.

df2 <- as.data.frame(rbind(c(111,1997,"20-24"),c(222,1997,"30-34")))
names(df2) <- c("id","year","age.group")

df2 is a database where each row represents an individual at a particular year

I would like to use arguments from df1 conditional on values from df2 and then to map to df1. The arguments are as follows:

each_yr <- map(df1, function(year,age_group) case_when(
as.character(df1$year) == as.character(df2$year) & as.character(df1$age_group)    
== as.character(df2$age.group)~ 0, 
TRUE ~ 1))

The output i get is wrong and shown below

structure(list(year = c(1, 1, 1, 1, 1, 0), age_group = c(1, 1, 
1, 1, 1, 0)), .Names = c("year", "age_group")) 

The output i would ideally like is something like this (dataframe as an example but would be happy as a list)

structure(list(year = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1996", 
"1997"), class = "factor"), age_group = structure(c(1L, 2L, 3L, 
1L, 2L, 3L), .Label = c("20-24", "25-29", "30-34"), class = "factor"), 
v1 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("0", 
"1"), class = "factor"), v2 = structure(c(2L, 2L, 2L, 2L, 
2L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("year", 
"age_group", "v1", "v2"), row.names = c(NA, -6L), class = "data.frame")

I have used map before when 'df1' is a vector but in this scenario it is a dataframe where both columns are used as arguments. Can Map handle this?

In df3 the column v1 is the result of conditions based on df1 and df2 and then mapped to df1 for patient '111'. Likewise column v2 is the outcome for patient '222'.

Thanks in advance

Upvotes: 1

Views: 1204

Answers (2)

utubun
utubun

Reputation: 4520

Whithing tidiverse you can do it this way:

library(tidyverse)
#library(dplyr)
#library(tidyr)

df2 %>%
  mutate(tmp = 0) %>%
  spread(id, tmp, fill = 1, sep = "_") %>%
  right_join(df1, by = c("year", "age.group" = "age_group")) %>%
  mutate_at(vars(-c(1, 2)), coalesce, 1)

#   year age.group id_111 id_222
# 1 1996     20-24      1      1
# 2 1996     25-29      1      1
# 3 1996     30-34      1      1
# 4 1997     20-24      0      1
# 5 1997     25-29      1      1
# 6 1997     30-34      1      0

#Warning messages:
# 1: Column `year` joining factors with different levels, coercing to character vector 
# 2: Column `age.group`/`age_group` joining factors with different levels, coercing to 
#    character vector 

Upvotes: 0

Nicolas2
Nicolas2

Reputation: 2210

Looks like some work for pmap instead. And a touch of tidyr to get the suggested result.

purrr::pmap(list(df2$id,as.character(df2$year),as.character(df2$age.group)),
  function(id,x,y)
   data.frame(df1,
              key=paste0("v",id),
              value=1-as.integer((x==df1$year)&(y==df1$age_group)),
              stringsAsFactors=FALSE
          )) %>%
   replyr::replyr_bind_rows() %>% tidyr::spread(key,value)

#  year age_group v1 v2
#1 1996     20-24  1  1
#2 1996     25-29  1  1
#3 1996     30-34  1  1
#4 1997     20-24  0  1
#5 1997     25-29  1  1
#6 1997     30-34  1  0

Upvotes: 0

Related Questions