Reputation: 191
I am moving a task from Excel to R for reproducibility and the ability to handle datafiles with more records than the limits of Excel. The conversion is going well, but I hit a roadblock due to my limited R skills (and that of my on-site colleagues!)
The problem
Subject dataframe
ID Var1 Var2
1 0.4 0.3
2 0.4 0.1
3 0.2 0.2
4 0.3 0.7
TypeList dataframe
ID Type
1 A
1 B
1 C
2 C
3 B
3 A
4 C
What I would like to do is add a TypeA, TypeB, TypeC column from the TypeList dataframe to the Subject dataframe like this:
ID Var1 Var2 TypeA TypeB TypeC
1 0.4 0.3 TRUE TRUE TRUE
2 0.4 0.1 FALSE FALSE TRUE
3 0.2 0.2 TRUE TRUE FALSE
4 0.3 0.7 FALSE FALSE TRUE
This is a simplified example, the actual Subject dataframe has some 2000 entries, the TypeList has about 70k entries for some 4000 types. Of those types, the interest is in 10 of them at this time.
Any suggestions would be greatly appreciated.
Upvotes: 1
Views: 42
Reputation: 887901
Here is an option with data.table
. dcast
the 'TypeList' dataset to 'wide' format and join with 'Subject' on the 'ID' column
library(data.table)
setDT(Subject)[dcast(setDT(TypeList), ID ~ paste0("Type", Type),
function(x) as.logical(length(x))), on = .(ID)]
# ID Var1 Var2 TypeA TypeB TypeC
#1: 1 0.4 0.3 TRUE TRUE TRUE
#2: 2 0.4 0.1 FALSE FALSE TRUE
#3: 3 0.2 0.2 TRUE TRUE FALSE
#4: 4 0.3 0.7 FALSE FALSE TRUE
Or using merge
from base R
merge(`row.names<-`(Subject, Subject$ID), table(TypeList) > 0, by = "row.names")[-1]
# ID Var1 Var2 A B C
#1 1 0.4 0.3 TRUE TRUE TRUE
#2 2 0.4 0.1 FALSE FALSE TRUE
#3 3 0.2 0.2 TRUE TRUE FALSE
#4 4 0.3 0.7 FALSE FALSE TRUE
Upvotes: 2
Reputation: 14774
One way of going about it:
library(tidyverse)
df2 <- df2 %>% mutate(spread_var = TRUE, Type = paste0("Type", Type)) %>%
spread(Type, spread_var) %>% replace(., is.na(.), FALSE)
df1 <- left_join(df1, df2)
df1
ID Var1 Var2 TypeA TypeB TypeC
1 1 0.4 0.3 TRUE TRUE TRUE
2 2 0.4 0.1 FALSE FALSE TRUE
3 3 0.2 0.2 TRUE TRUE FALSE
4 4 0.3 0.7 FALSE FALSE TRUE
Upvotes: 2