RROBINSON
RROBINSON

Reputation: 191

R - Add table colume to reflect the presence of a value in another table

I am moving a task from Excel to R for reproducibility and the ability to handle datafiles with more records than the limits of Excel. The conversion is going well, but I hit a roadblock due to my limited R skills (and that of my on-site colleagues!)

The problem

Subject dataframe

ID    Var1   Var2  
1     0.4    0.3
2     0.4    0.1
3     0.2    0.2
4     0.3    0.7

TypeList dataframe

ID  Type
1    A
1    B
1    C
2    C
3    B
3    A
4    C

What I would like to do is add a TypeA, TypeB, TypeC column from the TypeList dataframe to the Subject dataframe like this:

ID    Var1   Var2     TypeA   TypeB   TypeC
    1     0.4    0.3   TRUE   TRUE    TRUE
    2     0.4    0.1   FALSE  FALSE   TRUE
    3     0.2    0.2   TRUE   TRUE    FALSE
    4     0.3    0.7   FALSE  FALSE   TRUE

This is a simplified example, the actual Subject dataframe has some 2000 entries, the TypeList has about 70k entries for some 4000 types. Of those types, the interest is in 10 of them at this time.

Any suggestions would be greatly appreciated.

Upvotes: 1

Views: 42

Answers (2)

akrun
akrun

Reputation: 887901

Here is an option with data.table. dcast the 'TypeList' dataset to 'wide' format and join with 'Subject' on the 'ID' column

library(data.table)
setDT(Subject)[dcast(setDT(TypeList), ID ~ paste0("Type", Type), 
           function(x) as.logical(length(x))), on = .(ID)]
#   ID Var1 Var2 TypeA TypeB TypeC
#1:  1  0.4  0.3  TRUE  TRUE  TRUE
#2:  2  0.4  0.1 FALSE FALSE  TRUE
#3:  3  0.2  0.2  TRUE  TRUE FALSE
#4:  4  0.3  0.7 FALSE FALSE  TRUE

Or using merge from base R

merge(`row.names<-`(Subject, Subject$ID), table(TypeList) > 0, by = "row.names")[-1]
#  ID Var1 Var2     A     B     C
#1  1  0.4  0.3  TRUE  TRUE  TRUE
#2  2  0.4  0.1 FALSE FALSE  TRUE
#3  3  0.2  0.2  TRUE  TRUE FALSE
#4  4  0.3  0.7 FALSE FALSE  TRUE

Upvotes: 2

arg0naut91
arg0naut91

Reputation: 14774

One way of going about it:

library(tidyverse)

df2 <- df2 %>% mutate(spread_var = TRUE, Type = paste0("Type", Type)) %>% 
               spread(Type, spread_var) %>% replace(., is.na(.), FALSE)

df1 <- left_join(df1, df2)

df1

  ID Var1 Var2 TypeA TypeB TypeC
1  1  0.4  0.3  TRUE  TRUE  TRUE
2  2  0.4  0.1 FALSE FALSE  TRUE
3  3  0.2  0.2  TRUE  TRUE FALSE
4  4  0.3  0.7 FALSE FALSE  TRUE

Upvotes: 2

Related Questions