Adding column based on the matching value in second dataframe

Question

Hello all,

I'm trying to add a new column next to my REP column based on the condition of the second data frame. for example, if 1 from Block1 is present in GEN and its REP is 1 then in the new column add 1, I can do it with case_when as below

df<-df%>%mutate(Block = case_when(
  GEN=="G132" & REP=="R1" ~ "1",
  GEN=="G100" & REP=="R1" ~ "1",
  GEN=="G120" & REP=="R1" ~ "1",
  GEN=="G58" & REP=="R1" ~ "1",
  GEN=="G48" & REP=="R1" ~ "1",
  GEN=="G125" & REP=="R1" ~ "1",
  GEN=="G1" & REP=="R1" ~ "1",
  GEN=="G29" & REP=="R1" ~ "1",
  GEN=="G42" & REP=="R1" ~ "1",
  TRUE~GEN
))

I was wondering if I could loop it somehow, I have 144 GEN replicated twice and they fall into 32 blocks

Thank you

Jost · Accepted Answer

from how you describe your problem I would go about it in a different way, than what you have illustrated in your example. To avoid having to make a case when for each column in your block; if you convert your "block" data frame into long format and then simply merge the matching gens, then afterwards adjust the block values to you conditions.

library(plyr)
library(dplyr)
library(tidyr) #for wide to long format

#first data frame
df <- data.frame(GEN = c("G1", "G1", "G2", "G2", "G3", "G3",  "G4", "G4",  "G5", "G5"), 
                 REP = rep(c("R1", "R2"), 5))

#second data frame
df2 <- data.frame(Block1 = c(132,100,120,58,48,125,1,29,142),
                  Block2 = c(2,107,113,89,87,75,38,70,81),
                  Block3 = c(136,134,53,106,143,63,22,40,56),
                  Block4 = c(112,144,122,32,39,50,130,74,13))

#take the blocks data frame from wide to long format using tidyr library
blocks <- df2 %>% pivot_longer(cols=names(df2),
                               names_to='Block',
                               values_to='GEN')

blocks$GEN <- paste0("G", blocks$GEN) #add a G before the number so that values can be matched
blocks$Block <- gsub("Block", "", blocks$Block) #remove block from string so that the bumber is left

res <- merge(df, blocks, by = "GEN", all.x = T) #merge the two dataframes.

#here the other condition of "R2" not being a match is established, and no matches is also replaced with the GEN column value
res[is.na(res$Block) | res$REP == "R2", "Block"] <- res[is.na(res$Block) | res$REP == "R2", "GEN"]

Hope that it makes sense, and that i have understood the desired solution correctly!

Adding column based on the matching value in second dataframe

Answers (1)

Related Questions