Matching values between dataframes using ifelse

Question

I am trying to match data between two data frames, but I am getting the value at the same position in the vector rather than the value for the matching gene.

I have two data.frames:

    df1=data.frame(Gene=c("gene1","gene2","gene3","gene4","gene5"),TWAS.testable=c(1,0,1,1,0),stringsAsFactors=FALSE)

    > df1
       Gene TWAS.testable
    1 gene1             1
    2 gene2             0
    3 gene3             1
    4 gene4             1
    5 gene5             0


    df2=data.frame(Gene=c("gene1","gene3","gene4","gene7","gene8"),TWAS.Z=c(0.43,3.63,0.11,-0.82,0.36),stringsAsFactors=FALSE)

    > df2
       Gene TWAS.Z
    1 gene1   0.43
    2 gene3   3.63
    3 gene4   0.11
    4 gene7  -0.82
    5 gene8   0.36

I am trying to replace each value in TWAS.testable with the TWAS.Z value from df2 for the matching Gene, and fill with NA where there is no match. What I want to get back is:

       Gene TWAS.testable
    1 gene1          0.43
    2 gene2            NA
    3 gene3          3.63
    4 gene4          0.11
    5 gene5            NA

So I tried:

    df1$TWAS.testable=ifelse(df1$Gene %in% df2$Gene,df2$TWAS.Z,NA)

which returns

    > df1
       Gene TWAS.testable
    1 gene1          0.43
    2 gene2            NA
    3 gene3          0.11
    4 gene4         -0.82
    5 gene5            NA

So it is returning the value at the same position in the vector, rather than matching TWAS.Z to its corresponding Gene. For example, gene3 is the third element of df1$Gene, so TWAS.testable is filled with 0.11, the third element of df2$TWAS.Z. What I really want is the df2$TWAS.Z value where df1$Gene==df2$Gene.

I can see why this is happening, but I can't figure out how to get what I want in an ifelse context, so that it returns the corresponding TWAS.Z where possible, or fills with NA.

Thanks in advance.

StupidWolf · Accepted Answer

You can use match():

    > match(df1$Gene,df2$Gene)
    [1]  1 NA  2  3 NA

This vector gives, for every element of df1$Gene, its position in df2$Gene. Where a gene is missing from df2$Gene, it returns NA.

Indexing df2$TWAS.Z with that vector gives the new data frame:

    > data.frame(Gene=df1$Gene,
    +            TWAS.testable=df2$TWAS.Z[match(df1$Gene,df2$Gene)])
       Gene TWAS.testable
    1 gene1          0.43
    2 gene2            NA
    3 gene3          3.63
    4 gene4          0.11
    5 gene5            NA
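
Since the question asked for an ifelse form, here is a minimal sketch of how the match() index could be combined with ifelse, together with a base R merge() as a cross-check. The names idx and merged are only illustrative, and the sketch assumes each gene appears at most once in df2$Gene (match() returns the first match only).

```r
# Rebuild the example data from the question
df1 <- data.frame(Gene = c("gene1", "gene2", "gene3", "gene4", "gene5"),
                  TWAS.testable = c(1, 0, 1, 1, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Gene = c("gene1", "gene3", "gene4", "gene7", "gene8"),
                  TWAS.Z = c(0.43, 3.63, 0.11, -0.82, 0.36),
                  stringsAsFactors = FALSE)

# Position of each df1 gene in df2$Gene (NA where there is no match)
idx <- match(df1$Gene, df2$Gene)

# ifelse form: the "no" branch is already aligned to df1 because it is
# indexed by idx, so position-wise evaluation now gives the right values
df1$TWAS.testable <- ifelse(is.na(idx), NA_real_, df2$TWAS.Z[idx])

# Cross-check with a left join; note that merge() sorts rows by Gene
merged <- merge(df1["Gene"], df2, by = "Gene", all.x = TRUE)

print(df1)
```

The ifelse call is not strictly needed, because indexing with NA already yields NA, so df2$TWAS.Z[idx] alone gives the same column. The original attempt failed because ifelse picks the element of its second argument at the same position as the test, and df2$TWAS.Z was still in df2's row order.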
