I am trying to match data between two data frames, but I am getting the value at the same position in the vector rather than the value for the corresponding gene.
I have two data.frames:
df1=data.frame(Gene=c("gene1","gene2","gene3","gene4","gene5"),TWAS.testable=c(1,0,1,1,0),stringsAsFactors=FALSE)
> df1
Gene TWAS.testable
1 gene1 1
2 gene2 0
3 gene3 1
4 gene4 1
5 gene5 0
df2=data.frame(Gene=c("gene1","gene3","gene4","gene7","gene8"),TWAS.Z=c(0.43,3.63,0.11,-0.82,0.36),stringsAsFactors=FALSE)
> df2
Gene TWAS.Z
1 gene1 0.43
2 gene3 3.63
3 gene4 0.11
4 gene7 -0.82
5 gene8 0.36
I am trying to replace the values in TWAS.testable with the TWAS.Z values for the matching Gene, and fill with NA where there is no match, so that I get back:
Gene TWAS.testable
1 gene1 0.43
2 gene2 NA
3 gene3 3.63
4 gene4 0.11
5 gene5 NA
So I tried:
df1$TWAS.testable=ifelse(df1$Gene %in% df2$Gene,df2$TWAS.Z,NA)
which returns
> df1
Gene TWAS.testable
1 gene1 0.43
2 gene2 NA
3 gene3 0.11
4 gene4 -0.82
5 gene5 NA
So it is taking the value at the same row position in df2$TWAS.Z rather than matching TWAS.Z to its corresponding Gene.
For example, gene3 is the third element of df1$Gene, so TWAS.testable is filled with 0.11, the third element of df2$TWAS.Z. What I really want is the df2$TWAS.Z value where df1$Gene == df2$Gene.
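A minimal sketch reproducing the behaviour described above, using the question's own data: `ifelse(test, yes, no)` works element by element, so for row i it returns `yes[i]` whenever `test[i]` is TRUE. Nothing in that call links row i of df1 to the row of df2 with the same Gene; the `%in%` test only decides *whether* to take a value, not *which* one.

```r
# Data from the question
df1 <- data.frame(Gene = c("gene1", "gene2", "gene3", "gene4", "gene5"),
                  TWAS.testable = c(1, 0, 1, 1, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Gene = c("gene1", "gene3", "gene4", "gene7", "gene8"),
                  TWAS.Z = c(0.43, 3.63, 0.11, -0.82, 0.36),
                  stringsAsFactors = FALSE)

test <- df1$Gene %in% df2$Gene   # TRUE FALSE TRUE TRUE FALSE

# For each row i, ifelse() returns df2$TWAS.Z[i] when test[i] is TRUE,
# i.e. it pairs df1's row i with df2's row i, regardless of Gene
res <- ifelse(test, df2$TWAS.Z, NA)
res   # 0.43 NA 0.11 -0.82 NA: gene3 (row 3) gets df2's row 3, which is gene4
```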
I can see why this is happening, but I can't figure out how to get what I want in an ifelse context, so that it returns the corresponding TWAS.Z where possible, or fills with NA.
Thanks in advance.
You can try
df1$TWAS.testable <- df2$TWAS.Z[match(df1$Gene,df2$Gene)]
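A self-contained sketch of this one-liner applied to the question's data. `match()` gives the row of df2 for each df1 gene, and subsetting `TWAS.Z` with those rows puts the values in df1's order. Genes with no match get an NA index, and indexing a vector with NA returns NA, so no `ifelse()` is needed for the unmatched rows.

```r
df1 <- data.frame(Gene = c("gene1", "gene2", "gene3", "gene4", "gene5"),
                  TWAS.testable = c(1, 0, 1, 1, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Gene = c("gene1", "gene3", "gene4", "gene7", "gene8"),
                  TWAS.Z = c(0.43, 3.63, 0.11, -0.82, 0.36),
                  stringsAsFactors = FALSE)

# Look up each df1 gene in df2 and pull the TWAS.Z from that row;
# genes absent from df2 (gene2, gene5) come back as NA
df1$TWAS.testable <- df2$TWAS.Z[match(df1$Gene, df2$Gene)]
print(df1)
```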
You can use match():
match(df1$Gene,df2$Gene)
[1] 1 NA 2 3 NA
This vector gives, for each element of df1$Gene, its position in df2$Gene, or NA if that gene is missing from df2$Gene.
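The two steps can be sketched separately on plain vectors copied from the question's columns (the names `genes1`, `genes2` and `z` are just illustrative stand-ins for df1$Gene, df2$Gene and df2$TWAS.Z). Note that if a gene appeared more than once in df2$Gene, match() would return only the position of its first occurrence.

```r
genes1 <- c("gene1", "gene2", "gene3", "gene4", "gene5")  # stands in for df1$Gene
genes2 <- c("gene1", "gene3", "gene4", "gene7", "gene8")  # stands in for df2$Gene
z      <- c(0.43, 3.63, 0.11, -0.82, 0.36)               # stands in for df2$TWAS.Z

# Step 1: position in genes2 of each element of genes1 (NA when absent)
pos <- match(genes1, genes2)   # 1 NA 2 3 NA

# Step 2: subsetting z with those positions reorders it into genes1's order;
# an NA index yields NA, which fills the unmatched genes
aligned <- z[pos]              # 0.43 NA 3.63 0.11 NA
```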
The new data frame is then:
data.frame(Gene=df1$Gene,
TWAS.testable=df2$TWAS.Z[match(df1$Gene,df2$Gene)])
Gene TWAS.testable
1 gene1 0.43
2 gene2 NA
3 gene3 3.63
4 gene4 0.11
5 gene5 NA
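Since the question asked for an ifelse() version, here is a sketch showing that the original call works once the `yes` argument is aligned to df1's rows with match(). At that point the ifelse() is redundant, because the match()-based subset already produces NA for unmatched genes, but it may help to see why the original attempt went wrong.

```r
df1 <- data.frame(Gene = c("gene1", "gene2", "gene3", "gene4", "gene5"),
                  TWAS.testable = c(1, 0, 1, 1, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Gene = c("gene1", "gene3", "gene4", "gene7", "gene8"),
                  TWAS.Z = c(0.43, 3.63, 0.11, -0.82, 0.36),
                  stringsAsFactors = FALSE)

idx <- match(df1$Gene, df2$Gene)   # row of df2 for each df1 gene, or NA

# df2$TWAS.Z[idx] is now in df1's row order, so row i of 'yes'
# belongs to the same gene as row i of df1
df1$TWAS.testable <- ifelse(!is.na(idx), df2$TWAS.Z[idx], NA)
print(df1)
```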