Reputation: 143
I have a problem with my DFs in R. I have a df1 with gene names.
V1 V2
GENE A GENE E
GENE B GENE D
GENE C GENE A
GENE D GENE B
GENE E GENE C
and another dataframe called df2 like this:
Name ID Symbol
GENE A 1254 AKT
GENE B 1879 POU5F1
GENE C 5689 EGR1
GENE D 2385 JUN
GENE E 5687 MYC
The output I would like to have is the following:
NameSource SourceID NameTarget TargetID
AKT 1254 MYC 5687
POU5F1 1879 JUN 2385
EGR1 5689 AKT 1254
JUN 2385 POU5F1 1879
MYC 5687 EGR1 5689
I tried with the following syntax:
genes <- df1[which(df1$V1, df2$Symbol), ]
and with:
genes <- df1$V1 %in% df2$Symbol
But for some reason I cannot get the output I am expecting. Anyone?
Upvotes: 2
Views: 56
Reputation: 3098
with dplyr
and piping if you're in that kind of R:
df1 <- data.frame(V1=c("GENE A","GENE B","GENE C","GENE D", "GENE E"), V2=c("GENE E","GENE D","GENE A","GENE B", "GENE C"))
df2 <- data.frame(Name= c("GENE A","GENE B","GENE C","GENE D", "GENE E"),
ID= c(1254,1879,5689,2385,5687),
Symbol= c("AKT", "POU5F1", "EGR1", "JUN", "MYC")
)
library(dplyr)
df1 %>% right_join(df2, by=c("V1"="Name")) %>%
right_join(df2, by=c("V2"="Name")) %>%
arrange(V1) %>%
select(ID.x, Symbol.x, ID.y, Symbol.y) %>%
setNames(c("SourceID", "NameSource", "TargetID", "NameTarget"))
# SourceID NameSource TargetID NameTarget
#1 1254 AKT 5687 MYC
#2 1879 POU5F1 2385 JUN
#3 5689 EGR1 1254 AKT
#4 2385 JUN 1879 POU5F1
#5 5687 MYC 5689 EGR1
Upvotes: 3
Reputation: 887158
We can use match
res <- cbind(df2[-1], df2[match(df1$V2, df2$Name),-1])[c(2,1,4 ,3)]
colnames(res) <- c("NameSource", "SourceID", "NameTarget", "TargetID")
row.names(res) <- NULL
res
# NameSource SourceID NameTarget TargetID
#1 AKT 1254 MYC 5687
#2 POU5F1 1879 JUN 2385
#3 EGR1 5689 AKT 1254
#4 JUN 2385 POU5F1 1879
#5 MYC 5687 EGR1 5689
Upvotes: 4