Reputation: 181
I tried using the merge function here, but I am stumped. I apologize, because this seems basic, but the by.x and by.y arguments are quite confusing to me. I would like to extract the columns shared between dataframe A and dataframe B, and then combine the two dataframes into one. The dataframes do not share any Taxa (the first column), but they do share a portion of the columns X1 to X10000. Each dataframe has ~8,000 columns and a few hundred rows. In this example, X2 and X5 are shared, while X1 and X3 are not. From intersecting the column name vectors, I know that the dataframes share ~3,000 columns.
Dataframe A:
Taxa X1 X2 X5
118 T N A
113 N N A
60 C Y G
121 N N N
Dataframe B:
Taxa X2 X3 X5
200 C G N
119 T N G
30 C G G
21 C N N
Desired merged dataframe:
Taxa X2 X5
118 N A
113 N A
60 Y G
121 N N
200 C N
119 T G
30 C G
21 C N
When I try using the merge function, in a variety of ways, I get this (shown with my actual column names):
Taxa X408050 X995019
NA <NA> <NA> <NA>
NA.1 <NA> <NA> <NA>
NA.2 <NA> <NA> <NA>
NA.3 <NA> <NA> <NA>
NA.4 <NA> <NA> <NA>
NA.5 <NA> <NA> <NA>
NA.6 <NA> <NA> <NA>
Upvotes: 2
Views: 1160
Reputation: 23574
Taking PierreLafortune's advice, I will leave my suggestion as an answer. Since you said you have about 8,000 columns in both data frames, the first step is to find which column names they have in common, which you can do with intersect(). Once you have those names, subset each data frame to them. Then you can stack the two subsets with rbind().
ind <- intersect(names(mydf), names(mydf2))
rbind(mydf[, ind], mydf2[, ind])
# Taxa X2 X5
#1 118 N A
#2 113 N A
#3 60 Y G
#4 121 N N
#5 200 C N
#6 119 T G
#7 30 C G
#8 21 C N
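If you need to do this for more than two data frames, the same idea can be wrapped in a small function. This is only a sketch: stack_shared is a hypothetical name, not a base R function, and the two small data frames below are stand-ins for your real data.

```r
# Hypothetical helper: stack any number of data frames on their shared columns.
stack_shared <- function(...) {
  dfs <- list(...)
  # Column names present in every data frame, in the order of the first one
  shared <- Reduce(intersect, lapply(dfs, names))
  if (length(shared) == 0) stop("the data frames share no column names")
  # drop = FALSE keeps a data frame even when only one column is shared
  do.call(rbind, lapply(dfs, function(d) d[, shared, drop = FALSE]))
}

# Small stand-in data (assumed for illustration)
a <- data.frame(Taxa = c(118L, 113L), X1 = c("T", "N"),
                X2 = c("N", "N"), X5 = c("A", "A"),
                stringsAsFactors = FALSE)
b <- data.frame(Taxa = c(200L, 119L), X2 = c("C", "T"),
                X3 = c("G", "N"), X5 = c("N", "G"),
                stringsAsFactors = FALSE)

combined <- stack_shared(a, b)
```

The drop = FALSE argument matters only in the edge case where a single column is shared; without it, the subset would collapse to a vector and rbind() would no longer return a data frame.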
DATA
mydf <- structure(list(Taxa = c(118L, 113L, 60L, 121L), X1 = c("T", "N",
"C", "N"), X2 = c("N", "N", "Y", "N"), X5 = c("A", "A", "G",
"N")), .Names = c("Taxa", "X1", "X2", "X5"), class = "data.frame", row.names = c(NA,
-4L))
mydf2 <- structure(list(Taxa = c(200L, 119L, 30L, 21L), X2 = c("C", "T",
"C", "C"), X3 = c("G", "N", "G", "N"), X5 = c("N", "G", "G",
"N")), .Names = c("Taxa", "X2", "X3", "X5"), class = "data.frame", row.names = c(NA,
-4L))
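As for why merge() is not the right tool here: merge() performs a database-style join, matching rows whose key columns (the by, by.x and by.y arguments) hold equal values. Your two data frames have no Taxa in common, so an inner join has nothing to match. The sketch below uses small stand-in data frames (a and b are assumed names) to show the difference from stacking rows.

```r
# Small stand-in data with no Taxa in common (assumed for illustration)
a <- data.frame(Taxa = c(118L, 113L), X2 = c("N", "N"), X5 = c("A", "A"),
                stringsAsFactors = FALSE)
b <- data.frame(Taxa = c(200L, 119L), X2 = c("C", "T"), X5 = c("N", "G"),
                stringsAsFactors = FALSE)

# By default merge() joins on every column name the two data frames share
# (here Taxa, X2 and X5); no row of a equals a row of b, so nothing is kept.
joined <- merge(a, b)
nrow(joined)

# Naming the key explicitly does not help: no Taxa value occurs in both.
joined_by_taxa <- merge(a, b, by = "Taxa")
nrow(joined_by_taxa)

# rbind() appends the rows of b below the rows of a instead of matching them.
stacked <- rbind(a, b)
nrow(stacked)
```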
Upvotes: 6