Reputation: 958
Imagine I have these 4 data frames:
abc_df
abc_ID . abc_classification
a . neutral
b . deletereous
c . benign
def_df
def_ID . def_classification
f . neutral
a . neutral
c . benign
ghi_df
ghi_ID . ghi_classification
f . deletereous
c . benign
k . neutral
vmk_df
vmk_ID . vmk_classification
c . benign
k . deletereous
a . neutral
As you can see, the columns "dfname_ID" and "dfname_classification" are not contiguous (the dots represent other columns in the data frame) and do not have the same column names across data frames. So I would like to extract the rows common to all data frames for these 2 columns, using the column indices rather than their names.
The output should be this:
ID . classification
c . benign
I am trying to use intersect together with lapply(mget(c('abc_df', 'def_df', 'ghi_df', 'vmk_df'))), but I don't know how to write the correct command. Do you know how I can solve this?
Upvotes: 1
Views: 264
Reputation: 46898
One option is purrr. The conversion to character may not be strictly necessary, since intersect coerces the factors anyway:
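library(dplyr)     # mutate_if() and the data-frame method for intersect()
library(purrr)     # map() and reduce()
library(magrittr)  # set_colnames()
# Column positions to compare (here all three columns of the example data)
COLUMNS = c(1,2,3)
list(abc_df,def_df,ghi_df,vmk_df) %>%
  map(~mutate_if(.x[,COLUMNS],is.factor, as.character)) %>%
  map(~set_colnames(.x,c("id",".","classification"))) %>%
  reduce(intersect)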
id . classification
1 c . benign
Your data:
abc_df = structure(list(abc_ID = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"),
abc_classification = structure(3:1, .Label = c("benign",
"deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
def_df = structure(list(def_ID = structure(c(3L, 1L, 2L), .Label = c("a",
"c", "f"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"),
def_classification = structure(c(2L, 2L, 1L), .Label = c("benign",
"neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
ghi_df = structure(list(ghi_ID = structure(c(2L, 1L, 3L), .Label = c("c",
"f", "k"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"),
ghi_classification = structure(c(2L, 1L, 3L), .Label = c("benign",
"deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
vmk_df = structure(list(vmk_ID = structure(c(2L, 3L, 1L), .Label = c("a",
"c", "k"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"),
vmk_classification = structure(1:3, .Label = c("benign",
"deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1
Reputation: 1025
For the data you provided you could use:
library(dplyr)
abc_df %>%
  rename(ID = abc_ID, classification = abc_classification) %>%
  inner_join(def_df, by = c("ID" = "def_ID",
                            "classification" = "def_classification")) %>%
  inner_join(ghi_df, by = c("ID" = "ghi_ID",
                            "classification" = "ghi_classification")) %>%
  inner_join(vmk_df, by = c("ID" = "vmk_ID",
                            "classification" = "vmk_classification"))
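Since the question asks for column positions rather than names, here is a minimal base R sketch of the same inner-join idea. It assumes the ID and classification sit in columns 1 and 3 of every data frame; the toy data, the `other` column, and the helper `pick_cols` are made up for illustration, so adjust the indices to your real data.

```r
# Toy versions of the four data frames; `other` stands in for the
# unrelated columns shown as "." in the question (an assumption).
abc_df <- data.frame(abc_ID = c("a", "b", "c"), other = 1:3,
                     abc_classification = c("neutral", "deletereous", "benign"))
def_df <- data.frame(def_ID = c("f", "a", "c"), other = 4:6,
                     def_classification = c("neutral", "neutral", "benign"))
ghi_df <- data.frame(ghi_ID = c("f", "c", "k"), other = 7:9,
                     ghi_classification = c("deletereous", "benign", "neutral"))
vmk_df <- data.frame(vmk_ID = c("c", "k", "a"), other = 10:12,
                     vmk_classification = c("benign", "deletereous", "neutral"))

# Hypothetical helper: keep two columns by position and give them shared names.
pick_cols <- function(df, idx = c(1, 3)) {
  out <- df[, idx]
  out[] <- lapply(out, as.character)  # sidestep factor-level mismatches
  names(out) <- c("ID", "classification")
  out
}

dfs <- lapply(list(abc_df, def_df, ghi_df, vmk_df), pick_cols)

# merge() with no `by` joins on all shared column names, i.e. an inner join
# on both ID and classification; Reduce() chains it across the list.
common <- Reduce(merge, dfs)
common
```

With the toy data above, `common` should contain the single row c / benign. Because the columns are renamed before merging, the original column names never have to match.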
Upvotes: 0