Jeni

Reputation: 958

Extract common values for more than one column between several R dataframes

Imagine I have these 4 data frames:

abc_df

abc_ID .  abc_classification
a      .     neutral
b      .     deletereous
c      .     benign

def_df

def_ID .  def_classification
f      .     neutral
a      .     neutral
c      .     benign

ghi_df

ghi_ID  .   ghi_classification
f       .     deletereous
c       .     benign
k       .     neutral

vmk_df

vmk_ID  .  vmk_classification
c       .     benign
k       .     deletereous
a       .     neutral

As you can see, the columns "dfname_ID" and "dfname_classification" are not contiguous (the dots stand for other columns in the data frame) and do not have the same column names across data frames. I would like to extract the rows that are common to all data frames for these 2 columns, selecting the columns by index rather than by name.

The output should be this:

ID  .   classification
c   .    benign

I am trying to use intersect and lapply(mget(c('abc_df', 'def_df', 'ghi_df', 'vmk_df'))), but I don't know how to write the correct command. Do you know how I can solve this?

Upvotes: 1

Views: 264

Answers (2)

StupidWolf

Reputation: 46898

You might need purrr for this. The conversion to character might not be necessary, since intersect forces the columns to change type anyway:

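library(dplyr)    # mutate_if() and the data-frame method for intersect()
library(purrr)    # map() and reduce()
library(magrittr) # set_colnames()

# positions of the columns to compare
COLUMNS = c(1,2,3)

list(abc_df,def_df,ghi_df,vmk_df) %>%
# select the columns by position and convert factors to character
map(~mutate_if(.x[,COLUMNS],is.factor, as.character)) %>% 
# give every data frame the same column names
map(~set_colnames(.x,c("id",".","classification"))) %>% 
# keep only the rows present in every data frame
reduce(intersect)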

  id . classification
1  c .         benign
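For comparison, here is a minimal base R sketch of the same idea that needs no packages. It is only an illustration under some assumptions: the data frames are rebuilt as toy versions with a hypothetical `other` column standing in for the dots, and the ID and classification columns are assumed to sit at positions 1 and 3 in every data frame. The helper `pick` and the vector `cols` are names made up for this example.

```r
# Toy versions of the four data frames; `other` is a hypothetical stand-in
# for the unrelated columns shown as dots in the question.
abc_df <- data.frame(abc_ID = c("a", "b", "c"), other = 1:3,
                     abc_classification = c("neutral", "deletereous", "benign"),
                     stringsAsFactors = FALSE)
def_df <- data.frame(def_ID = c("f", "a", "c"), other = 4:6,
                     def_classification = c("neutral", "neutral", "benign"),
                     stringsAsFactors = FALSE)
ghi_df <- data.frame(ghi_ID = c("f", "c", "k"), other = 7:9,
                     ghi_classification = c("deletereous", "benign", "neutral"),
                     stringsAsFactors = FALSE)
vmk_df <- data.frame(vmk_ID = c("c", "k", "a"), other = 10:12,
                     vmk_classification = c("benign", "deletereous", "neutral"),
                     stringsAsFactors = FALSE)

# Assumed positions of the ID and classification columns
cols <- c(1, 3)

# Keep only those two columns and give them common names
pick <- function(df) setNames(df[, cols], c("ID", "classification"))

dfs <- lapply(mget(c("abc_df", "def_df", "ghi_df", "vmk_df")), pick)

# merge() without `by` performs an inner join on all shared column names,
# so folding it over the list keeps only rows found in every data frame
common <- Reduce(merge, dfs)
print(common)
```

Note that merge() does not remove duplicates, so if an ID/classification pair can occur more than once within a data frame, wrapping each element in unique() before merging is probably safer.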

Your data:

abc_df = structure(list(abc_ID = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    abc_classification = structure(3:1, .Label = c("benign", 
    "deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

def_df = structure(list(def_ID = structure(c(3L, 1L, 2L), .Label = c("a", 
"c", "f"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    def_classification = structure(c(2L, 2L, 1L), .Label = c("benign", 
    "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

ghi_df = structure(list(ghi_ID = structure(c(2L, 1L, 3L), .Label = c("c", 
"f", "k"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    ghi_classification = structure(c(2L, 1L, 3L), .Label = c("benign", 
    "deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

vmk_df = structure(list(vmk_ID = structure(c(2L, 3L, 1L), .Label = c("a", 
"c", "k"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    vmk_classification = structure(1:3, .Label = c("benign", 
    "deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

Piotr K

Reputation: 1025

For the data you provided you could use:

library(dplyr)

abc_df %>%
  rename(ID = abc_ID, classification = abc_classification) %>% 
  inner_join(def_df, by = c("ID" = "def_ID",
                            "classification" = "def_classification")) %>%
  inner_join(ghi_df, by = c("ID" = "ghi_ID",
                            "classification" = "ghi_classification")) %>%
  inner_join(vmk_df, by = c("ID" = "vmk_ID",
                            "classification" = "vmk_classification"))

Upvotes: 0
