Reputation: 35
I have 2 data frames:
df1 (all genes and their expression values -- each column name is a gene)
df2 (list of genes to analyse -- each gene is a column name, without any extra data)
I want to subset df1 by column name, producing a third data frame that is df1 restricted to the genes present in both data frames (the common column names).
I'm not sure I explained this well, so let me know if I can provide more info.
Example of data frames:
df1 <- data.frame(matrix(ncol = 4, nrow = 0))
x1 <- c("name", "school", "job", "gender")
colnames(df1) <- x1
df2 <- data.frame(matrix(ncol = 3, nrow = 0))
x2 <- c("name", "age", "gender")
colnames(df2) <- x2
Here, what I want is df1 reduced to the columns present in both df1 and df2, which would be "name" and "gender". In my real data there are many genes, so I cannot do it gene by gene.
Thank you!
Upvotes: 1
Views: 1524
Reputation: 886938
We can use intersect on the column names of both 'df1' and 'df2' to select the columns of 'df1':
df1new <- df1[intersect(names(df1), names(df2))]
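As a quick check, here is a minimal sketch of the same base R approach on a toy table. The row values are made up for illustration and are not from the question; only the column names match the example above.

```r
# Toy stand-in for the expression table (row values are invented).
df1 <- data.frame(name = c("a", "b"), school = c("s1", "s2"),
                  job = c("j1", "j2"), gender = c("f", "m"))

# df2 only contributes its column names; it has no rows.
df2 <- data.frame(matrix(ncol = 3, nrow = 0))
colnames(df2) <- c("name", "age", "gender")

# Column names present in both data frames, in df1's order.
common <- intersect(names(df1), names(df2))

# Keep only those columns of df1; all rows are retained.
df1new <- df1[common]
names(df1new)  # "name" "gender"
```

Because intersect() works on the whole vector of names at once, this scales to any number of gene columns.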
Or with dplyr
library(dplyr)
df1new <- df1 %>%
  select(intersect(names(.), names(df2)))
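A possible alternative, not part of the original answer, is the tidyselect helper any_of(), which silently skips names that are missing from df1. This sketch assumes a dplyr version that exports any_of() (dplyr 1.0.0 or later); the toy row values are invented.

```r
library(dplyr)

# Toy stand-in for the expression table (row values are invented).
df1 <- data.frame(name = c("a", "b"), school = c("s1", "s2"),
                  job = c("j1", "j2"), gender = c("f", "m"))

# df2 only contributes its column names.
df2 <- data.frame(matrix(ncol = 3, nrow = 0))
colnames(df2) <- c("name", "age", "gender")

# any_of() ignores "age", which df1 does not have, instead of erroring.
df1_sel <- df1 %>%
  select(any_of(names(df2)))
```

Note that with any_of() the resulting column order follows names(df2) rather than df1, which may matter if downstream code relies on column position.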
Upvotes: 1