Lilith-Elina

Reputation: 1673

Combine data.frames in R using only common row.names

I have five data.frames with gene expression data for different sets of samples. Each data.frame has a different number of rows, so the row.names (genes) only partly overlap.

Now I want a) to filter the five data.frames to contain only genes that are present in all data.frames and b) to combine the gene expression data for those genes to one data.frame.

All I could find so far was merge, but that can only merge two data.frames, so I'd have to use it multiple times. Is there an easier way?

Upvotes: 1

Views: 6222

Answers (2)

Sven Hohenstein

Reputation: 81743

Merging is not very efficient if you want to exclude row names which are not present in every data frame. Here's a different proposal.

First, three example data frames:

df1 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[1:5]) # letters a to e
df2 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[3:7]) # letters c to g
df3 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[c(1,2,3,5,7)]) # letters a, b, c, e, and g
# row names being present in all data frames: c and e

Put the data frames into a list:

dfList <- list(df1, df2, df3)

Find common row names:

idx <- Reduce(intersect, lapply(dfList, rownames))

Extract data:

df1[idx, ]

  a b
c 3 3
e 5 5

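PS. If you want to keep the corresponding rows from all data frames, you could replace the last step, df1[idx, ], with the following command. It stacks the matching rows from every data frame on top of each other, so R will make the duplicated row names unique: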

do.call(rbind, lapply(dfList, "[", idx, ))
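Since the question asks for the expression values of the shared genes side by side, a column-wise variant of the same idea may be closer to what is needed. The following is only a sketch under assumptions: the list names set1, set2, and set3 are made up for illustration, and the column prefixing is one possible way to keep identically named columns apart.

```r
# Example data frames from above
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])

# Hypothetical names for the data sets, used only as column prefixes
dfList <- list(set1 = df1, set2 = df2, set3 = df3)

# Row names present in every data frame
idx <- Reduce(intersect, lapply(dfList, rownames))

# Subset each data frame to the common rows (in the same order)
# and prefix its columns with the data set name
subsets <- lapply(names(dfList), function(nm) {
  d <- dfList[[nm]][idx, , drop = FALSE]
  colnames(d) <- paste(nm, colnames(d), sep = ".")
  d
})

# Bind the subsets column-wise into one data frame
combined <- do.call(cbind, subsets)
```

Because every subset is indexed with the same idx, the rows line up before cbind is called. drop = FALSE keeps single-column data frames from collapsing into vectors.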

Upvotes: 5

fdetsch

Reputation: 5308

Check out the top answer in this SO post. Just put your data frames into a list and apply the following line of code:

Reduce(function(...) merge(..., by = "x"), list.of.dataframes)

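You just have to adjust the by argument to specify the common column by which the data frames should be merged. Since merge keeps only matching keys by default (all = FALSE), genes missing from any data frame are dropped. Note that your genes are stored as row names, not in a column, so you would first need to copy them into a key column.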
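Applied to data keyed by row names, the approach might look like the sketch below. The key column name gene and the list names set1 to set3 are assumptions chosen for illustration, and the small data frames stand in for the real expression data.

```r
# Small stand-in data frames with partly overlapping row names
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])

# Hypothetical data set names, used as column prefixes
list.of.dataframes <- list(set1 = df1, set2 = df2, set3 = df3)

# Copy the row names into a key column ("gene" is an assumed name)
# and prefix the value columns so they stay distinct after merging
keyed <- lapply(names(list.of.dataframes), function(nm) {
  d <- list.of.dataframes[[nm]]
  colnames(d) <- paste(nm, colnames(d), sep = ".")
  data.frame(gene = rownames(d), d, row.names = NULL,
             stringsAsFactors = FALSE)
})

# Successive inner joins on the key column (merge default: all = FALSE)
merged <- Reduce(function(x, y) merge(x, y, by = "gene"), keyed)
```

If the genes should end up as row names again, something like rownames(merged) <- merged$gene followed by dropping the gene column would restore them.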

Upvotes: 0
