Subset columns of one data frame according to another data frame's rows

Question

I have two data frames and would like to subset the columns of one according to the rows of the other. The two data frames are shown below:

df1 <- structure(list(ID = structure(c(3L, 1L, 2L, 5L, 4L), .Label = c("cg08", "cg09", "cg29", "cg36", "cg65"), class = "factor"), chr = c(16L, 3L, 3L, 1L, 8L), gene = c(534L, 376L, 171L, 911L, 422L), GS12 = c(0.15, 0.87, 0.6, 0.1, 0.72), GS32 = c(0.44, 0.93, 0.92, 0.07, 0.91),     GS56 = c(0.46, 0.92, 0.62, 0.06, 0.87), GS87 = c(0.79, 0.93,     0.86, 0.08, 0.88)), .Names = c("ID", "chr", "gene", "GS12", "GS32", "GS56", "GS87"), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
df2 <- structure(list(samples = structure(c(1L, 2L, 4L, 3L, 6L, 5L), .Label = c("GS32", "GS33", "GS55", "GS56", "GS68", "GS87"), class = "factor"), ID2 = structure(c(1L, 6L, 3L, 4L, 5L, 2L), .Label = c("GM1", "GM10", "GM17", "GM18", "GM19", "GM7"), class = "factor")), .Names = c("samples", "ID2" ), class = "data.frame", row.names = c(NA, -6L))

Data:

df1:

    ID chr gene GS12 GS32 GS56 GS87
1 cg29  16  534 0.15 0.44 0.46 0.79
2 cg08   3  376 0.87 0.93 0.92 0.93
3 cg09   3  171 0.60 0.92 0.62 0.86
4 cg65   1  911 0.10 0.07 0.06 0.08
5 cg36   8  422 0.72 0.91 0.87 0.88

df2:

samples ID2
GS32    GM1
GS33    GM7
GS56    GM17
GS55    GM18
GS87    GM19
GS68    GM10

I would like to select the columns of df1 whose names appear in the samples column of df2, while keeping all rows in the final output. In short, I would like to subset the columns of one data frame according to the rows of another. Is there a function that does this?

Bas · Accepted Answer

The input data:

df1 <- structure(list(ID = structure(c(3L, 1L, 2L, 5L, 4L), .Label = c("cg08", "cg09", "cg29", "cg36", "cg65"), class = "factor"), chr = c(16L, 3L, 3L, 1L, 8L), gene = c(534L, 376L, 171L, 911L, 422L), GS12 = c(0.15, 0.87, 0.6, 0.1, 0.72), GS32 = c(0.44, 0.93, 0.92, 0.07, 0.91),     GS56 = c(0.46, 0.92, 0.62, 0.06, 0.87), GS87 = c(0.79, 0.93,     0.86, 0.08, 0.88)), .Names = c("ID", "chr", "gene", "GS12", "GS32", "GS56", "GS87"), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
df2 <- structure(list(samples = structure(c(1L, 2L, 4L, 3L, 6L, 5L), .Label = c("GS32", "GS33", "GS55", "GS56", "GS68", "GS87"), class = "factor"), ID2 = structure(c(1L, 6L, 3L, 4L, 5L, 2L), .Label = c("GM1", "GM10", "GM17", "GM18", "GM19", "GM7"), class = "factor")), .Names = c("samples", "ID2" ), class = "data.frame", row.names = c(NA, -6L))

I believe what you are asking for is the following:

df1[colnames(df1) %in% df2$samples]
#  GS32 GS56 GS87
#1 0.44 0.46 0.79
#2 0.93 0.92 0.93
#3 0.92 0.62 0.86
#4 0.07 0.06 0.08
#5 0.91 0.87 0.88

This checks which column names of df1 occur in df2$samples and uses the resulting logical vector to select columns.
However, I assume you also need the ID, chromosome and gene columns in your output data frame. Wrap the test in which() so that it yields column positions; combining 1:3 with the logical vector directly would coerce TRUE and FALSE to 1 and 0 and select the ID column repeatedly:

df1[c(1:3, which(colnames(df1) %in% df2$samples))]
#    ID chr gene GS32 GS56 GS87
#1 cg29  16  534 0.44 0.46 0.79
#2 cg08   3  376 0.93 0.92 0.93
#3 cg09   3  171 0.92 0.62 0.86
#4 cg65   1  911 0.07 0.06 0.08
#5 cg36   8  422 0.91 0.87 0.88
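
As a sketch of a name-based alternative (not part of the original answer), the same selection can be written with intersect(), which avoids mixing integer positions with a logical vector. The data frames below are rebuilt with data.frame() using character columns instead of factors purely to keep the example short, and the variable names keep and result are only illustrative.

```r
# Sample data frames mirroring the question (values copied from the dput output).
df1 <- data.frame(
  ID   = c("cg29", "cg08", "cg09", "cg65", "cg36"),
  chr  = c(16L, 3L, 3L, 1L, 8L),
  gene = c(534L, 376L, 171L, 911L, 422L),
  GS12 = c(0.15, 0.87, 0.60, 0.10, 0.72),
  GS32 = c(0.44, 0.93, 0.92, 0.07, 0.91),
  GS56 = c(0.46, 0.92, 0.62, 0.06, 0.87),
  GS87 = c(0.79, 0.93, 0.86, 0.08, 0.88)
)
df2 <- data.frame(
  samples = c("GS32", "GS33", "GS56", "GS55", "GS87", "GS68"),
  ID2     = c("GM1", "GM7", "GM17", "GM18", "GM19", "GM10")
)

# Column names of df1 that also occur in df2$samples, in df1's column order.
# as.character() guards against df2$samples being a factor.
keep <- intersect(colnames(df1), as.character(df2$samples))

# Index by name: annotation columns first, then the matched sample columns.
result <- df1[c("ID", "chr", "gene", keep)]
colnames(result)
```

Indexing by name rather than by position keeps working if the annotation columns are later moved, at the cost of hard-coding their names.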

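If you want the selected columns to follow the order of df2$samples rather than their order in df1, use match instead of %in%. match(x, table) takes two arguments: x holds the values to look up and table is the vector to search. It returns the position in table of each element of x, or NA where there is no match, so na.omit is used to drop the samples that have no column in df1.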

df1[,c(1:3,na.omit(match(df2$samples, colnames(df1))))]
#    ID chr gene GS32 GS56 GS87
#1 cg29  16  534 0.44 0.46 0.79
#2 cg08   3  376 0.93 0.92 0.93
#3 cg09   3  171 0.92 0.62 0.86
#4 cg65   1  911 0.07 0.06 0.08
#5 cg36   8  422 0.91 0.87 0.88
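
To make the difference between match and %in% concrete, here is a small sketch on plain character vectors. The vector samples is deliberately given in a different order from the columns and is made up for illustration; it is not the df2 from the question.

```r
# Column names as in df1, and a hypothetical sample list in a different order.
cols    <- c("ID", "chr", "gene", "GS12", "GS32", "GS56", "GS87")
samples <- c("GS87", "GS33", "GS32")

# match() reports, for each sample, its position in cols (NA if absent),
# so the result follows the order of samples.
pos <- match(samples, cols)
pos                  # 7 NA 5
pos[!is.na(pos)]     # 7 5

# %in% reports, for each column, whether it is among the samples,
# so selecting with it follows the order of cols.
mask <- cols %in% samples
which(mask)          # 5 7
```

With match, GS87 would come before GS32 in the output; with %in%, the columns keep their original df1 order.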
