Debby

Reputation: 29

Identify identical columns in two data frames and extract them in r

I have two data frames: mRNA (here) and RPPA (here). The mRNA data frame has 1,212 columns, while the RPPA data frame has 937 columns. All column names in the RPPA data frame also appear in the mRNA data frame (but not in the same order). Within the columns, the values differ between the two data frames.
I want to create a new mRNA data frame that contains the same columns as the RPPA data frame, and does not contain any columns that do not appear in the ("old") mRNA data frame.
An example:

mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))  
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))  

The expected result would be:

> new.mRNA
B     A     D
56    25    13
89    76    65
12    23    23
452   45    16

I've tried converting the RPPA column names into a vector, and then using it with the command mRNA[col.names.vector], as described here, but it doesn't work. It gives the error undefined columns selected.

Is there a quick way to do it (without functions, loops etc.)?

Upvotes: 1

Views: 2271

Answers (4)

simranpal kohli

Reputation: 21

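You can find the columns of mRNA that do not appear in RPPA and drop them, as in the code below (it needs dplyr).

library(dplyr)

# Names of the mRNA columns that do not appear in RPPA
col_name <- colnames(mRNA)[!(colnames(mRNA) %in% colnames(RPPA))]

new_mRNA <- mRNA %>% select(-all_of(col_name))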
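
The same selection can be sketched in base R, for cases where dplyr is not available. This is only an illustration on the example data from the question; `keep` is a name introduced here, not part of the original answer. Note that the columns come out in the order they have in mRNA (A, B, D), not in the order of RPPA.

```r
mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))

# TRUE for every mRNA column whose name also occurs in RPPA
keep <- colnames(mRNA) %in% colnames(RPPA)

# drop = FALSE keeps the result a data frame even if only one column survives
new_mRNA <- mRNA[, keep, drop = FALSE]
colnames(new_mRNA)
```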

Upvotes: 1

Debby

Reputation: 29

Both of the answers that were posted didn't work for my data. Thanks to both answers posted, and with a little more research, I figured out the answer: First, you need to generate a vector that will include ONLY the column names that appear in BOTH data frames. In order to do that I used the command intersect and Reduce:

target <- Reduce(intersect, list(colnames(mRNA), colnames(RPPA)))

Now you can use the answer that was given:

new.mRNA <- mRNA[target]

and this will generate a new data frame with the right values.
Thank you @akrun and @Titolondon for your help
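
A minimal sketch of why the plain subset can fail and how the intersection avoids it. The extra column `Z` in RPPA is hypothetical, added here only to reproduce the "undefined columns selected" situation; `missing_cols` is likewise an illustrative name. Since `intersect()` returns its result in the order of its first argument, passing the RPPA names first keeps the RPPA column order.

```r
mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))
# Hypothetical RPPA with one column ("Z") that mRNA does not have
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23), Z=c(1,2,3,4))

# RPPA names without a match in mRNA; mRNA[colnames(RPPA)] would fail because of these
missing_cols <- setdiff(colnames(RPPA), colnames(mRNA))

# Names present in both data frames, in RPPA's order
target <- intersect(colnames(RPPA), colnames(mRNA))

new.mRNA <- mRNA[target]
```

Inspecting `missing_cols` first is a quick way to see which names (typos, trailing spaces, differently formatted sample IDs) prevent the direct subset from working.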

Upvotes: 1

cderv

Reputation: 6542

Subsetting a data.frame with a vector of column names should have worked.

  1. Create a vector of the column names you want to keep
  2. Subset your data.frame using this vector


mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))  
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))  

mRNA
#>    A   B   C  D  E   F
#> 1 25  56  45 13 17 256
#> 2 76  89 456 65 18  34
#> 3 23  12 243 23 19   0
#> 4 45 452   5 16 20   5
RPPA
#>    B  A  D
#> 1 46 51 76
#> 2 47 87 34
#> 3 45 34 98
#> 4 49 87 23
mRNA[, names(RPPA)]
#>     B  A  D
#> 1  56 25 13
#> 2  89 76 65
#> 3  12 23 23
#> 4 452 45 16

Upvotes: 0

akrun

Reputation: 886938

We can subset mRNA by the column names of 'RPPA' and assign the result back to 'RPPA', which overwrites its values in place:

RPPA[] <- mRNA[names(RPPA)]
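
A short sketch of what this assignment does on the example data, assuming every RPPA column name exists in mRNA. `RPPA_orig` is a name introduced here for illustration; the copy is kept only to show that the original RPPA values are replaced.

```r
mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))

# Keep a copy, because the next line replaces the values stored in RPPA
RPPA_orig <- RPPA

# Column names and order of RPPA are kept; the values now come from mRNA
RPPA[] <- mRNA[names(RPPA)]

# Alternative that leaves the RPPA values untouched
new.mRNA <- mRNA[names(RPPA_orig)]
```

If the RPPA values are still needed afterwards, assigning to a new object such as `new.mRNA` is probably the safer choice.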

Upvotes: 0
