Reputation: 29
I have two data frames: mRNA (here) and RPPA (here). The mRNA data frame has 1,212 columns, while the RPPA data frame has 937 columns. All column names in the RPPA data frame also appear in the mRNA data frame (but not in the same order). Within the columns, the values differ between the two data frames.
I want to create a new mRNA data frame that contains the same columns as the RPPA data frame, leaving out any columns that do not appear in the original ("old") mRNA data frame.
An example:
mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))
The expected result would be:
> new.mRNA
B A D
56 25 13
89 76 65
12 23 23
452 45 16
I've tried converting the RPPA column names into a vector and then using it with the command mRNA[col.names.vector], as described here, but it doesn't work. It gives the error undefined columns selected.
Is there a quick way to do it (without functions, loops etc.)?
Upvotes: 1
Views: 2271
Reputation: 21
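You can find the columns of mRNA that do not appear in RPPA and drop them, as in the code below (this uses dplyr, and the remaining columns keep mRNA's order).
library(dplyr)
col_name <- setdiff(colnames(mRNA), colnames(RPPA))  # columns of mRNA that are absent from RPPA
new_mRNA <- mRNA %>% select(-all_of(col_name))  # drop them from mRNA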
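If the result should follow RPPA's column order rather than mRNA's, selecting by name is an alternative to dropping columns. The following is only a sketch on a cut-down version of the question's example; the column `Z` is a hypothetical name added here to stand in for an RPPA column that mRNA lacks.

```r
library(dplyr)

mRNA <- data.frame(A = c(25, 76, 23, 45), B = c(56, 89, 12, 452),
                   C = c(45, 456, 243, 5), D = c(13, 65, 23, 16))
# Hypothetical RPPA with one extra column ("Z") that does not exist in mRNA
RPPA <- data.frame(B = c(46, 47, 45, 49), A = c(51, 87, 34, 87),
                   Z = c(1, 2, 3, 4), D = c(76, 34, 98, 23))

# any_of() selects columns in the order of the supplied names
# and silently skips names that are not present in mRNA
new_mRNA <- mRNA %>% select(any_of(names(RPPA)))
names(new_mRNA)
#> [1] "B" "A" "D"
```

Using `all_of()` instead of `any_of()` would raise an error on the missing name, which can be preferable when a missing column should not go unnoticed.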
Upvotes: 1
Reputation: 29
Neither of the posted answers worked for my data. Thanks to both of them, and with a little more research, I figured out the answer:
First, you need to generate a vector that includes ONLY the column names that appear in BOTH data frames. To do that, I used the commands intersect and Reduce:
target <- Reduce(intersect, list(colnames(mRNA), colnames(RPPA)))
Now you can use the answer that was given:
new.mRNA <- mRNA[target]
and this will generate a new data frame with the right values.
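As a rough illustration of why this helps: `intersect` returns the shared names in the order of its first argument, so putting RPPA first should give RPPA's column order, and `setdiff` lists the names that caused the error. This is a sketch on a cut-down version of the example from the question; the column `Z` and the variable `absent` are hypothetical names introduced here.

```r
mRNA <- data.frame(A = c(25, 76, 23, 45), B = c(56, 89, 12, 452),
                   C = c(45, 456, 243, 5), D = c(13, 65, 23, 16))
# Hypothetical RPPA with one column ("Z") that mRNA does not have
RPPA <- data.frame(B = c(46, 47, 45, 49), A = c(51, 87, 34, 87),
                   Z = c(1, 2, 3, 4), D = c(76, 34, 98, 23))

# Names present in RPPA but not in mRNA (these trigger the error)
absent <- setdiff(colnames(RPPA), colnames(mRNA))
absent
#> [1] "Z"

# Shared names, in RPPA's order because RPPA is the first argument
target <- intersect(colnames(RPPA), colnames(mRNA))
target
#> [1] "B" "A" "D"

new.mRNA <- mRNA[target]
```

With only two data frames, calling `intersect` directly is equivalent to the `Reduce` form; `Reduce` becomes useful when intersecting the names of three or more data frames.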
Thank you @akrun and @Titolondon for your help
Upvotes: 1
Reputation: 6542
Subsetting a data.frame with a vector of column names should have worked.
mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))
mRNA
#> A B C D E F
#> 1 25 56 45 13 17 256
#> 2 76 89 456 65 18 34
#> 3 23 12 243 23 19 0
#> 4 45 452 5 16 20 5
RPPA
#> B A D
#> 1 46 51 76
#> 2 47 87 34
#> 3 45 34 98
#> 4 49 87 23
mRNA[, names(RPPA)]
#> B A D
#> 1 56 25 13
#> 2 89 76 65
#> 3 12 23 23
#> 4 452 45 16
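When the same indexing fails on real data, one likely cause is that at least one name in the vector is not a column of the data frame. The sketch below reproduces that situation on a toy data frame; the name `Z` and the variables `wanted`, `msg` and `bad` are hypothetical and only serve the illustration.

```r
mRNA <- data.frame(A = c(25, 76), B = c(56, 89))
wanted <- c("B", "A", "Z")  # "Z" is a hypothetical name that mRNA lacks

# Indexing with a name that is not a column raises an error;
# tryCatch() captures its message instead of stopping the script
msg <- tryCatch(mRNA[, wanted], error = function(e) conditionMessage(e))
msg

# Checking membership first shows which names are the problem
bad <- wanted[!wanted %in% names(mRNA)]
bad
#> [1] "Z"
```

Printing `bad` before subsetting is a quick way to spot typos, stray whitespace, or names that were altered on import.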
Upvotes: 0
Reputation: 886938
We can subset mRNA by the column names of 'RPPA' and assign the result to 'RPPA', which overwrites the values of 'RPPA' in place:
RPPA[] <- mRNA[names(RPPA)]
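If the original RPPA values are still needed, the same assignment can be applied to a copy instead. This is a minimal sketch using the example data from the question, not necessarily what the answer intended; `new.mRNA` is the name the question uses for the result.

```r
mRNA <- data.frame(A = c(25, 76, 23, 45), B = c(56, 89, 12, 452),
                   C = c(45, 456, 243, 5), D = c(13, 65, 23, 16),
                   E = c(17:20), F = c(256, 34, 0, 5))
RPPA <- data.frame(B = c(46, 47, 45, 49), A = c(51, 87, 34, 87),
                   D = c(76, 34, 98, 23))

new.mRNA <- RPPA                   # copy, so RPPA itself stays untouched
new.mRNA[] <- mRNA[names(RPPA)]    # keep the layout, fill in mRNA's values

new.mRNA$B
#> [1]  56  89  12 452
RPPA$B
#> [1] 46 47 45 49
```

Plain `new.mRNA <- mRNA[names(RPPA)]` gives the same values here; the `[] <-` form mainly matters when attributes of the target data frame should be preserved.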
Upvotes: 0