lizaveta

Reputation: 373

Extract column name and match with another column

I have a data frame, MutationsNumberTable, that looks like this:

    ACC BLCA BRCA CESC   HGNC
1:   1    2    6    0   OPN4
2:   2    3    1    1  KLRB1
3:   2   23    4    5  SALL2
4:   1    8    5    7  PLCB2

The goal is to create a matrix of unique "gene-cancer type" pairs for which the number in the table is at least some threshold (say, 5):

Desired output:

     HGNC Cancer
1:   OPN4   BRCA
2:  SALL2   BLCA
3:  SALL2   CESC
4:  PLCB2   BLCA
5:  PLCB2   BRCA
6:  PLCB2   CESC

So far, I could come up with this:

threshold <- 5
n <- ncol(MutationsNumberTable)
# logical matrix over the count columns (all but the last column, HGNC)
whereTrue <- MutationsNumberTable[, 1:(n-1)] >= threshold

but I have difficulty using these logical values to build the matrix I need. I tried

colnames(whereTrue)[whereTrue]

but it is not exactly what I need.

Upvotes: 3

Views: 72

Answers (1)

akrun

Reputation: 887008

We can gather the data into 'long' format and then filter (df1 here is the question's MutationsNumberTable):

library(dplyr)
library(tidyr)
gather(df1, Cancer, val, -HGNC) %>%
     filter(val >= 5) %>%
     select(-val)
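In current tidyr releases gather() is superseded by pivot_longer(), so the same reshape can probably be written as below. This is only a sketch: df1 is rebuilt here from the question's example table as a plain data.frame, and res_long is a name made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question (df1 stands for MutationsNumberTable)
df1 <- data.frame(
  ACC  = c(1, 2, 2, 1),
  BLCA = c(2, 3, 23, 8),
  BRCA = c(6, 1, 4, 5),
  CESC = c(0, 1, 5, 7),
  HGNC = c("OPN4", "KLRB1", "SALL2", "PLCB2"),
  stringsAsFactors = FALSE
)

res_long <- df1 %>%
  # one row per gene/cancer-type cell; column names go to 'Cancer'
  pivot_longer(-HGNC, names_to = "Cancer", values_to = "val") %>%
  # keep the cells that reach the threshold, then drop the count
  filter(val >= 5) %>%
  select(-val)

res_long
```

pivot_longer() walks the table row by row, so the pairs come out grouped by gene in the original row order.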

Or using data.table

library(data.table)
setDT(df1)[, melt(.SD, id.vars = 'HGNC')[value >= 5, .(HGNC, Cancer = variable)]]
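For completeness, the logical-matrix attempt from the question can likely be finished in base R with which(..., arr.ind = TRUE), which returns the row and column index of every TRUE cell. A minimal sketch, assuming df1 is a plain data.frame (not yet converted with setDT) rebuilt from the question's example; counts, hits and res are names chosen only for this illustration.

```r
# Example data copied from the question, as a plain data.frame
df1 <- data.frame(
  ACC  = c(1, 2, 2, 1),
  BLCA = c(2, 3, 23, 8),
  BRCA = c(6, 1, 4, 5),
  CESC = c(0, 1, 5, 7),
  HGNC = c("OPN4", "KLRB1", "SALL2", "PLCB2"),
  stringsAsFactors = FALSE
)
threshold <- 5

# Numeric matrix of the count columns (everything except HGNC)
counts <- as.matrix(df1[, setdiff(names(df1), "HGNC")])

# Row/column positions of every cell that reaches the threshold
hits <- which(counts >= threshold, arr.ind = TRUE)

# Map row indices back to genes and column indices back to cancer types
res <- data.frame(
  HGNC   = df1$HGNC[hits[, "row"]],
  Cancer = colnames(counts)[hits[, "col"]],
  stringsAsFactors = FALSE
)

# which() reports hits column by column; reorder by gene row, then column
res <- res[order(hits[, "row"], hits[, "col"]), ]
rownames(res) <- NULL
res
```

Compared with colnames(whereTrue)[whereTrue], this keeps track of which row each TRUE came from, so the gene name can be attached to each cancer type.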

Upvotes: 5
