Reputation: 373
I have a data frame MutationsNumberTable like this:
   ACC BLCA BRCA CESC  HGNC
1:   1    2    6    0  OPN4
2:   2    3    1    1 KLRB1
3:   2   23    4    5 SALL2
4:   1    8    5    7 PLCB2
The goal is to create a matrix of unique "gene-cancer type" pairs for which the number in the table is at or above a threshold (say, 5):
Desired output:
    HGNC Cancer
1:  OPN4   CESC
2: SALL2   BRCA
3: SALL2   CESC
4: PLCB2   BLCA
5: PLCB2   BRCA
6: PLCB2   CESC
So far, I could come up with this:
threshold <- 5
n <- ncol(MutationsNumberTable)
# logical matrix: TRUE where the count is at or above the threshold
whereTrue <- MutationsNumberTable[, 1:(n-1)] >= threshold
but I am having trouble turning these logical values into the matrix I need. I tried
colnames(whereTrue)[whereTrue]
but it is not exactly what I need.
Upvotes: 3
Views: 72
Reputation: 887008
We can gather the data to 'long' format and then filter:
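library(dplyr)
library(tidyr)
# df1 is the table from the question (MutationsNumberTable)
# reshape to long format, keep counts at or above 5, then drop the count column
gather(df1, Cancer, val, -HGNC) %>%
  filter(val >= 5) %>%
  select(-val)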
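In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). Here is a minimal sketch of the same reshape-and-filter idea, assuming df1 holds the table from the question (the data frame is retyped from the question, and the name result is only for illustration):

```r
library(dplyr)
library(tidyr)

# Toy data retyped from the question; df1 stands in for MutationsNumberTable
df1 <- data.frame(
  ACC  = c(1, 2, 2, 1),
  BLCA = c(2, 3, 23, 8),
  BRCA = c(6, 1, 4, 5),
  CESC = c(0, 1, 5, 7),
  HGNC = c("OPN4", "KLRB1", "SALL2", "PLCB2"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # every column except HGNC becomes a (Cancer, val) pair
  pivot_longer(cols = -HGNC, names_to = "Cancer", values_to = "val") %>%
  # keep counts at or above the threshold
  filter(val >= 5) %>%
  # the count itself is not needed in the output
  select(-val)
```

Note that pivot_longer() orders the long table row by row of the input, whereas gather() stacks it column by column, so the pairs may come out in a different order.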
Or using data.table
library(data.table)
# melt to long format, then keep rows at or above the threshold
melt(setDT(df1), id.vars = 'HGNC', variable.name = 'Cancer')[value >= 5, .(HGNC, Cancer)]
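Staying closer to the attempt in the question, the logical matrix can also be turned into pairs in base R with which(..., arr.ind = TRUE), which returns the row and column index of every TRUE cell. This is only a sketch: the data frame is retyped from the question, and the names counts, hits and pairs are made up for illustration.

```r
# Toy data retyped from the question
MutationsNumberTable <- data.frame(
  ACC  = c(1, 2, 2, 1),
  BLCA = c(2, 3, 23, 8),
  BRCA = c(6, 1, 4, 5),
  CESC = c(0, 1, 5, 7),
  HGNC = c("OPN4", "KLRB1", "SALL2", "PLCB2"),
  stringsAsFactors = FALSE
)
threshold <- 5

# numeric part of the table as a matrix (everything except the gene column)
cancer_cols <- setdiff(names(MutationsNumberTable), "HGNC")
counts <- as.matrix(MutationsNumberTable[, cancer_cols])

# row/column indices of the cells at or above the threshold
hits <- which(counts >= threshold, arr.ind = TRUE)

# translate the indices back to gene and cancer-type names
pairs <- data.frame(
  HGNC   = MutationsNumberTable$HGNC[hits[, "row"]],
  Cancer = colnames(counts)[hits[, "col"]],
  stringsAsFactors = FALSE
)

# which() walks the matrix column by column; reorder by gene row, then column
pairs <- pairs[order(hits[, "row"], hits[, "col"]), ]
rownames(pairs) <- NULL
```

This avoids any reshaping packages; colnames(whereTrue)[whereTrue] from the question fails because indexing column names with a whole logical matrix loses track of which column each TRUE belongs to, while arr.ind = TRUE keeps that information.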
Upvotes: 5