Reputation: 235
I have a data frame with 200 columns, each containing 150 genes (rows).
I want to count the number of occurrences of each gene in the whole data frame.
mydat <-
         V1       V2       V3       V4       V5       V6       V7       V8
1    TGFBR2   TGFBR2   TGFBR2   TGFBR2   TGFBR2   TGFBR2   TGFBR2   TGFBR2
2     MAML2    MAML2    MAML2    MAML2    MAML2    MAML2    MAML2    MAML2
3     BMPR2    EIF5A   WRAP53   WRAP53    EIF5A    EIF5A    EIF5A    EIF5A
4     EIF5A    BMPR2    EIF5A    EIF5A ADAMTSL3    BMPR2   WRAP53    BMPR2
5   EIF5AL1   WRAP53 ADAMTSL3    BMPR2    BMPR2   WRAP53    BMPR2  EIF5AL1
6    WRAP53  EIF5AL1    BMPR2 ADAMTSL3   WRAP53  EIF5AL1  EIF5AL1   WRAP53
7    TBC1D5 ADAMTSL3  EIF5AL1  EIF5AL1  EIF5AL1 ADAMTSL3 ADAMTSL3  C1QTNF7
8  ADAMTSL3  C1QTNF7  C1QTNF7  C1QTNF7     FHL1     YAP1    AURKB ADAMTSL3
9   C1QTNF7     FHL1   RGS7BP     LIFR  C1QTNF7   TMEM43  C1QTNF7     LIFR
10    AURKB     RGS5    AURKB  FAM198B    AURKB  C1QTNF7    PSMB6    PDGFD
So I want the output to be something like this:
occurrences
TGFBR2: 8
MAML2: 8
FHL1: 3
etc. But I want to count every gene in the data frame.
How do I do this?
Upvotes: 20
Views: 56272
Reputation: 41225
Another option is to use table() together with stack(). stack() concatenates multiple vectors into a single vector, along with a factor indicating where each observation originated, so you can count the values like this (df here stands for the question's mydat):
table(stack(df)$values)
#>
#> ADAMTSL3    AURKB    BMPR2  C1QTNF7    EIF5A  EIF5AL1  FAM198B     FHL1
#>        8        4        8        8        8        8        1        2
#>     LIFR    MAML2    PDGFD    PSMB6     RGS5   RGS7BP   TBC1D5   TGFBR2
#>        2        8        1        1        1        1        1        8
#>   TMEM43   WRAP53     YAP1
#>        1        8        1
Created on 2022-10-21 with reprex v2.0.2
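To make the role of stack() more concrete, here is a minimal sketch on a hypothetical two-column toy data frame (toy, long and per_column are illustrative names, not from the question). It shows the two columns that stack() returns and how the ind factor additionally allows counting per original column:

```r
# Hypothetical toy data; character columns, as in the question's mydat.
toy <- data.frame(
  V1 = c("TGFBR2", "MAML2"),
  V2 = c("TGFBR2", "EIF5A"),
  stringsAsFactors = FALSE
)

# stack() returns a long data frame with two columns:
#   values: all cells concatenated column by column
#   ind:    a factor naming the column each value came from
long <- stack(toy)

# Overall count per gene, as in the answer above.
overall <- table(long$values)

# Cross-tabulating values against ind gives counts per original column.
per_column <- table(long$values, long$ind)
```

If only the overall counts are needed, the ind column is simply ignored.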
Upvotes: 1
Reputation: 24074
Try:
occurences <- table(unlist(mydat))
(I assigned the result so that you don't get a full screen of gene names, and so that each gene's count can be accessed with occurences["genename"].)
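As a small illustration of working with the stored result, the sketch below uses a hypothetical three-column toy data frame (toy and counts are illustrative names, not from the question) and shows lookup by name, sorting by frequency, and conversion to a data frame:

```r
# Hypothetical toy data; the real mydat has 200 columns of 150 genes each.
toy <- data.frame(
  V1 = c("TGFBR2", "MAML2", "BMPR2"),
  V2 = c("TGFBR2", "MAML2", "EIF5A"),
  V3 = c("TGFBR2", "EIF5A", "FHL1"),
  stringsAsFactors = FALSE
)

# unlist() flattens all columns into one vector; table() counts each value.
counts <- table(unlist(toy))

# Count for a single gene, looked up by name.
tgfbr2_count <- counts[["TGFBR2"]]

# Most frequent genes first.
by_frequency <- sort(counts, decreasing = TRUE)

# As a regular data frame; as.data.frame() names the columns Var1 and Freq.
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
```

The data frame form can be convenient if the counts need to be merged with other per-gene annotations later.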
Upvotes: 29
Reputation: 81683
table(unlist(mydat))
will do the trick.
ADAMTSL3    AURKB    BMPR2  C1QTNF7    EIF5A  EIF5AL1    MAML2   TBC1D5
       8        4        8        8        8        8        8        1
  TGFBR2   WRAP53     FHL1     RGS5   RGS7BP  FAM198B     LIFR   TMEM43
       8        8        2        1        1        1        2        1
    YAP1    PSMB6    PDGFD
       1        1        1
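For anyone who wants to reproduce these counts, one possible way is to read the ten-row excerpt printed in the question back in with read.table(text = ...). This is only a sketch: it assumes the excerpt is whitespace-separated with row numbers in the first field, and it covers the 8 columns shown rather than the full 200-column data frame.

```r
# Rebuild the excerpt from the question. The header has one field fewer
# than the data rows, so read.table() uses the first field as row names.
mydat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "V1 V2 V3 V4 V5 V6 V7 V8
1 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2
2 MAML2 MAML2 MAML2 MAML2 MAML2 MAML2 MAML2 MAML2
3 BMPR2 EIF5A WRAP53 WRAP53 EIF5A EIF5A EIF5A EIF5A
4 EIF5A BMPR2 EIF5A EIF5A ADAMTSL3 BMPR2 WRAP53 BMPR2
5 EIF5AL1 WRAP53 ADAMTSL3 BMPR2 BMPR2 WRAP53 BMPR2 EIF5AL1
6 WRAP53 EIF5AL1 BMPR2 ADAMTSL3 WRAP53 EIF5AL1 EIF5AL1 WRAP53
7 TBC1D5 ADAMTSL3 EIF5AL1 EIF5AL1 EIF5AL1 ADAMTSL3 ADAMTSL3 C1QTNF7
8 ADAMTSL3 C1QTNF7 C1QTNF7 C1QTNF7 FHL1 YAP1 AURKB ADAMTSL3
9 C1QTNF7 FHL1 RGS7BP LIFR C1QTNF7 TMEM43 C1QTNF7 LIFR
10 AURKB RGS5 AURKB FAM198B AURKB C1QTNF7 PSMB6 PDGFD")

# Flatten all 80 cells into one vector and count each gene.
counts <- table(unlist(mydat))

# Look up a few genes by name.
counts[c("TGFBR2", "AURKB", "FHL1")]
```

With character columns the genes come out in alphabetical order; the ordering may differ from the output above if the columns are stored as factors.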
Upvotes: 11