kimmie

Reputation: 235

Counting occurrences in a data.frame in R

I have a data frame with 200 columns, each containing 150 genes (rows).

I want to count the number of occurrences of each gene across the whole data frame.

mydat <-

    V1       V2       V3        V4       V5       V6        V7       V8  
1   TGFBR2   TGFBR2   TGFBR2    TGFBR2   TGFBR2   TGFBR2    TGFBR2   TGFBR2
2   MAML2    MAML2    MAML2     MAML2    MAML2    MAML2     MAML2    MAML2
3   BMPR2    EIF5A    WRAP53    WRAP53   EIF5A    EIF5A     EIF5A    EIF5A
4   EIF5A    BMPR2    EIF5A     EIF5A    ADAMTSL3 BMPR2     WRAP53   BMPR2
5   EIF5AL1  WRAP53   ADAMTSL3  BMPR2    BMPR2    WRAP53    BMPR2    EIF5AL1
6   WRAP53   EIF5AL1  BMPR2     ADAMTSL3 WRAP53   EIF5AL1   EIF5AL1  WRAP53
7   TBC1D5   ADAMTSL3 EIF5AL1   EIF5AL1  EIF5AL1  ADAMTSL3  ADAMTSL3 C1QTNF7
8   ADAMTSL3 C1QTNF7  C1QTNF7   C1QTNF7  FHL1     YAP1      AURKB    ADAMTSL3
9   C1QTNF7  FHL1     RGS7BP    LIFR     C1QTNF7  TMEM43    C1QTNF7  LIFR
10  AURKB    RGS5     AURKB     FAM198B  AURKB    C1QTNF7   PSMB6    PDGFD

So I want the output to be something like this:

occurrences
TGFBR2: 8
MAML2:  8
FHL1:   3

etc. But I want to count every gene in the data frame.

How do I do this?

Upvotes: 20

Views: 56272

Answers (3)

Quinten

Reputation: 41225

Another option is to combine table() with stack(). stack() concatenates the columns into a single vector (values) and adds a factor (ind) recording which column each value came from, so you can count the values like this (df here is the data frame, called mydat in the question):

table(stack(df)$values)
#> 
#> ADAMTSL3    AURKB    BMPR2  C1QTNF7    EIF5A  EIF5AL1  FAM198B     FHL1 
#>        8        4        8        8        8        8        1        2 
#>     LIFR    MAML2    PDGFD    PSMB6     RGS5   RGS7BP   TBC1D5   TGFBR2 
#>        2        8        1        1        1        1        1        8 
#>   TMEM43   WRAP53     YAP1 
#>        1        8        1

Created on 2022-10-21 with reprex v2.0.2
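As a minimal, self-contained sketch of the stack() approach: the small data frame below is a hypothetical subset (rows 7 to 10 of V1 and V2 from the question), not the full data. One assumption worth flagging is the column type. stack() only keeps plain vector columns, so if the genes were read in as factors they probably need converting to character first, which the lapply() line does (it is a no-op for character columns).

```r
# Hypothetical subset of the question's data: rows 7-10 of V1 and V2
df <- data.frame(
  V1 = c("TBC1D5", "ADAMTSL3", "C1QTNF7", "AURKB"),
  V2 = c("ADAMTSL3", "C1QTNF7", "FHL1", "RGS5"),
  stringsAsFactors = FALSE
)

# Defensive step (assumption: columns might be factors in real data);
# stack() ignores non-vector columns such as factors
df[] <- lapply(df, as.character)

# stack() returns a data frame with columns `values` and `ind`
stacked <- stack(df)

# Count how often each gene appears across all columns
counts <- table(stacked$values)
print(counts)
```

The ind column is not needed for a plain count, but it is handy if you later want counts per column, for example with table(stacked$values, stacked$ind).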

Upvotes: 1

Cath

Reputation: 24074

Try:

occurences<-table(unlist(mydat))

(I assigned the result so that you don't get a full screen of gene names, and so that each gene's count can be accessed with occurences["genename"].)
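To illustrate the lookup by name, here is a sketch that rebuilds the ten rows shown in the question with read.table(text = ...) and then queries the table. Reading the data this way is an assumption about how mydat was created; the counting line itself is the one from the answer.

```r
# Rebuild the 10 x 8 sample from the question; the header has one field
# fewer than the data rows, so the first column becomes the row names
mydat <- read.table(header = TRUE, text = "
V1 V2 V3 V4 V5 V6 V7 V8
1 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2 TGFBR2
2 MAML2 MAML2 MAML2 MAML2 MAML2 MAML2 MAML2 MAML2
3 BMPR2 EIF5A WRAP53 WRAP53 EIF5A EIF5A EIF5A EIF5A
4 EIF5A BMPR2 EIF5A EIF5A ADAMTSL3 BMPR2 WRAP53 BMPR2
5 EIF5AL1 WRAP53 ADAMTSL3 BMPR2 BMPR2 WRAP53 BMPR2 EIF5AL1
6 WRAP53 EIF5AL1 BMPR2 ADAMTSL3 WRAP53 EIF5AL1 EIF5AL1 WRAP53
7 TBC1D5 ADAMTSL3 EIF5AL1 EIF5AL1 EIF5AL1 ADAMTSL3 ADAMTSL3 C1QTNF7
8 ADAMTSL3 C1QTNF7 C1QTNF7 C1QTNF7 FHL1 YAP1 AURKB ADAMTSL3
9 C1QTNF7 FHL1 RGS7BP LIFR C1QTNF7 TMEM43 C1QTNF7 LIFR
10 AURKB RGS5 AURKB FAM198B AURKB C1QTNF7 PSMB6 PDGFD
")

# Flatten all columns into one vector and count each distinct gene
occurences <- table(unlist(mydat))

# Single bracket keeps the name; double bracket returns the bare count
occurences["TGFBR2"]
occurences[["FHL1"]]

# Several genes can be looked up at once
occurences[c("MAML2", "AURKB")]
```

Note that a gene absent from the data is not in the table at all, so occurences[["SOMEGENE"]] raises a "subscript out of bounds" error rather than returning 0.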

Upvotes: 29

Sven Hohenstein

Reputation: 81683

table(unlist(mydat))

will do the trick.

ADAMTSL3    AURKB    BMPR2  C1QTNF7    EIF5A  EIF5AL1    MAML2   TBC1D5 
       8        4        8        8        8        8        8        1 
  TGFBR2   WRAP53     FHL1     RGS5   RGS7BP  FAM198B     LIFR   TMEM43 
       8        8        2        1        1        1        2        1 
    YAP1    PSMB6    PDGFD 
       1        1        1 
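The non-alphabetical order of the output above is consistent with the columns being factors, in which case unlist() merges the factor levels column by column. If you would rather have the genes ranked by frequency in a two-column data frame, closer to the layout asked for in the question, something like the following sketch may help. The toy data frame (rows 8 to 10 of V1 to V3) and the column names gene and count are illustrative choices, not part of the original answer.

```r
# Hypothetical subset of the question's data: rows 8-10 of V1 to V3
mydat <- data.frame(
  V1 = c("ADAMTSL3", "C1QTNF7", "AURKB"),
  V2 = c("C1QTNF7", "FHL1", "RGS5"),
  V3 = c("C1QTNF7", "RGS7BP", "AURKB"),
  stringsAsFactors = FALSE
)

# Count every gene, then order from most to least frequent
counts <- sort(table(unlist(mydat)), decreasing = TRUE)

# A one-dimensional table converts to a data frame with one row per gene;
# rename the default columns (Var1, Freq) to something more descriptive
result <- setNames(
  as.data.frame(counts, stringsAsFactors = FALSE),
  c("gene", "count")
)
print(result)
```

From here, head(result, 10) gives the ten most frequent genes, and write.csv(result, "gene_counts.csv", row.names = FALSE) would save the table (the file name is just an example).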

Upvotes: 11
