NathanaelH

Reputation: 15

Frequency count across multiple columns using R

I have a data frame in the form of:

x <-
Chrom    sample1    sample2    sample3  ...
Contig12    0/0     0/0     0/1
Contig12    ./.     ./.     0/0
Contig28    0/0     0/0     0/0
Contig28    1/1     1/1     1/1
Contig55    0/0     0/0     0/1
Contig55    0/1     0/1     0/1
Contig61    ./.     0/1     1/1
.
.
.

There are ~20,000 rows and ~100 columns. I am trying to count the number of times each unique state occurs in each column (sample), so that I get:

         sample1    sample2     sample3     ...
./.      2          1           0
0/0      3          3           2
0/1      1          2           3
1/1      1          1           2

Any suggestions on how I can do this? I have tried count() from the plyr package, but I cannot figure out how to apply it to every column.

Any help is greatly appreciated!

Upvotes: 1

Views: 300

Answers (1)

A. Suliman

Reputation: 13125

library(dplyr)
library(tidyr)                        # gather() comes from tidyr, not dplyr
df %>% gather(key, value, -Chrom) %>% # gather() reshapes wide to long: every column except Chrom
                                      # is collapsed into two columns, key and value. See ?gather
       dplyr::select(-Chrom) %>%      # drop Chrom, keeping only key and value
       table()                        # count each unique (key, value) pair

         value
 key       ./. 0/0 0/1 1/1
  sample1   2   3   1   1
  sample2   1   3   2   1
  sample3   0   2   3   2
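
If you would rather avoid the reshape, the same counts can be had in base R. This is only a sketch of an alternative, not part of the pipeline above: the names samples, states and counts are my own, and it assumes every column other than Chrom is a sample column. It tabulates each column against a shared set of states so that a state absent from one sample shows up as 0.

```r
# Example data copied from the Data section of this answer.
df <- read.table(text = "
Chrom    sample1    sample2    sample3
Contig12    0/0     0/0     0/1
Contig12    ./.     ./.     0/0
Contig28    0/0     0/0     0/0
Contig28    1/1     1/1     1/1
Contig55    0/0     0/0     0/1
Contig55    0/1     0/1     0/1
Contig61    ./.     0/1     1/1
", header = TRUE, stringsAsFactors = FALSE)

# Assumption: everything except Chrom is a sample column.
samples <- df[, setdiff(names(df), "Chrom"), drop = FALSE]

# All states seen anywhere in the table; using them as factor levels
# keeps a state that is missing from one sample as a 0 count.
states <- sort(unique(unlist(samples)))

# Tabulate each column against the shared states; sapply() binds the
# per-column tables into a matrix (rows = states, columns = samples).
counts <- sapply(samples, function(col) table(factor(col, levels = states)))
counts
```

The result has the states as rows and the samples as columns, which is the layout asked for in the question. Calling t() on the table() output above should give the same orientation.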

Data

df <- read.table(text="
      Chrom    sample1    sample2    sample3
             Contig12    0/0     0/0     0/1
             Contig12    ./.     ./.     0/0
             Contig28    0/0     0/0     0/0
             Contig28    1/1     1/1     1/1
             Contig55    0/0     0/0     0/1
             Contig55    0/1     0/1     0/1
             Contig61    ./.     0/1     1/1
              ", header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2
