J.Mc
J.Mc

Reputation: 72

count the number of times a number (factor) occurs within each group, for each column in the dataframe

As the title suggests I'm trying to extend the question asked here:

count the number of times a number (factor) occurs within each group

But for all columns in a given dataframe. A reproducible example below:

dat <- data.frame(Bin = rep(1:4, each = 50), 
                  Number = sample(5, 200, replace = T, prob = c(1,1,1,2,3)),
                  Number2 = sample(5, 200, replace = T, prob = c(1,1,1,2,3)))


> head(dat)
  Bin Number Number2
1   1      4       2
2   1      5       5
3   1      4       4
4   1      4       1
5   1      5       5
6   1      5       3

I can do it by multiple dcasts.

dcast(dat, Bin ~ Number)
dcast(dat, Bin ~ Number2) 

However, my actual dataframe has many more columns. Any help would be appreciated!

Thanks.

Upvotes: 1

Views: 505

Answers (5)

akrun
akrun

Reputation: 886938

We can use recast from reshape2

library(reshape2)
recast(dat, id.var = 'Bin', Bin + variable ~ value, length)

Upvotes: 0

Patricio Moracho
Patricio Moracho

Reputation: 717

Conceptually it is the same solution by Ronak Shah but a little bit simpler.

library(tidyverse)

dat %>% 
  pivot_longer(-Bin) %>% 
  pivot_wider(names_from = value, values_fn = length, names_sort=TRUE)

# A tibble: 8 x 7
    Bin name      `1`   `2`   `3`   `4`   `5`
  <int> <chr>   <int> <int> <int> <int> <int>
1     1 Number     10     7     3    10    20
2     1 Number2    10     6     6     8    20
3     2 Number      2     7     6     8    27
4     2 Number2     2     5     8    13    22
5     3 Number      3     8    13    12    14
6     3 Number2     9     5     6     7    23
7     4 Number      9     6     7     3    25
8     4 Number2     2     7     8    19    14

Upvotes: 1

monte
monte

Reputation: 1845

With base R, you can use apply family of functions and table:

as.data.frame(apply(dat[, 2:ncol(dat)], 2, table))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388807

Get the data in long format and use count :

library(dplyr)
library(tidyr)

dat %>%
  pivot_longer(cols = starts_with('Number')) %>%
  count(Bin, name, value) %>%
  pivot_wider(names_from = name, values_from = n)

Upvotes: 2

Andrew Gustar
Andrew Gustar

Reputation: 18425

One way would be to use the base-R function tabulate...

dat %>% 
  group_by(Bin) %>%                              #group by bin
  summarise(across(everything(), tabulate)) %>%  #for everything else generate a tabulation
  mutate(no = row_number())                      #add numbers being tabulated

# A tibble: 20 x 4
# Groups:   Bin [4]
     Bin Number Number2    no
   <int>  <int>   <int> <int>
 1     1      7       3     1
 2     1      9       6     2
 3     1      5       8     3
 4     1      9      16     4
 5     1     20      17     5
 6     2      4       5     1
 7     2      5       4     2
 8     2      8      10     3
 9     2     11      13     4
10     2     22      18     5
11     3      6       6     1
12     3      6       9     2
13     3      7       5     3
14     3     11      13     4
15     3     20      17     5
16     4      3       7     1
17     4      3       6     2
18     4      7       4     3
19     4     19      12     4
20     4     18      21     5

So, for example, bin 1, number 4 (the final column) has 9 in Number and 16 in Number2

Upvotes: 0

Related Questions