KT_1

Reputation: 8494

Count number of occurrences in R

For a sample dataframe:

df <- structure(list(area = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"), 
                      count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)), 
                 .Names = c("area", "count"), class = c("tbl_df", "tbl", "data.frame"), 
                 row.names = c(NA, -11L), spec = structure(list(cols = structure(list(area = structure(list(), 
                 class = c("collector_character", "collector")), count = structure(list(), class = c("collector_integer",
                 "collector"))), .Names = c("area", "count")), default = structure(list(), class = c("collector_guess", 
                "collector"))), .Names = c("cols", "default"), class = "col_spec"))

... which lists the number of occurrences of something per area, I wish to produce another summary table showing how many areas have one occurrence, two occurrences, three occurrences etc. For example, there are three areas with "One occurrence per area", three areas with "Two occurrences per area", one area with "Three occurrences per area" etc.

What is the best package/code to produce my desired result? I have tried with aggregate and plyr, but so far have had no success.

Upvotes: 1

Views: 2134

Answers (3)

Onyambu

Reputation: 79348

You can do this with base R functions, using @Jimbou's solution:

table(df$count)
1 2 3 4 5 6 
3 3 1 2 1 1 
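
If a data frame is more convenient than a named table, the result can be converted. A minimal sketch, assuming a plain data.frame rebuilt from the question's values; the column names count and n_areas are my own choice for illustration:

```r
# Same values as the question's sample data, rebuilt as a plain data.frame
df <- data.frame(
  area  = letters[1:11],
  count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)
)

# table() tallies how often each distinct value of count appears
tab <- table(df$count)

# Convert the table to a data frame (default columns are Var1 and Freq)
res <- as.data.frame(tab, stringsAsFactors = FALSE)
names(res) <- c("count", "n_areas")  # illustrative column names

# The table's labels come back as character, so convert them to integer
res$count <- as.integer(res$count)
res
```

Each row then pairs a number of occurrences with the number of areas that have it.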

Upvotes: 2

David

Reputation: 311

This is quite intuitive using the wonderful dplyr library.

First, we group the data by the unique values of count, then we count the number of occurrences per group using n().

library(dplyr)
df %>%
    group_by(count) %>%
    summarise(number = n())

# A tibble: 6 x 2
  count number
  <int>  <int>
1     1      3
2     2      3
3     3      1
4     4      2
5     5      1
6     6      1
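
For what it's worth, dplyr also has count(), which wraps the group-and-tally step into one call. A short sketch, assuming a reasonably recent dplyr (the name argument was added in 0.8.0) and a plain data.frame rebuilt from the question's values:

```r
library(dplyr)

# Same values as the question's sample data, rebuilt as a plain data.frame
df <- data.frame(
  area  = letters[1:11],
  count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)
)

# count() groups by the given column and tallies the rows per group;
# `name` sets the name of the tally column (it defaults to "n")
res <- df %>%
    count(count, name = "number")
res
```

This should give the same six rows as the group_by() / summarise() version, sorted by count.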

Upvotes: 1

Felipe Alvarenga

Reputation: 2652

I like the data.table syntax:

library(data.table)
setDT(df) # transform data.frame into data.table format

# .N calculates the number of observations, by instance of the count variable
df[, .(n_areas = .N), by = count]

   count n_areas
1:     1       3
2:     3       1
3:     4       2
4:     2       3
5:     5       1
6:     6       1
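
Note that with by, the groups come out in order of first appearance (1, 3, 4, 2, ...). If you would rather have them sorted by count, keyby does the grouping and the sorting in one step. A minimal sketch, assuming a fresh data.table (called dt here, my own name) built from the question's values:

```r
library(data.table)

# Same values as the question's sample data, built directly as a data.table
dt <- data.table(
  area  = letters[1:11],
  count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)
)

# keyby groups like by, but also sorts the result by the grouping column
res <- dt[, .(n_areas = .N), keyby = count]
res
```

Appending [order(count)] to the original by call is another way to get the same ordering.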

See this question for a comparison of the two big packages most used for this kind of operation, dplyr and data.table: "data.table vs dplyr: can one do something well the other can't or does poorly?"

Upvotes: 2
