Escher

Reputation: 5776

How to count how many values per level in a given factor?

I have a data.frame mydf with about 2500 rows. These rows correspond to 69 classes of objects in column 1, mydf$V1, and I want to count how many rows I have per object class. I can get a factor of these classes with:

objectclasses = unique(factor(mydf$V1, exclude="1"));

What's the terse R way to count the rows per object class? In any other language I'd traverse an array with a loop and keep count, but I'm new to R programming and am trying to take advantage of R's vectorised operations.

Upvotes: 53

Views: 226614

Answers (9)

Victor

Reputation: 9

This is an old post, but you can do this in base R without data frames or data tables (here yTrain is a factor):

sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
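A minimal, self-contained sketch of the line above, assuming `yTrain` is a factor. The sample values are made up for illustration. It also checks that the result agrees with `table()`:

```r
# Hypothetical sample data; yTrain stands in for any factor
yTrain <- factor(c("cat", "dog", "cat", "bird", "dog", "cat"))

# For each level, count the elements equal to that level
counts <- sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
print(counts)

# table() gives the same counts in the same (level) order
stopifnot(all(counts == as.vector(table(yTrain))))
```

If the factor contains NA values, `sum(yTrain == sLevel)` returns NA; passing `na.rm = TRUE` to `sum()` avoids that.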

Upvotes: 0

Peter

Reputation: 2371

In case I just want to know how many unique factor levels exist in the data, I use:

length(unique(df$factorcolumn))
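One caveat worth sketching: `length(unique(...))` counts the values that actually occur, while `nlevels()` counts every declared level, including unused ones. The factor below is a made-up example, not data from the question:

```r
# Hypothetical factor with a declared but unused level "c"
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

n_observed <- length(unique(f))  # values that actually occur in the data
n_declared <- nlevels(f)         # all declared levels, used or not

print(c(observed = n_observed, declared = n_declared))
```

The two numbers agree only when every level is used and the column has no NA values, since `unique()` also counts NA as a distinct value.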

Upvotes: 2

iamigham

Reputation: 117

One more approach is to use dplyr's n() function, which counts the number of observations in each group:

library(dplyr)
library(magrittr)
data %>% 
  group_by(columnName) %>%
  summarise(Count = n())

Upvotes: 6

Christian Savemark

Reputation: 21

Use the package plyr with lapply to get frequencies for every value (level) and every variable (factor) in your data frame.

library(plyr)
lapply(df, count)

Upvotes: 1

Spariant

Reputation: 171

We can use summary on a factor column, which returns the number of rows per level:

summary(myDF$factorColumn)
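This only yields counts when the column really is a factor. A small sketch of the difference, using a made-up data frame (the column values are assumptions for illustration):

```r
# Hypothetical data frame; the values are illustrative only
myDF <- data.frame(factorColumn = c("x", "y", "x", "z", "x"),
                   stringsAsFactors = FALSE)

# On a character column, summary() reports Length/Class/Mode, not counts
print(summary(myDF$factorColumn))

# Converting to a factor first yields per-level counts
level_counts <- summary(factor(myDF$factorColumn))
print(level_counts)
```

Note that `summary()` on a factor has a `maxsum` argument (100 by default), so factors with more levels than that have the rarest ones lumped into "(Other)". With 69 classes, as in the question, that limit should not be reached.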

Upvotes: 17

Andriy T.

Reputation: 2030

Using the plyr package:

library(plyr)

count(mydf$V1)

It returns a data frame with the frequency of each value.

Upvotes: 34

akrun

Reputation: 887291

Using data.table

 library(data.table)
 setDT(dat)[, .N, keyby=ID] #(Using @Paul Hiemstra's `dat`)

Or using dplyr 0.3

 res <- count(dat, ID)
 head(res)
 #Source: local data frame [6 x 2]

 #  ID n
 #1  a 2
 #2  b 3
 #3  c 3
 #4  d 3
 #5  e 2
 #6  f 4

Or

  dat %>% 
      group_by(ID) %>% 
      tally()

Or

  dat %>% 
      group_by(ID) %>%
      summarise(n=n())

Upvotes: 26

Paul Hiemstra

Reputation: 60944

Or using the dplyr library:

library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters,100,rep=TRUE))
dat %>% 
  group_by(ID) %>%
  summarise(no_rows = length(ID))

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

The result is:

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.

Upvotes: 65

agstudy

Reputation: 121578

Here are two ways to do it:

set.seed(1)
tt <- sample(letters,100,rep=TRUE)

## using table
table(tt)
# tt
# a b c d e f g h i j k l m n o p q r s t u v w x y z 
# 2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
## using tapply
tapply(tt,tt,length)
# a b c d e f g h i j k l m n o p q r s t u v w x y z 
# 2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
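Applied to the question's setup, the `table` approach might look like the sketch below. The data frame here is a made-up stand-in for the asker's `mydf`, and the `exclude = "1"` step simply mirrors the question's own code:

```r
# Hypothetical stand-in for the asker's data frame
mydf <- data.frame(V1 = c("1", "a", "b", "a", "c", "b", "a"),
                   stringsAsFactors = FALSE)

# Drop the class "1" as in the question, then count rows per class
classes <- factor(mydf$V1, exclude = "1")
counts  <- table(classes)
print(counts)

# Same counts as a data frame with one row per class
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
print(counts_df)
```

`as.data.frame()` on a table puts the counts in a column named `Freq`, which is convenient if the result needs to be merged or sorted later.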

Upvotes: 41
