How to count the number of occurrences of a character within a data set in R

Question

I am currently playing around with a mock data set in which I record the number of times that a given species of flower is visited by a type of bee. Part of my data set might look something like this:

Plant   Visitor.1   Visitor.2   Visitor.3
  1     Bombus      Bombus      NA
  2     Apis        Bombus      Apis
  3     NA          NA          NA
  4     Apis        NA          NA
  5     NA          NA          NA
  6     Apis        NA          NA
  7     Apis        Apis        Halictid
  8     Apis        Apis        NA
  9     Bombus      Halictid    Halictid
 10     NA          NA          NA
 11     NA          NA          NA
 12     NA          NA          NA
 13     Halictid    NA          NA

Is there a way for me to count how many times "Bombus", "Apis", "Halictid", etc. occur in each column, or even across all three columns at once? I have read a lot about how to do this with character strings, but not with a data set such as this. I'm not too sure where to start, to be honest.

akrun · Accepted Answer

You may try

library(qdapTools)
addmargins(t(mtabulate(df1[-1])),2)
#          Visitor.1 Visitor.2 Visitor.3 Sum
#Apis             5         2         1   8
#Bombus           2         2         0   4
#Halictid         1         1         2   4

If you only need the counts for the whole dataset (all three columns pooled):

table(unlist(df1[-1]))

#    Apis   Bombus Halictid 
#       8        4        4 
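
If only one species is of interest, comparing the visitor columns against its name and summing the result is a possible shortcut. This is a minimal sketch rather than part of the original answer: it rebuilds the example data inline so that it runs on its own, and the names visits, bombus_per_column and bombus_total are made up for illustration.

```r
# Same values as the `data` section of the answer, rebuilt so the block is self-contained
df1 <- data.frame(
  Plant = 1:13,
  Visitor.1 = c("Bombus", "Apis", NA, "Apis", NA, "Apis", "Apis", "Apis",
                "Bombus", NA, NA, NA, "Halictid"),
  Visitor.2 = c("Bombus", "Bombus", NA, NA, NA, NA, "Apis", "Apis",
                "Halictid", NA, NA, NA, NA),
  Visitor.3 = c(NA, "Apis", NA, NA, NA, NA, "Halictid", NA, "Halictid",
                NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where the visitor is "Bombus", NA where no visitor was recorded
visits <- df1[-1] == "Bombus"

# Count per visitor column, ignoring the NA cells
bombus_per_column <- colSums(visits, na.rm = TRUE)

# Count across all three columns at once
bombus_total <- sum(visits, na.rm = TRUE)

print(bombus_per_column)
print(bombus_total)
```

The na.rm = TRUE argument matters here: without it, any column containing an NA would give NA instead of a count.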

I am not sure whether you also need this for each row, i.e., for each 'Plant'. In that case:

Un1 <- unique(na.omit(unlist(df1[-1])))
res <- t(apply(df1[-1], 1, FUN=function(x) table(factor(x, levels=Un1))))
cbind(res, Sum=rowSums(res))
#     Bombus Apis Halictid Sum
#[1,]      2    0        0   2
#[2,]      1    2        0   3
#[3,]      0    0        0   0
#[4,]      0    1        0   1
#[5,]      0    0        0   0
#[6,]      0    1        0   1
#[7,]      0    2        1   3
#[8,]      0    2        0   2
#[9,]      1    0        2   3
#[10,]     0    0        0   0
#[11,]     0    0        0   0
#[12,]     0    0        0   0
#[13,]     0    0        1   1
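
The matrix above identifies plants only by row position. If the Plant IDs should travel with the counts, one option is to bind them back on as a column. The sketch below assumes the same example data (rebuilt inline); the names species and per_plant are illustrative and do not appear in the original answer.

```r
# Same values as the `data` section of the answer, rebuilt so the block is self-contained
df1 <- data.frame(
  Plant = 1:13,
  Visitor.1 = c("Bombus", "Apis", NA, "Apis", NA, "Apis", "Apis", "Apis",
                "Bombus", NA, NA, NA, "Halictid"),
  Visitor.2 = c("Bombus", "Bombus", NA, NA, NA, NA, "Apis", "Apis",
                "Halictid", NA, NA, NA, NA),
  Visitor.3 = c(NA, "Apis", NA, NA, NA, NA, "Halictid", NA, "Halictid",
                NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# All species that occur anywhere in the visitor columns
species <- unique(na.omit(unlist(df1[-1])))

# One row per plant, one column per species; fixing the factor levels
# keeps zero counts for species that did not visit a given plant
res <- t(apply(df1[-1], 1, function(x) table(factor(x, levels = species))))

# Attach the Plant IDs and a per-plant total
per_plant <- data.frame(Plant = df1$Plant, res, Sum = rowSums(res))

print(per_plant)
```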

Or using mtabulate

addmargins(as.matrix(mtabulate(as.data.frame(t(df1[-1])))),2)

Update

If you need the counts per column (using only base R), reusing Un1 from the per-row code above:

addmargins(t(apply(df1[-1], 2, FUN=function(x) table(factor(x, levels=Un1)))))
#           Bombus Apis Halictid Sum
#Visitor.1      2    5        1   8
#Visitor.2      2    2        1   5
#Visitor.3      0    1        2   3
#Sum            4    8        4  16

Or a more compact version would be

 addmargins(table(stack(df1[-1])[2:1]))
 #           values
 #ind         Apis Bombus Halictid Sum
 # Visitor.1    5      2        1   8
 # Visitor.2    2      2        1   5
 # Visitor.3    1      0        2   3
 # Sum          8      4        4  16
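
If the counts are needed as a regular data frame (for plotting or merging, say), the same stack/table result can be converted with as.data.frame. This is a sketch under the assumption that one row per column/species pair is the desired shape; the example data is rebuilt inline, and the name long_counts is made up for illustration.

```r
# Same values as the `data` section of the answer, rebuilt so the block is self-contained
df1 <- data.frame(
  Plant = 1:13,
  Visitor.1 = c("Bombus", "Apis", NA, "Apis", NA, "Apis", "Apis", "Apis",
                "Bombus", NA, NA, NA, "Halictid"),
  Visitor.2 = c("Bombus", "Bombus", NA, NA, NA, NA, "Apis", "Apis",
                "Halictid", NA, NA, NA, NA),
  Visitor.3 = c(NA, "Apis", NA, NA, NA, NA, "Halictid", NA, "Halictid",
                NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# stack() reshapes the visitor columns into two columns: values (species) and ind (column name).
# table() cross-tabulates them, dropping NA by default.
# as.data.frame() turns the table into one row per ind/values pair with a Freq column.
long_counts <- as.data.frame(table(stack(df1[-1])[2:1]))

print(long_counts)
```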

data

df1 <- structure(list(Plant = 1:13, Visitor.1 = c("Bombus", "Apis", 
NA, "Apis", NA, "Apis", "Apis", "Apis", "Bombus", NA, NA, NA, 
"Halictid"), Visitor.2 = c("Bombus", "Bombus", NA, NA, NA, NA, 
"Apis", "Apis", "Halictid", NA, NA, NA, NA), Visitor.3 = c(NA, 
"Apis", NA, NA, NA, NA, "Halictid", NA, "Halictid", NA, NA, NA, 
NA)), .Names = c("Plant", "Visitor.1", "Visitor.2", "Visitor.3"
), class = "data.frame", row.names = c(NA, -13L))
