Rayan Sp

Reputation: 1020

Count the occurrences of factors

I have a data frame, df, consisting of a single column named Var1:

df <- read.table(text = "Var1
|12|24|22|1|4
|12|23|22|1|445
|12|22|22|1|4
|101|116
|101|116|116|174
|101|116|125|174
|101|116|150|174
|101|116|156
|101|116|156|174
|101|116|162", header = TRUE, stringsAsFactors = FALSE)

Questions:

  1. How can I count the total number of occurrences of |22|?
  2. How can I count |22| only once per row when it is repeated within that row? For example, in the 3rd row |22| appears twice, but I want R to count it as 1.

Upvotes: 1

Views: 407

Answers (4)

Thierry

Reputation: 18487

That's easy with a regexp. This counts the rows that contain 22 at least once (question 2):

sum(grepl("\\|22(\\||$)", df$Var1))
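
The pattern matches |22 followed by either another | or the end of the string, so grepl flags each row at most once. For question 1, a minimal sketch (assuming the df from the question; the names pattern, hits, total_occurrences and rows_with_match are only illustrative) could count every match with gregexpr and Perl-style lookarounds, so that adjacent entries such as |22|22| are both found:

```r
df <- read.table(text = "Var1
|12|24|22|1|4
|12|23|22|1|445
|12|22|22|1|4
|101|116
|101|116|116|174
|101|116|125|174
|101|116|150|174
|101|116|156
|101|116|156|174
|101|116|162", header = TRUE, stringsAsFactors = FALSE)

# Lookbehind/lookahead leave the "|" delimiters unconsumed,
# so back-to-back entries such as "|22|22|" each produce a match.
pattern <- "(?<=\\|)22(?=\\||$)"
hits <- regmatches(df$Var1, gregexpr(pattern, df$Var1, perl = TRUE))

total_occurrences <- sum(lengths(hits))      # question 1
rows_with_match   <- sum(lengths(hits) > 0)  # question 2
```

Without the lookarounds, a match on the third row would consume the | between the two 22 entries and the second one would be skipped.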

Upvotes: 2

David Arenburg

Reputation: 92292

You could also just read your data set using | as the column separator; after that, all the operations are pretty straightforward:

df <- as.matrix(read.table(text = "|12|24|22|1|4
|12|23|22|1|445
|12|22|22|1|4
|101|116
|101|116|116|174
|101|116|125|174
|101|116|150|174
|101|116|156
|101|116|156|174
|101|116|162", fill = TRUE, sep = "|"))    

sum(df == 22, na.rm = TRUE)
# [1] 4
(rowSums(df == 22, na.rm = TRUE) > 0) + 0
# [1] 1 1 1 0 0 0 0 0 0 0
sum(rowSums(df == 22, na.rm = TRUE) > 0)
# [1] 3

Alternatively, you could convert your original df to a data.table and use the tstrsplit function:

df <- read.table(text = "Var1
                 |12|24|22|1|4
                 |12|23|22|1|445
                 |12|22|22|1|4
                 |101|116
                 |101|116|116|174
                 |101|116|125|174
                 |101|116|150|174
                 |101|116|156
                 |101|116|156|174
                 |101|116|162", header = TRUE)

library(data.table)
DT <- setDT(df)[, tstrsplit(Var1, "|", fixed = TRUE)]
DT[, sum(.SD == 22, na.rm = TRUE)]
# [1] 4
DT[, sum(rowSums(.SD == 22, na.rm = TRUE) > 0)]
# [1] 3

Upvotes: 5

JereB

Reputation: 137

Please post a reproducible example next time.

You can do this with grepl, matching the literal string |22| (fixed = TRUE, so it is not treated as a regular expression). With df as your data.frame:

sum(grepl("|22|", df$Var1, fixed = TRUE))

This will answer your second question and can easily be adapted for Q 1.
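
One caveat, sketched below on made-up strings (the vector x is hypothetical, not the asker's data): the fixed pattern |22| needs a delimiter on both sides, so a row that ends in |22 is missed. Appending a | to each string before matching is one possible way around that:

```r
# Hypothetical inputs: the second string ends with the target value.
x <- c("|12|22|1", "|101|22", "|101|116")

# Plain fixed match: the trailing "|22" in x[2] is not detected.
plain <- grepl("|22|", x, fixed = TRUE)

# Add a closing delimiter so every entry is wrapped in "|...|".
padded <- paste0(x, "|")
found  <- grepl("|22|", padded, fixed = TRUE)

sum(found)  # number of strings containing 22 at least once
```

In the question's data no row ends in |22, so both variants give the same count there.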

Upvotes: 1

Roland

Reputation: 132706

DF <- read.table(text = "Var1
                 |12|24|22|1|4
                 |12|23|22|1|445
                 |12|22|22|1|4
                 |101|116
                 |101|116|116|174
                 |101|116|125|174
                 |101|116|150|174
                 |101|116|156
                 |101|116|156|174
                 |101|116|162", header = TRUE, stringsAsFactors = FALSE)
x <- strsplit(DF$Var1, "|", fixed = TRUE)
sum(unlist(x) == "22")
#[1] 4
sum(sapply(x, function(s) "22" %in% s))
#[1] 3
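
If the same counts are needed for several values, the strsplit approach can be wrapped in a small helper. This is only a sketch; count_token and its arguments are names made up for illustration, and the vector x holds just the first four rows of the data:

```r
# Hypothetical helper: counts a token in "|"-delimited strings.
# per_row = TRUE counts each string at most once.
count_token <- function(x, token, per_row = FALSE) {
  parts <- strsplit(x, "|", fixed = TRUE)
  if (per_row) {
    sum(vapply(parts, function(s) token %in% s, logical(1)))
  } else {
    sum(unlist(parts) == token)
  }
}

x <- c("|12|24|22|1|4", "|12|23|22|1|445", "|12|22|22|1|4", "|101|116")
count_token(x, "22")                  # all occurrences
count_token(x, "22", per_row = TRUE)  # strings with at least one occurrence
```

Because the strings are split into whole entries first, a value such as 122 or 221 is never mistaken for 22.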

Upvotes: 5
