Wilcar

Reputation: 2513

Count the number of rows matching a combination of patterns (AND operator) from a vector

I have a dataset:

dataset <- c("male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
   "est qui dolorem tall dolorem ipsum  female Neque young",
   "male, female porro old")

dataset <- as.data.frame(dataset)

I have a vector of keywords:

 keywords <- c("male", "female", "young")

I can count the number of rows containing each keyword:

sapply(keywords, function(x) length(grep(x, dataset$dataset, ignore.case = TRUE)))

My result:

 male female  young 
   3      3      2 
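One thing to watch: grep treats each keyword as a regular expression and matches substrings, so "male" is also found inside "female" (row 2 only contains "female", yet it is counted for "male"). If whole-word matching is what you want, a minimal sketch is to wrap each keyword in \\b word boundaries. This assumes the keywords contain no regex metacharacters; word_counts is just an illustrative name.

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum  female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young")

# \\b marks a word boundary, so "male" no longer matches inside "female"
# (assumes the keywords contain no regex metacharacters)
word_counts <- sapply(keywords, function(x)
  sum(grepl(paste0("\\b", x, "\\b"), dataset$dataset, ignore.case = TRUE, perl = TRUE)))

word_counts
# male = 2, female = 3, young = 2
```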

What I want: count the number of rows that match each combination of keywords (AND operator), for example the rows that contain both "male" and "female".

Upvotes: 0

Views: 62

Answers (2)

Sotos

Reputation: 51612

One way is to use stri_extract_all_regex to extract all the keywords from each row. Then loop over that list, use combn to get the pairs, unlist, and use table to count, i.e.:

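library(stringi)

# Extract every keyword found in each row, build all pairs per row with combn,
# then count how often each pair occurs
table(unlist(sapply(stri_extract_all_regex(dataset$dataset, paste(keywords, collapse = '|')),
                    function(i) combn(i, 2, toString))))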

#female, young  male, female   male, young 
#            2             2             1 
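A base R variant of the same idea, sketched under two assumptions: keywords should match as whole words (so "male" is never taken from inside "female"), and a pair should be counted once per row regardless of order or repetition. Here regmatches()/gregexpr() stand in for stri_extract_all_regex, and rows with fewer than two keywords are skipped, because combn(x, 2) stops with an error when x has fewer than two elements. The names pattern, found, pairs and pair_counts are illustrative.

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum  female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young")

# Whole-word alternation: \\b(male|female|young)\\b
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# All keyword occurrences in each row (a list with one character vector per row)
found <- regmatches(dataset$dataset,
                    gregexpr(pattern, dataset$dataset, ignore.case = TRUE, perl = TRUE))

# Keep each keyword at most once per row, in the order of `keywords`, so that
# "female, male" and "male, female" become the same pair
# (assumes the keywords themselves are lower case)
found <- lapply(found, function(k) keywords[keywords %in% tolower(k)])

# Pairs per row; rows with fewer than two keywords contribute nothing
pairs <- unlist(lapply(found, function(k)
  if (length(k) >= 2) combn(k, 2, toString) else character(0)))

pair_counts <- table(pairs)
pair_counts
```

On the sample data this gives the same counts as above (female, young = 2; male, female = 2; male, young = 1); the differences only show up on rows with repeated keywords, a different keyword order, or fewer than two keywords.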

Upvotes: 1

Limey

Reputation: 12585

From the online doc: "grepl returns a logical vector (match or not for each element of x)"

So

flags <- sapply(keywords, function(x) grepl(x, dataset$dataset, ignore.case = TRUE))

will give you a logical matrix with one column per keyword, showing which rows of dataset contain that keyword. Then you can & the columns together and count the TRUE values:

sum(flags[, "male"] & flags[, "female"])

[Untested code]
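To extend the & to any number of keywords, one option is to fold the per-keyword grepl() results with Reduce(). count_rows_with_all() is a hypothetical helper name, and the \\b word boundaries are an added assumption (they stop "male" from matching inside "female").

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum  female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)

# Hypothetical helper: number of elements of `text` that contain every keyword in `kw`
count_rows_with_all <- function(text, kw) {
  # One logical vector per keyword, TRUE where that keyword occurs as a whole word
  hits <- lapply(kw, function(x)
    grepl(paste0("\\b", x, "\\b"), text, ignore.case = TRUE, perl = TRUE))
  # AND the vectors together element-wise, then count the TRUEs
  sum(Reduce(`&`, hits))
}

count_rows_with_all(dataset$dataset, c("male", "female"))
count_rows_with_all(dataset$dataset, c("male", "female", "young"))
```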

To generate all combinations of two or more keywords, the combinations function from the arrangements package is useful.

library(arrangements)

keywords <- c("male", "female", "young", "old")
combos <- lapply(2:length(keywords), function(k) combinations(keywords, k))

Then you can iterate through combos, combining the matching columns of flags with &, to get the totals you want.
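If you would rather not add a package, base R's combn() can enumerate the combinations as well; this swaps it in for arrangements::combinations. The sketch below assumes whole-word matching (hence the \\b boundaries) and uses illustrative names (flags, combo_counts). It builds one logical column per keyword, then for every combination of two or more keywords counts the rows where all of them are present.

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum  female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young", "old")

# Logical matrix: one row per text, one column per keyword (whole-word match)
flags <- sapply(keywords, function(x)
  grepl(paste0("\\b", x, "\\b"), dataset$dataset, ignore.case = TRUE, perl = TRUE))

# For each combination size k, count rows where every keyword of the combination is present
combo_counts <- unlist(lapply(2:length(keywords), function(k) {
  counts <- combn(keywords, k, function(kw) sum(apply(flags[, kw, drop = FALSE], 1, all)))
  # Label each count with its combination, e.g. "male & female"
  names(counts) <- combn(keywords, k, paste, collapse = " & ")
  counts
}))

combo_counts
```

apply(..., 1, all) is what turns each combination into an AND across its columns; drop = FALSE keeps the selection a matrix even when it has a single row or column.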

Upvotes: 0
