Reputation: 2513
I have a dataset:
dataset <- c("male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
"est qui dolorem tall dolorem ipsum female Neque young",
"male, female porro old")
dataset <- as.data.frame(dataset)
I have a list of keywords:
keywords <- c("male", "female", "young")
I can count the number of rows containing each keyword:
sapply(keywords, function(x) length(grep(x, dataset$dataset, ignore.case = TRUE)))
My result:
  male female  young
     3      3      2
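Note that grep matches substrings, so "male" is also found inside "female"; that is why row 2 (which only says "female") is counted for male. A minimal sketch, assuming whole-word matches are what is wanted, wraps each keyword in \\b word boundaries (counts is just an illustrative name):

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young")

# \\b marks a word boundary, so "\\bmale\\b" no longer matches inside "female".
# grepl() returns one TRUE/FALSE per row; sum() counts the TRUE values.
counts <- sapply(keywords, function(x)
  sum(grepl(paste0("\\b", x, "\\b"), dataset$dataset, ignore.case = TRUE)))
counts
# male = 2, female = 3, young = 2
```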
What I want: count the number of rows that match a combination of keywords (AND operator), e.g. rows that contain both "male" and "female".
Upvotes: 0
Views: 62
Reputation: 51612
One way is to use stri_extract_all_regex to get all keywords in each row. Then loop over that list with combn to build every pair, unlist the result and use table to count, i.e.
library(stringi)
table(unlist(sapply(stri_extract_all_regex(dataset$dataset, paste(keywords, collapse = '|')),
function(i)combn(i, 2, toString))))
# female, young  male, female   male, young
#             2             2             1
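This works for the sample data, but it assumes every row contains at least two keywords: combn() stops with an error when asked to choose 2 items from fewer than 2 (and a row with no match comes back as a single NA). Pairs also follow the order of appearance, so "female ... male" and "male ... female" would be counted as different pairs, and a repeated keyword would yield pairs like "male, male". A minimal sketch of the same pair counting that guards against these cases, with base R's regmatches()/gregexpr() swapped in for stri_extract_all_regex so no extra package is needed, and \\b word boundaries added (pair_labels and pair_counts are illustrative names):

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young")

text <- tolower(dataset$dataset)
# One alternation wrapped in word boundaries: \b(male|female|young)\b
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
# List with one character vector of matches per row (character(0) if none).
found <- regmatches(text, gregexpr(pattern, text, perl = TRUE))

pair_labels <- lapply(found, function(kw) {
  # Deduplicate and reorder by `keywords`, so the same two keywords always
  # give the same label regardless of where they appear in the row.
  kw <- keywords[keywords %in% kw]
  # combn() cannot choose 2 from fewer than 2 items, so skip such rows.
  if (length(kw) < 2) return(character(0))
  combn(kw, 2, toString)
})
pair_counts <- table(unlist(pair_labels))
pair_counts
# female, young: 2; male, female: 2; male, young: 1
```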
Upvotes: 1
Reputation: 12585
From the online doc: "grepl returns a logical vector (match or not for each element of x)".
So
flags <- sapply(keywords, function(x) grepl(paste0("\\b", x, "\\b"), dataset$dataset, ignore.case = TRUE))
will give you a logical matrix with one column of indicator variables per keyword, showing which rows of dataset contain that keyword (the \\b word boundaries stop "male" from also matching inside "female"). Then you can just & the columns together:
sum(flags[, "male"] & flags[, "female"])
[Untested code]
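The & idea extends to any number of keywords by folding the indicator columns together with Reduce(). A minimal sketch, where has_all is a hypothetical helper and flags is built with word boundaries as described above:

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young")

# Logical matrix: one row per text, one column per keyword.
flags <- sapply(keywords, function(x)
  grepl(paste0("\\b", x, "\\b"), dataset$dataset, ignore.case = TRUE))

# Hypothetical helper: TRUE for rows that contain every keyword in `combo`,
# i.e. flags[, combo[1]] & flags[, combo[2]] & ... for any length of combo.
has_all <- function(flags, combo) {
  Reduce(`&`, lapply(combo, function(k) flags[, k]))
}

sum(has_all(flags, c("male", "female")))             # rows 1 and 3
which(has_all(flags, c("male", "female", "young")))  # row 1 only
```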
To generate all combinations of two or more keywords, the combinations function from the arrangements package is useful.
library(arrangements)
keywords <- c("male", "female", "young", "old")
combos <- lapply(2:length(keywords), function(k) combinations(keywords, k))
Then you can iterate through combos to get the totals you want.
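The iteration can be sketched as follows. This version swaps in base R's combn() for arrangements::combinations() so that no extra package is needed, and counts, for every combination of two or more keywords, the rows that contain all of them (combos and totals are illustrative names):

```r
dataset <- data.frame(dataset = c(
  "male Neque porro quisquam est qui dolorem ipsum quia dolor sit amet female consectetur, adipisci velit young",
  "est qui dolorem tall dolorem ipsum female Neque young",
  "male, female porro old"
), stringsAsFactors = FALSE)
keywords <- c("male", "female", "young", "old")

# Logical matrix of whole-word matches: rows = texts, columns = keywords.
flags <- sapply(keywords, function(x)
  grepl(paste0("\\b", x, "\\b"), dataset$dataset, ignore.case = TRUE))

# Flat list of character vectors: every combination of size 2..n.
combos <- unlist(lapply(2:length(keywords), function(k)
  combn(keywords, k, simplify = FALSE)), recursive = FALSE)

# A row matches a combination when all of its keyword columns are TRUE.
totals <- vapply(combos, function(combo)
  sum(rowSums(flags[, combo, drop = FALSE]) == length(combo)), integer(1))
names(totals) <- vapply(combos, paste, character(1), collapse = " & ")
totals[totals > 0]
```

Keeping the zero counts in totals (and filtering only when printing) makes it easy to see which combinations never occur together.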
Upvotes: 0