Reputation: 29287
I have two vectors:
texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')
I'd like to vectorize the following for-loop:
count <- 0  # running total of matches per text
for (p in seq_along(patterns)) {
  count <- count + str_count(texts, patterns[p])
}
I tried the following commands, but neither works:
> str_count(texts, patterns)
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern, :
longer object length is not a multiple of shorter object length
> str_count(texts, t(patterns))
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern, :
longer object length is not a multiple of shorter object length
I'd like a 2D matrix like this:
      | patterns
------+---------
      | 1 0 0
texts | 0 1 0
      | 0 1 1
      | 0 1 0
Upvotes: 5
Views: 300
Reputation: 19970
You can use outer. I assume you are using str_count from the stringr package.
library(stringr)
texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')
matches <- outer(texts, patterns, str_count)
# set dim names
colnames(matches) <- patterns
rownames(matches) <- texts
matches
      ab d w
abc    1 0 0
asdf   0 1 0
werd   0 1 1
ffssd  0 1 0
EDIT
# or set names directly within 'outer' as noted by @RichardScriven
outer(setNames(nm = texts), setNames(nm = patterns), str_count)
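To spell out why this works, here is a small sketch (the names pairs_text, pairs_pattern and by_hand are only illustrative). str_count(texts, patterns) pairs the two vectors element by element and recycles the shorter one, which is where the warning in the question comes from. outer instead expands both vectors so that every text meets every pattern, calls the function once on those expanded vectors, and reshapes the result into a matrix. The per-text total that the original loop accumulates should then be the row sums.

```r
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# Roughly what outer() does internally: build every (text, pattern) pair,
# call the vectorized function once, then reshape column by column.
pairs_text <- rep(texts, times = length(patterns))
pairs_pattern <- rep(patterns, each = length(texts))
by_hand <- matrix(str_count(pairs_text, pairs_pattern),
                  nrow = length(texts),
                  dimnames = list(texts, patterns))

# The same matrix via outer()
matches <- outer(setNames(nm = texts), setNames(nm = patterns), str_count)

# Total matches per text, i.e. what the loop in the question adds up
count <- rowSums(matches)
```

Note that this only works because str_count is vectorized over both arguments; a function that is not would need to be wrapped in Vectorize first.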
Upvotes: 9
Reputation: 24955
Using dplyr and tidyr (and stringr):
library(dplyr)
library(tidyr)
library(stringr)
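# stringsAsFactors = FALSE keeps both columns as character, which replaces
# the deprecated mutate_each(funs(as.character(.))) step
expand.grid(Var1 = texts, Var2 = patterns, stringsAsFactors = FALSE) %>%
  mutate(matches = str_count(Var1, Var2)) %>%
  spread(Var2, matches)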
   Var1 ab d w
1   abc  1 0 0
2  asdf  0 1 0
3 ffssd  0 1 0
4  werd  0 1 1
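On newer versions of tidyr (1.0.0 and later), spread is superseded by pivot_wider, so the same idea could be sketched as below. This assumes a recent dplyr and tidyr are installed; the variable names long and wide are only illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# Long format: one row per (text, pattern) pair with its match count
long <- expand.grid(Var1 = texts, Var2 = patterns, stringsAsFactors = FALSE) %>%
  mutate(matches = str_count(Var1, Var2))

# Wide format: one row per text, one column per pattern
wide <- long %>%
  pivot_wider(names_from = Var2, values_from = matches)
```

The result is a data frame rather than a matrix, so the texts end up in the Var1 column instead of in row names.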
Upvotes: 3