frogatto

Reputation: 29287

Vectorization of a for-loop in R

I have two vectors:

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

I'd like to vectorize the following for-loop:

count <- 0
for (p in seq_along(patterns)) {
    # use the pattern string itself, not its index
    count <- count + str_count(texts, patterns[p])
}

I tried the following commands, but neither works:

> str_count(texts, patterns)
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern,  :
  longer object length is not a multiple of shorter object length

> str_count(texts, t(patterns))
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern,  :
  longer object length is not a multiple of shorter object length

I'd like a 2D matrix of counts, with one row per text and one column per pattern:

       |  patterns
 ------+--------
       |   1 0 0
 texts |   0 1 0
       |   0 1 1
       |   0 1 0

Upvotes: 5

Views: 300

Answers (2)

cdeterman

Reputation: 19970

You can use outer. I assume you are using str_count from the stringr package.

library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

matches <- outer(texts, patterns, str_count)

# set dim names
colnames(matches) <- patterns
rownames(matches) <- texts
matches
      ab d w
abc    1 0 0
asdf   0 1 0
werd   0 1 1
ffssd  0 1 0

EDIT

# or set names directly within 'outer' as noted by @RichardScriven
outer(setNames(nm = texts), setNames(nm = patterns), str_count)
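
For context on why the direct call in the question returns 1 1 1 0: str_count pairs texts and patterns element by element and recycles the shorter vector, whereas outer evaluates every text/pattern combination. Here is a minimal sketch, assuming the same texts and patterns as above (the names pairwise and count are only illustrative). Summing the rows should also give the running total that the original for-loop was meant to accumulate.

```r
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# Element-wise pairing: patterns is recycled to ab, d, w, ab,
# so only 4 of the 12 text/pattern combinations are checked.
pairwise <- str_count(texts, rep_len(patterns, length(texts)))

# outer() evaluates all 12 combinations and shapes them into a 4 x 3 matrix.
matches <- outer(setNames(nm = texts), setNames(nm = patterns), str_count)

# Row sums give the total number of matches per text across all patterns.
count <- rowSums(matches)
```

This works because str_count is itself vectorized over both arguments, which is what outer expects of the function it is given.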

Upvotes: 9

jeremycg

Reputation: 24955

Using dplyr and tidyr (and stringr):

library(dplyr)
library(tidyr)
library(stringr)
# stringsAsFactors = FALSE keeps Var1 and Var2 as character columns
expand.grid(texts, patterns, stringsAsFactors = FALSE) %>%
   mutate(matches = stringr::str_count(Var1, Var2)) %>% 
   spread(Var2, matches)
   Var1 ab d w
1   abc  1 0 0
2  asdf  0 1 0
3 ffssd  0 1 0
4  werd  0 1 1
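
In newer versions of tidyr, spread is superseded by pivot_wider. The same long-to-wide reshaping can be sketched as follows, assuming tidyr 1.0.0 or later. The column names text and pattern and the object names long and wide are my own choices for illustration, not part of the original answer.

```r
library(tidyr)
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# Long format: one row per text/pattern combination.
long <- expand.grid(text = texts, pattern = patterns,
                    stringsAsFactors = FALSE)
long$matches <- str_count(long$text, long$pattern)

# Wide format: one row per text, one column per pattern.
wide <- pivot_wider(long, names_from = pattern, values_from = matches)
wide
```

Row order may differ from the spread output above, so it is safer to look rows up by the text column than by position.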

Upvotes: 3
