Rez99

Reputation: 389

Vectorized str_locate not working as intended

I have the following data frame:

df <- data.frame(string=c('abcde', 'cde'))

I want to find the end position of "de" in each string, which I can determine like so:

df %>% 
 rowwise() %>%
 mutate(pos=str_locate(string = string, pattern = "de")[2])

##   string    pos
##    abcde      5
##      cde      3

This is the intended output, but I don't want to use rowwise() because it is very slow on large data frames.

So I tried to vectorize my function and remove the rowwise() command:

Vstr_locate <- Vectorize(str_locate)

df %>% 
 #rowwise() %>%
 mutate(pos=Vstr_locate(string = string, pattern = "de")[2])

But that didn't work:

##   string    pos
##    abcde      5
##      cde      5

Questions:

Upvotes: 2

Views: 462

Answers (2)

Gregor Thomas

Reputation: 146020

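str_locate is already vectorized, so neither rowwise() nor Vectorize() is needed. It returns a matrix with one row per string, so take the whole second column with [, 2]: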

df %>% mutate(pos=str_locate(string = string, pattern = "de")[, 2])
#   string pos
# 1  abcde   5
# 2    cde   3
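
To see why the comma matters, it may help to look at the raw return value outside of mutate(). This is only a minimal sketch, assuming stringr is installed; the names strings, loc and ends are illustrative and not part of the question.

```r
library(stringr)

strings <- c("abcde", "cde")

# str_locate() returns an integer matrix with one row per input string
# and two columns named "start" and "end".
loc <- str_locate(strings, "de")

# Selecting the whole "end" column yields one value per string,
# which is what mutate() needs (same as loc[, 2]).
ends <- loc[, "end"]
print(ends)
```

Indexing by the column name instead of its position is a matter of taste; it just makes the intent a little more explicit.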

Upvotes: 4

Greg

Reputation: 3670

You need a comma in the brackets: use [2,] instead of [2].

df %>% 
  #rowwise() %>%
  mutate(pos=Vstr_locate(string = string, pattern = "de")[2,])
#   string pos
# 1  abcde   5
# 2    cde   3

Compare the outputs of the two functions:

str_locate(string = "abcde", pattern = "de")
     start end
[1,]     4   5

vs.

Vstr_locate(string = "abcde", pattern = "de")
     abcde
[1,]     4
[2,]     5

The same difference appears if you map each function over the strings one at a time:

library(purrr)
strings <- c('abcde', 'cde')
map(strings, str_locate, "de")
[[1]]
     start end
[1,]     4   5

[[2]]
     start end
[1,]     2   3

vs.

map(strings, Vstr_locate, "de")
[[1]]
     abcde
[1,]     4
[2,]     5

[[2]]
     cde
[1,]   2
[2,]   3

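With Vstr_locate, each string gets its own column, with the start position in row 1 and the end position in row 2, so the element you want is indexed as [2,].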
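
This also seems to explain the 5, 5 in the question. The sketch below uses base R only and rebuilds by hand the kind of matrix Vstr_locate returns for both strings (the name m is illustrative), rather than calling Vstr_locate itself.

```r
# Hand-built stand-in for Vstr_locate(c("abcde", "cde"), "de"):
# one column per string, row 1 = start, row 2 = end.
m <- matrix(c(4, 5, 2, 3), nrow = 2,
            dimnames = list(NULL, c("abcde", "cde")))

# A single index with no comma walks the matrix column by column,
# so m[2] is just one element: the end position of the first string.
single <- m[2]

# With the comma, m[2, ] is the whole second row: one end position per string.
row_two <- m[2, ]

print(single)
print(row_two)
```

Inside mutate(), a length-one value such as m[2] is recycled to every row, which is why both rows showed 5.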

Upvotes: 2
