gaspar

Reputation: 1068

How to count string occurrences in another string in base R?

I simply want to count the occurrences of a string, e.g. 'xy', in another string, e.g. 'kxyloixyea', without using any extra libraries.

There are many questions and answers about columns and data frames, which is probably why I cannot find an answer to this very basic question. (Here is one of the many related posts, for which this post was unjustifiably marked as a duplicate: How to calculate the number of occurrence of a given character in each row of a column of strings? That one again concerns data frames and vectors, so I see no answer there that fits my "string in string" question.)

I came up with this solution, which is probably far too complicated:

lengths(gregexpr(str_to_count, str_to_search, fixed = TRUE))
# as e.g.:
lengths(gregexpr('xy', 'kxyloixyea', fixed = TRUE))
# correctly returns 2

This works fine for my purposes, but I can't imagine that there isn't a simpler method (like 'kxyloixyea'.count('xy') in Python). I just can't find it.

Also, FYI, this does not work when there are zero occurrences: in that case it still returns 1. In my specific function this never happens, but it would still be good to see a solution that covers that case too (without additional complexity).
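The stray 1 comes from how gregexpr reports a failed search: it returns a single -1 rather than an empty vector, so the result still has length 1. A minimal sketch of that behaviour, together with one possible workaround that counts only positive match positions (the variable name m is just for illustration):

```r
# gregexpr() returns -1 (a length-1 vector) when nothing matches
m <- gregexpr('xy', 'kloiea', fixed = TRUE)[[1]]
as.integer(m)   # -1
length(m)       # 1, which is why lengths() reports 1

# counting only positive positions handles the no-match case
sum(gregexpr('xy', 'kxyloixyea', fixed = TRUE)[[1]] > 0)   # 2
sum(gregexpr('xy', 'kloiea', fixed = TRUE)[[1]] > 0)       # 0
```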

(Note: fixed = TRUE is no accident, I don't want regex.)


Here is another solution:

str_to_count = 'xy'
str_to_search = 'kxyloixyea'
lengths(strsplit(str_to_search, str_to_count, fixed = TRUE)) - 1

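This works when there is no occurrence, but it does not work when str_to_search is empty (""): strsplit then returns an empty vector, so the result is -1. It also undercounts when str_to_search ends with str_to_count, because strsplit drops a match at the end of the string. And it doesn't look much better than the one above.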

And here is a modified version that handles an empty str_to_search:

lengths(strsplit(paste0(str_to_search, str_to_count), str_to_count, fixed = TRUE)) - 1

Once again, this seems absurdly long for such a simple task.
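To compare the two strsplit variants on the edge cases, here is a small sketch. The helper names count_split and count_split_padded are made up for this comparison only and are not part of base R:

```r
# hypothetical helpers wrapping the two strsplit-based expressions
count_split <- function(pattern, x) {
  lengths(strsplit(x, pattern, fixed = TRUE)) - 1
}
count_split_padded <- function(pattern, x) {
  # appending the pattern guarantees one trailing match for strsplit to drop
  lengths(strsplit(paste0(x, pattern), pattern, fixed = TRUE)) - 1
}

count_split('xy', 'kxyloixyea')          # 2
count_split('xy', '')                    # -1, empty input
count_split('xy', 'kxyloixy')            # 1, the trailing match is dropped

count_split_padded('xy', 'kxyloixyea')   # 2
count_split_padded('xy', '')             # 0
count_split_padded('xy', 'kxyloixy')     # 2
```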

Upvotes: 1

Views: 383

Answers (1)

Fernando Barbosa

Reputation: 1134

The following code works for every case you described, with no trick like '-1' needed, because regmatches returns an empty character vector when there is no match:

str_to_count = "xy"           # substring to count
str_to_search = "kxyloixyea"  # string to search in

lengths(regmatches(str_to_search, gregexpr(str_to_count, str_to_search, fixed = TRUE)))
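If this is needed in several places, the expression can be wrapped in a small function. This is only a sketch: count_fixed is a hypothetical name, not a base R function, and it assumes a non-empty pattern. Because lengths, regmatches and gregexpr are all vectorized over the searched strings, it also works on character vectors:

```r
# hypothetical wrapper around the expression above
count_fixed <- function(pattern, x) {
  lengths(regmatches(x, gregexpr(pattern, x, fixed = TRUE)))
}

count_fixed("xy", "kxyloixyea")             # 2
count_fixed("xy", "kloiea")                 # 0, no occurrence
count_fixed("xy", "")                       # 0, empty string to search
count_fixed("xy", c("xyxy", "abc", "xy"))   # 2 0 1
count_fixed(".", "a.b.c")                   # 2, "." is literal with fixed = TRUE
```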

Upvotes: 1
