I simply want to count the occurrences of a string, e.g. 'xy', in another string, e.g. 'kxyloixyea', without using any extra libraries.
There are many questions and answers about columns and data frames, which is probably why I cannot find an answer to this most basic question. (Here is one of the many related posts for which this post was unjustifiably marked as a duplicate: How to calculate the number of occurrence of a given character in each row of a column of strings? That one again concerns data frames and vectors, so I see no answer that fits my question of "string in string".)
I came up with this solution, which is probably far too complicated:
lengths(gregexpr(str_to_count, str_to_search, fixed = TRUE))
# as e.g.:
lengths(gregexpr('xy', 'kxyloixyea', fixed = TRUE))
# correctly returns 2
This works fine for my purposes, but I can't imagine that there isn't a simpler method (like 'kxyloixyea'.count('xy') in Python); I just can't find it.
Also, FYI, this does not work when there are zero occurrences: it then still returns 1, because gregexpr() returns a single -1 when nothing matches. In my specific function this never happens, but it would still be good to see a solution that covers that case too (without additional complexity).
(Note: fixed = TRUE is no accident; I don't want regex.)
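The zero-occurrence problem can probably be handled without much extra complexity by counting only the positive match positions: gregexpr() returns -1 for a string with no match, so sum(positions > 0) is 0 in that case. This is a minimal sketch that keeps fixed = TRUE; the helper name count_fixed_gregexpr is hypothetical:

```r
# Hypothetical helper: count non-overlapping literal (fixed = TRUE) matches
# of `pattern` in each element of the character vector `x`.
count_fixed_gregexpr <- function(pattern, x) {
  positions <- gregexpr(pattern, x, fixed = TRUE)
  # gregexpr() yields -1 for a string without a match, so counting only
  # positive start positions gives 0 there instead of 1.
  vapply(positions, function(p) sum(p > 0), integer(1))
}

count_fixed_gregexpr("xy", "kxyloixyea")  # 2
count_fixed_gregexpr("xy", "abc")         # 0
count_fixed_gregexpr("xy", "")            # 0
```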
Here is another solution:
str_to_count = 'xy'
str_to_search = 'kxyloixyea'
lengths(strsplit(str_to_search, str_to_count, fixed = TRUE)) - 1
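This works with zero occurrences, but it does not work when str_to_search is empty (""), because strsplit() then returns an empty vector and the result is -1. It also undercounts by one when str_to_search ends with str_to_count, since strsplit() drops a trailing empty piece. Besides, it doesn't look much better than the one above.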
And here is a modified version that deals with an empty str_to_search:
lengths(strsplit(paste0(str_to_search, str_to_count), str_to_count, fixed = TRUE)) - 1
Once again, this seems unreasonably long for such a simple question.
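Another base-R idiom that stays short and covers both the zero-occurrence and the empty-string cases is to delete every match with gsub(..., fixed = TRUE) and compare string lengths. This is a different technique from the strsplit() versions, sketched here under the assumption that the pattern is non-empty (an empty pattern would divide by zero); count_fixed_nchar is a hypothetical name:

```r
# Hypothetical helper: count non-overlapping literal matches by measuring
# how many characters disappear when every match is deleted.
count_fixed_nchar <- function(pattern, x) {
  removed <- nchar(x) - nchar(gsub(pattern, "", x, fixed = TRUE))
  # Each deleted match accounts for exactly nchar(pattern) characters.
  as.integer(removed / nchar(pattern))
}

count_fixed_nchar("xy", "kxyloixyea")  # 2
count_fixed_nchar("xy", "kxyloixy")    # 2 (trailing match is counted)
count_fixed_nchar("xy", "")            # 0
```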
The following code works well for every case you described (with no trick like '-1' needed):
str_to_count = "xy" #setting variables
str_to_search = "kxyloixyea"
lengths(regmatches(str_to_search, gregexpr(str_to_count, str_to_search, fixed = TRUE)))
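To check this against the edge cases from the question, the one-liner can be wrapped in a small function (the name count_fixed_regmatches is hypothetical). Since gregexpr() and regmatches() are both vectorized, the same expression also works on a character vector:

```r
count_fixed_regmatches <- function(pattern, x) {
  # regmatches() returns character(0) for strings without a match,
  # so lengths() gives 0 there, with no '-1' correction needed.
  lengths(regmatches(x, gregexpr(pattern, x, fixed = TRUE)))
}

count_fixed_regmatches("xy", c("kxyloixyea", "abc", "", "kxyloixy"))
# 2 0 0 2
```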