Reputation: 7069
I am trying to return the indices of all occurrences of a specific character in a string using Ruby. An example string is "a#asg#sdfg#d##", and the expected return is [1,5,10,12,13] when searching for # characters. The following code does the job, but is there a simpler way of doing this?
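def occurances(line)
  index = 0
  all_index = []
  line.each_byte do |x|
    # each_byte yields integers, so compare against the byte value of '#'
    # ('#'[0] is only an integer on Ruby 1.8; on 1.9+ it is a string)
    if x == '#'.ord
      all_index << index
    end
    index += 1
  end
  all_index
end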
Upvotes: 25
Views: 20093
Reputation: 121
Here's a solution for massive strings. I'm doing text finds on 4.5MB text strings, and the other solutions grind to a halt. This takes advantage of the fact that Ruby's .split is very efficient compared to string comparisons.
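def indices_of_matches(str, target)
  # Pad the string so a trailing target still produces a final piece to drop
  cuts = (str + (target.hash.to_s.gsub(target,''))).split(target)[0..-2]
  indices = []
  loc = 0
  cuts.each do |cut|
    loc = loc + cut.size      # advance past the text before the match
    indices << loc            # loc is now the start of the match
    loc = loc + target.size   # skip over the match itself
  end
  return indices
end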
It's basically using the horsepower behind the .split method, then using the separate parts and the length of the searched string to work out locations. I've gone from 30 seconds using various methods to instantaneous on extremely large strings.
I'm sure there's a better way to do it, but:
(str + (target.hash.to_s.gsub(target,'')))
adds something to the end of the string in case the target is at the end (split drops trailing empty strings), while making sure that the "random" addition doesn't contain the target itself.
indices_of_matches("a#asg#sdfg#d##","#")
=> [1, 5, 10, 12, 13]
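As a possible simplification of the padding step: String#split accepts a negative limit, which keeps trailing empty strings, so the appended suffix may not be needed. This is only a sketch, not a tested replacement for large inputs. The name indices_via_split is hypothetical, and like the original it assumes a non-empty string target other than a single space (which split treats specially).

```ruby
# Hypothetical variant of indices_of_matches: a negative limit makes
# split keep trailing empty strings, so no padding is appended.
def indices_via_split(str, target)
  # Drop the last piece: it is the text after the final match.
  pieces = str.split(target, -1)[0..-2] || []
  indices = []
  loc = 0
  pieces.each do |piece|
    loc += piece.size    # advance past the text before the match
    indices << loc       # start of the match
    loc += target.size   # skip over the match itself
  end
  indices
end

indices_via_split("a#asg#sdfg#d##", "#")
# => [1, 5, 10, 12, 13]
```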
Upvotes: 1
Reputation: 7069
Another solution derived from FMc's answer:
s = "a#asg#sdfg#d##"
q = []
s.length.times {|i| q << i if s[i,1] == '#'}
I love that Ruby never has only one way of doing something!
Upvotes: 1
Reputation: 247210
Here's a long method chain:
"a#asg#sdfg#d##".
  each_char.
  each_with_index.
  inject([]) do |indices, (char, idx)|
    indices << idx if char == "#"
    indices
  end
# => [1, 5, 10, 12, 13]
Requires Ruby 1.8.7+.
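The same chain can probably be written without the accumulator by filtering the (char, index) pairs and then keeping only the index. A minimal sketch; indices_by_select is a hypothetical name, not part of the answer above.

```ruby
# Hypothetical helper: select the [char, idx] pairs whose char matches,
# then map each surviving pair to its index.
def indices_by_select(str, wanted)
  str.each_char.each_with_index.
    select { |char, _idx| char == wanted }.
    map { |_char, idx| idx }
end

indices_by_select("a#asg#sdfg#d##", "#")
# => [1, 5, 10, 12, 13]
```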
Upvotes: 3
Reputation: 42421
s = "a#asg#sdfg#d##"
a = (0 ... s.length).find_all { |i| s[i,1] == '#' }
Upvotes: 29
Reputation: 15498
Here's a less-fancy way:
# x is the string to search
i = -1
all = []
while i = x.index('#',i+1)
  all << i
end
all
In a quick speed test, this was about 3.3x faster than FMc's find_all method and about 2.5x faster than sepp2k's enum_for method.
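A comparison along these lines could be set up with the standard Benchmark module. This is a rough sketch, not the test that produced the figures above: the method names, the sample string, and the repeat count are all assumptions, and timings will vary by Ruby version and input.

```ruby
require 'benchmark'

# Hypothetical wrappers around the three approaches being compared.
def with_index_loop(str, char)
  i = -1
  all = []
  while i = str.index(char, i + 1)
    all << i
  end
  all
end

def with_find_all(str, char)
  (0...str.length).find_all { |i| str[i, 1] == char }
end

def with_enum_for(str, char)
  pattern = Regexp.new(Regexp.escape(char))
  str.enum_for(:scan, pattern).map { Regexp.last_match.begin(0) }
end

# Assumed sample input: the question's string repeated many times.
sample = "a#asg#sdfg#d##" * 1_000

Benchmark.bm(10) do |bm|
  bm.report("index")    { 20.times { with_index_loop(sample, "#") } }
  bm.report("find_all") { 20.times { with_find_all(sample, "#") } }
  bm.report("enum_for") { 20.times { with_enum_for(sample, "#") } }
end
```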
Upvotes: 17
Reputation: 370445
require 'enumerator' # Needed in 1.8.6 only
"1#3#a#".enum_for(:scan,/#/).map { Regexp.last_match.begin(0) }
#=> [1, 3, 5]
ETA: This works by creating an Enumerator that uses scan(/#/) as its each method. scan yields each occurrence of the specified pattern (in this case /#/), and inside the block you can call Regexp.last_match to access the MatchData object for the match. MatchData#begin(0) returns the index where the match begins, and since we used map on the enumerator, we get an array of those indices back.
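The same mechanism can be seen without the Enumerator by passing a block straight to scan and collecting the offsets by hand. A minimal sketch; match_offsets is a hypothetical helper name.

```ruby
# Hypothetical helper: scan sets the last match before each yield, so
# Regexp.last_match inside the block refers to the current occurrence.
def match_offsets(str, pattern)
  offsets = []
  str.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

match_offsets("1#3#a#", /#/)
# => [1, 3, 5]
```

The enum_for version gets the same result in one expression because map collects the block's return values.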
Upvotes: 19