Reputation: 2745
I want to make an array of results from a string like this one, using a regular expression:
results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday
Here’s my regex as it stands. It works in Sublime Text’s regex search but not in Ruby:
(results)\|.*?\\n(?=((results\|)|(timestamps\|\|)))
and this would be the desired result:
1. results|foofoofoo
2. results|barbarbarbar
3. results|googoogoo
Instead I’m getting these weird returns, and I can’t understand it. Why does this not select the result lines?
Match 1
1. results
2. results|
3. results|
4.
Match 2
1. results
2. results|
3. results|
4.
Match 3
1. results
2. timestamps||
3.
4. timestamps||
Here’s the actual code using the regex:
# Create a new Line for each regex-matched body, with that body set as the raw attribute
host_scan.raw.scan(/(?:results)\|.*?\\n(?=((?:results\|)|(?:timestamps\|\|)))/).each do |body|
  @lines << Line.new({:raw => body})
end
Upvotes: 0
Views: 147
Reputation: 2745
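The answer turned out to lie in the parentheses. When a pattern contains capture groups, String#scan returns the captured groups instead of the whole match, so my original pattern returned only the tail delimiter captured inside the lookahead. Wrapping the whole expression in a single capturing group makes scan return the entire match (each element comes back as a one-element array).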
host_scan.raw.scan(/((?:results\|.*?\\n)(?=(?:results\|)|(?:timestamps\|\|)))/).each do |body|
@lines << Line.new({:raw => body})
end
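To make the shape of the return value concrete, here is a minimal sketch. It assumes the raw data contains a literal backslash followed by "n" (which is what \\n in the pattern matches), so the sample string is single-quoted; the variable names are only for illustration.

```ruby
# Single quotes keep \n as two characters: a backslash and an "n".
raw = 'results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday'

# One capturing group around the whole expression: scan returns an array of
# one-element arrays, one per match.
nested = raw.scan(/((?:results\|.*?\\n)(?=(?:results\|)|(?:timestamps\|\|)))/)

# Flatten to get plain strings.
bodies = nested.flatten

p nested.first  # a one-element array
p bodies        # plain strings, each still ending in the literal \n
```

If plain strings are wanted inside the block, destructuring the block argument (`each do |(body)|`) or calling `flatten` first are two ways to get them.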
Upvotes: 0
Reputation: 160551
Rather than jump to a regex, which is a much more complicated way to get at the data, use split("\n").
text = "results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday"
ary = text.split("\n")
ary is:
[
"results|foofoofoo",
"results|barbarbarbar",
"results|googoogoo",
"timestamps||friday"
]
Slice that and you can get:
ary[0..2]
=> ["results|foofoofoo", "results|barbarbarbar", "results|googoogoo"]
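If the number of result lines is not known in advance, the slice can be replaced by a filter on the line prefix. This is only a sketch of that variation, assuming every wanted line starts with "results|":

```ruby
text = "results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday"

# Split on real newlines, then keep only the lines that begin with "results|".
results = text.split("\n").grep(/\Aresults\|/)

p results
```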
EDIT:
Based on the comment that there are more carriage returns and complex characters in the strings:
require 'awesome_print'
text = "results|foofoofoo\nmorefoo\nandevenmorefoo\nresults|barbarbarbar\nandmorebar\nandyetagainmorebar\nresults|googoogoo\ntimestamps||friday"
ap text.sub(/\|\|friday$/, '').split('results')[1..-1].map{ |l| 'results' << l }
Which outputs:
[
[0] "results|foofoofoo\nmorefoo\nandevenmorefoo\n",
[1] "results|barbarbarbar\nandmorebar\nandyetagainmorebar\n",
[2] "results|googoogoo\ntimestamps"
]
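Note that the last element above still carries the trailing "timestamps" text. As a hedged alternative sketch (not the approach used above), a multiline scan with a lookahead can stop each record at the next line that starts with "results|" or "timestamps||". It assumes records always begin at the start of a line:

```ruby
text = "results|foofoofoo\nmorefoo\nandevenmorefoo\nresults|barbarbarbar\nandmorebar\nandyetagainmorebar\nresults|googoogoo\ntimestamps||friday"

# /m lets . match newlines in Ruby; ^ matches at the start of any line.
# The lazy .*? stops right before the next record or the timestamps line.
records = text.scan(/^results\|.*?(?=^results\||^timestamps\|\|)/m)

p records
```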
Upvotes: 0
Reputation: 3413
As Kendall Frey already stated, you are creating too many capture groups. There is no need to group the first literal “results|”, and no need to wrap each alternative in its own non-capturing group. The regex you want is:
/results\|.*?(?=\\n(?:results\||timestamps\|\|))/
or, if you don’t mind repeating the \\n part, you can do away with the non-capturing subgroup:
/results\|.*?(?=\\nresults\||\\ntimestamps\|\|)/
Both will return an array of matched values as specified in your question.
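A quick check of both patterns, as a sketch. The single-quoted sample assumes the data holds a literal backslash plus "n"; if the data holds real newline characters instead, \n would replace \\n in the pattern, as in the last example here.

```ruby
# Literal backslash + "n" in the data (single-quoted string).
literal = 'results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday'

grouped  = literal.scan(/results\|.*?(?=\\n(?:results\||timestamps\|\|))/)
repeated = literal.scan(/results\|.*?(?=\\nresults\||\\ntimestamps\|\|)/)

# Real newline characters in the data (double-quoted string).
real = "results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday"
with_newlines = real.scan(/results\|.*?(?=\n(?:results\||timestamps\|\|))/)

p grouped
p repeated
p with_newlines
```

Because neither pattern contains a capturing group, scan returns the matched text itself as plain strings.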
Upvotes: 1
Reputation: 44326
I'm guessing it has something to do with capturing groups. If you change all your (...) to (?:...) it will eliminate capturing groups.
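A small sketch of the difference, using the regex from the question and assuming the data contains a literal backslash followed by "n". With four capturing groups, scan returns four-element arrays of group values; with every group made non-capturing, it returns the full matches.

```ruby
raw = 'results|foofoofoo\nresults|barbarbarbar\nresults|googoogoo\ntimestamps||friday'

# Original pattern: four capturing groups, so each match is an array of four
# captures (nil where a group did not participate).
with_groups = raw.scan(/(results)\|.*?\\n(?=((results\|)|(timestamps\|\|)))/)

# Same pattern with every (...) changed to (?:...): full matches are returned.
without_groups = raw.scan(/(?:results)\|.*?\\n(?=(?:(?:results\|)|(?:timestamps\|\|)))/)

# Each match still ends in the literal \n; strip it if it is not wanted.
trimmed = without_groups.map { |m| m.delete_suffix('\n') }

p with_groups
p trimmed
```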
Upvotes: 0