Reputation: 3167
I'm trying to implement this in a small Ruby script. I tested it on http://www.rubular.com/, where it worked perfectly, so I'm not sure why it's not working in the actual script.
The RegEx: /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/
The Text it's Against:
Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732
Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043
etc. etc.
What I am trying to grab: Only the numbers, and the single word after "Trial Type". So for the first line of the example, I would only want " 1 motion 1 0.87913100 1302969732" to be returned. I also want to keep the space before the first number in each trial.
My short ruby script:
File.open('log.txt', 'r') do |file|
  contents = file.readlines.to_s
  regex = Regexp.new(/(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/)
  matchdata = regex.match(contents).to_a
  matchdata.each do |match|
    if match != nil
      puts match
    end
  end
end
It only outputs two "1"s though. Hmm... I know it's reading the file contents correctly, and when I tried an alternate, simpler regex it worked fine.
Thanks for any help I get here!! : )
Upvotes: 1
Views: 130
Reputation: 160631
This is one of those times when trying to do everything in one big regex makes you work too hard. Simplify things:
ary = [
  'Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732',
  'Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043'
]
ary.each do |li|
  numbers = li.scan(/[\d.]+/)
  trial_type = li[/Trial Type: (\w+)/, 1]
  puts "%d %s %d %f %d\n" % [numbers.first, trial_type, *numbers[1 .. -1]]
end
# >> 1 motion 1 0.879131 1302969732
# >> 7 button 0 0.198178 1302987043
Regex patterns are powerful, but people think it's macho to do everything in one big line. You have to weigh that against the extra work needed to put the regex together in the first place, plus the work to maintain it if the text being parsed changes later.
Upvotes: 1
Reputation: 168269
If you know that the data follows a particular pattern, you can just follow that pattern in the regex, and pick up the portions you want with ( ).
/Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/
The more you know in advance about the data, the more specific you can make the regex. If you see some variations in the data and the regex fails to match, then just relax the pattern. Use [\.\d]+ instead of \d+ where a number may contain a decimal point. Use \s* or [ \t]* in place of a fixed space where the amount of whitespace may vary. If the Trial ID: part may appear as a different phrase, replace it with .*?, and so on.
If you are not sure how many spaces/tabs appear, use this:
/Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/
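As a rough sketch of how that whitespace-tolerant pattern could be used, the snippet below applies it to one line and reads the pieces out with MatchData#captures. The names TRIAL_PATTERN and parse_trial are hypothetical (they are not from the question), and the sample line is copied from the question.

```ruby
TRIAL_PATTERN = /Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/

# Hypothetical helper: returns the five captured fields as strings,
# or nil when the line does not follow the expected layout.
def parse_trial(line)
  match = TRIAL_PATTERN.match(line)
  match && match.captures
end

line = 'Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732'
fields = parse_trial(line)

# Join with single spaces and keep the leading space the question asks for.
puts " " + fields.join(" ") if fields
```

A line that does not fit the layout gives nil rather than a partial result, which makes it easy to skip or log.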
Upvotes: 2
Reputation: 43097
You need to escape the literal pipes inside the regex, fill in the other missing literals (such as Trick\? and Click\sTime:), remove some of the spaces, and insert whitespace patterns where appropriate, e.g.
regex = Regexp.new(/(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/)
EDIT: fixed parenthesis nesting in the original
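To illustrate, here is a minimal sketch that runs this pattern (written with balanced parentheses) through String#scan on the two sample lines from the question. Note that it captures only the trial type, click time and timestamp; the trial ID and the trick flag would need their own groups. The variable names are my own.

```ruby
# Sample data copied from the question.
contents = <<~LOG
  Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732
  Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043
LOG

regex = /(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/

# scan returns one array of captures per matching line.
rows = contents.scan(regex)

rows.each do |type, click_time, timestamp|
  puts "#{type} #{click_time} #{timestamp}"
end
```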
Upvotes: 3
Reputation: 64177
You want to use String#scan
matchdata = contents.scan(regex)
Also, @Mike Penington is correct: you shouldn't need the if match != nil check if you do it right. You have to clean up your regex as well. The pipe character in a regex is a special character meaning "match the left side OR the right side", so any literal pipe character you want to match must be escaped.
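Here is a small sketch of the difference, assuming the file contents are two newline-terminated lines like the ones in the question (variable names are mine). Regexp#match stops at the first hit, so its to_a holds the whole match plus the four groups, which is where the two "1"s come from. String#scan keeps going and returns one array of groups per hit, with nil for the alternatives that did not take part, so flatten and compact tidy that up.

```ruby
# Sample data copied from the question; each heredoc line ends in "\n".
contents = <<~LOG
  Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732
  Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043
LOG

regex = /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/

# match: first hit only (whole match + the one group that matched).
first = regex.match(contents).to_a.compact

# scan: every hit; drop the nil groups and the surrounding whitespace.
tokens = contents.scan(regex).flatten.compact.map(&:strip)

puts first.inspect
puts tokens.join(" ")
```

Note that the last alternative, (\d{10}\s), needs whitespace after the timestamp, so the final timestamp would be missed if the text did not end with a newline.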
Upvotes: 4