Reputation: 151
I've got a string like this:
<block trace="true" name="AssignResources: Append Resources">
I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).
I tried several regex patterns, but all my attempts return the word with the "delimiter" characters included, like <block.
I'm sure it's not that hard, but I haven't found the solution yet.
Does anybody have a hint?
Thanks.
Btw: I want to replace the pattern matches with gsub.
EDIT:
Solved it with the following regexes:
1)
/\s(\w+)="(.*?)"/
matches all attributes and their values in $1 and $2.
2)
/<!--.*-->/
matches comments.
3)
/<([\/!?]?)([A-Za-z0-9]+)[^\s>\/]*/
matches all tag names, whether they're in a closing tag, a self-closing tag, a <?xml> tag, or a DTD tag. $1 contains the optional prefix /, ! or ? (or nothing), and $2 contains the tag name.
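As a rough sketch of how these patterns could be used with scan and gsub on the sample string from the question: the variable names, the "_x" suffix, and the upcasing are purely illustrative, and the tag pattern is a shortened form of pattern 3 without its trailing part.

```ruby
# Sample input taken from the question.
str = '<block trace="true" name="AssignResources: Append Resources">'

# Pattern 1 from the edit: $1 is the attribute name, $2 its value.
attr_re = /\s(\w+)="(.*?)"/

# Shortened form of pattern 3 (assumption: trailing part omitted):
# $1 is the optional /, ! or ? prefix, $2 the tag name.
tag_re = /<([\/!?]?)([A-Za-z0-9]+)/

# scan returns one [name, value] pair per match.
attrs = str.scan(attr_re)

# gsub with a block: rebuild each match and change only the name,
# so the surrounding delimiters are written back unchanged.
renamed = str.gsub(attr_re) { " #{$1}_x=\"#{$2}\"" }

# Same idea for the tag name: keep "<" and the prefix, replace the name.
upcased = str.gsub(tag_re) { "<#{$1}#{$2.upcase}" }

p attrs
puts renamed
puts upcased
```

The block form of gsub is used here because it makes it explicit which captured parts are kept and which are replaced.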
Upvotes: 0
Views: 3943
Reputation: 31428
Most probably you should go with Nokogiri or something similar. I couldn't fit it into one gsub, but it works with two:
>> s = '<block trace="true" name="AssignResources: Append Resources">'
>> m,r=0,["<blockie ", " tracie=", " namie="]
>> s.gsub(/<.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "<blockie tracie=\"true\" namie=\"AssignResources: Append Resources\">"
Upvotes: 0
Reputation: 59451
<block trace="true" name="AssignResources: Append Resources">
<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>
#result:
$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources
Update: I don't know Ruby, but based on the description of gsub here, I believe that something like the following should do the trick.
str = '<block trace="true" name="AssignResources: Append Resources">'
repl = str.gsub(/<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/,
"tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl
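To check the capture groups listed above from Ruby, something like the following should work. It is only a sketch: the variable names are illustrative, and the pattern still assumes exactly two attributes, as in the answer.

```ruby
# Sample input and pattern copied from the answer above.
str = '<block trace="true" name="AssignResources: Append Resources">'
pattern = /<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/

# match returns a MatchData object, or nil when the pattern does not match.
m = str.match(pattern)

# captures holds groups 1..5 as an array of strings.
captures = m ? m.captures : []

captures.each_with_index do |value, i|
  puts "group #{i + 1}: #{value}"
end
```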
Upvotes: 0
Reputation: 123831
This looks a lot like parsing HTML with regex to me.
Ruby has a very good HTML parser called Nokogiri.
Here is how to do that:
require 'nokogiri'
html = Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')
# The HTML parser wraps the input in <html> and <body>, so select <block> explicitly.
html.xpath("//block").each do |s|
  puts s.node_name # block
  puts s.keys      # trace, name
  puts s.values    # true, AssignResources: Append Resources
end
Upvotes: 2
Reputation: 370162
'<block trace="true" name="AssignResources: Append Resources">'[/<(\w+)/, 1]
#=> "block"
If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.
Edit:
In 1.9 you can use /(?<=<)\w+/ to require the presence of the < without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:
"lo<la li".gsub(/(<)(\w+)/, '\1 --\2--')
#=> "lo< --la-- li"
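For comparison, here is a small sketch of both approaches side by side, assuming Ruby 1.9 or later for the lookbehind variant. The variable names and the "--" markers are illustrative only.

```ruby
str = "lo<la li"

# Lookbehind (Ruby 1.9+): "<" must precede the word but is not part of
# the match, so only the word itself is replaced.
with_lookbehind = str.gsub(/(?<=<)\w+/) { |word| "--#{word}--" }

# Capture-group variant: "<" is part of the match and is written back
# through \1 in the replacement string.
with_groups = str.gsub(/(<)(\w+)/, '\1--\2--')

puts with_lookbehind
puts with_groups
```

Both calls leave the < in place. The lookbehind form keeps it out of the match entirely, while the capture-group form matches it and then restores it.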
Upvotes: 0