Sebastian

Reputation: 151

Ruby Regex matching string before and after certain characters

I've got a string like this:

<block trace="true" name="AssignResources: Append Resources">

I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).

I tried several regex patterns, but all my attempts return the word with the delimiter characters included, like ;block.

I'm sure it's not that hard, but I've not found the solution yet.

Anybody's got a hint?
Thanks.

Btw: I want to replace the pattern matches with gsub.

EDIT:

Solved it with the following regexes:

1) /\s(\w+)="(.*?)"/ matches all attributes and their values, captured in $1 and $2.

2) /<!--.*-->/ matches comments

3) /&lt;([\/|!|\?]?)([A-Za-z0-9]+)[^\s|&gt;|\/]*/ matches all tag names, whether they are in a closing tag, a self-closing tag, an <?xml> tag, or a DTD tag. $1 holds the optional / ! or ? prefix (or nothing), and $2 contains the tag name.
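Since the goal is to replace matches with gsub, here is a minimal sketch of how regex 1 could be used with the block form of gsub, so that only the captured attribute name changes and the surrounding whitespace, = and quotes are written back unchanged. The upcasing is just a placeholder transformation chosen for illustration, and the variable names are hypothetical.

```ruby
# Sample line from the question.
line = '<block trace="true" name="AssignResources: Append Resources">'

# Regex 1 from the edit above: $1 is the attribute name, $2 its value.
attr_re = /\s(\w+)="(.*?)"/

# The block runs once per match; we rebuild the whole match ourselves,
# changing only the attribute name (upcase is a placeholder transformation).
renamed = line.gsub(attr_re) { " #{$1.upcase}=\"#{$2}\"" }

puts renamed
```

The same block form should work for the tag-name regex as well, rebuilding the match from $1 and $2.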

Upvotes: 0

Views: 3943

Answers (5)

Jonas Elfström

Reputation: 31428

Most probably you should go with Nokogiri or something similar. I couldn't fit it into one gsub, but it works with two:

>> m,r=0,["&lt;blockie ", " tracie=", " namie="]
>> s.gsub(/&lt;.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "&lt;blockie tracie="true" namie="AssignResources: Append Resources"&gt;"

Upvotes: 0

Amarghosh

Reputation: 59451

&lt;block trace="true" name="AssignResources: Append Resources"&gt;

&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;

#result:

$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources

Update: I don't know Ruby, but based on the description of gsub here, I believe that something like the following should do the trick.

str = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'
repl = str.gsub(/&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;/, 
    "tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl
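The pattern above is written for exactly two attributes. As a hedged alternative, here is a sketch that builds the same kind of summary for any number of attributes by extracting the tag name with String#[] and collecting the name/value pairs with String#scan. The variable names and the two helper patterns are my own assumptions, not part of the answer.

```ruby
str = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'

# Tag name: everything after the opening "&lt;" up to the first whitespace.
tag = str[/&lt;([^\s]+)/, 1]

# One [name, value] pair per attribute, however many there are.
pairs = str.scan(/([^\s=]+)="([^"]*)"/)

summary = "tag name: #{tag}\n" +
          pairs.map { |name, value| "#{name} is #{value}" }.join(" and ") + "\n"
print summary
```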

Upvotes: 0

YOU

Reputation: 123831

This looks very much like parsing HTML with regex to me.

Ruby has a very good HTML parser called Nokogiri.

Here is how to use it for this:

require 'nokogiri'

html=Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')

html.xpath("//*").each do |s|
    puts s.node_name #block
    puts s.keys #trace, name
    puts s.values #true, AssignResources: Append Resources
end

Upvotes: 2

sepp2k

Reputation: 370162

'&lt;block trace="true" name="AssignResources: Append Resources"&gt;'[/&lt;(\w+)/, 1]
#=> "block"

If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.

Edit:

In 1.9 you can use /(?<=&lt;)\w+/ to require the presence of the &lt; without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:

"lo&lt;la li".gsub(/(&lt;)(\w+)/, '\1 --\2--')
 #=> "lo&lt; --la-- li"
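For comparison, a small sketch of the 1.9 lookbehind variant next to the capturing-group variant, using the same sample string. Both should produce the same result; the variable names are only illustrative.

```ruby
s = "lo&lt;la li"

# Ruby 1.9+: the lookbehind requires "&lt;" before the word but keeps it out
# of the match, so only the word itself is replaced.
with_lookbehind = s.gsub(/(?<=&lt;)\w+/) { |word| " --#{word}--" }

# Works in 1.8 as well: match the prefix too and write it back via \1.
with_group = s.gsub(/(&lt;)(\w+)/, '\1 --\2--')

puts with_lookbehind
puts with_group
```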

Upvotes: 0

codaddict

Reputation: 455030

You can try:

&lt;([^ ]*)\s([^=]*)=
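A quick sketch of what this pattern captures on the sample string. Note that it only reaches the first attribute name. The scan line for the remaining names is my own addition, not part of the answer.

```ruby
str = '&lt;block trace="true" name="AssignResources: Append Resources"&gt;'

# Group 1 is the tag name, group 2 the first attribute name.
m = str.match(/&lt;([^ ]*)\s([^=]*)=/)
tag = m[1]
first_attr = m[2]

# Assumption: to collect every attribute name, scan for "whitespace, name, =".
attr_names = str.scan(/\s([^=\s]+)=/).flatten

puts tag, first_attr, attr_names.inspect
```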

Upvotes: 1
