dt1000

Reputation: 3732

Ruby parse through string

I have a string that looks like the one below, and I have to remove everything between the first bracket and the last bracket. All bets are off as to what's in between (including other brackets). What would be the best approach? Thanks.

'[

        { "foo":
            {"bar":"foo",
                "bar": {
                    ["foo":"bar", "foo":"bar"]
                }
            }
        }

    ],

"foo":"bar","foo":"bar"'

result:

  ',

    "foo":"bar","foo":"bar"'

Upvotes: 3

Views: 300

Answers (5)

Nigel Thorne

Reputation: 21548

You could use something like Parslet to write a parser. Here's an example I wrote, based on the JSON grammar from http://www.json.org/

require 'parslet'

#This needs a few more 'as' calls to annotate the output 
class JSONParser < Parslet::Parser
  rule(:space)              { match('[\s\n]').repeat(1)}
  rule(:space?)             { space.maybe }
  rule(:digit)              { match('[0-9]') }
  rule(:hexdigit)           { match('[0-9a-fA-F]') }

  rule(:number)             { space? >> str('-').maybe >> 
                                (str('0') | (match('[1-9]') >> digit.repeat)) >> 
                                (str('.') >> digit.repeat).maybe >> 
                                ((str('e')| str('E')) >> (str('+')|str('-')).maybe >> digit.repeat ).maybe }

  rule(:escaped_character)  { str('\\') >> (match('["\\\\/bfnrt]') | (str('u') >> hexdigit.repeat(4,4))) }
  rule(:string)             { space? >> str('"') >> (match('[^\"\\\\]') | escaped_character).repeat >> str('"') }
  rule(:value)              { space? >> (string | number | object | array | str('true') | str('false') | str('null')) }

  rule(:pair)               { string >> str(":") >> value }
  rule(:pair_list)          { pair >> (space? >> str(',') >> pair).repeat }
  rule(:object)             { str('{') >> space? >> pair_list.maybe >> space? >> str('}') }

  rule(:value_list)         { value >> (space? >> str(',') >> value).repeat }
  rule(:array)              { space? >> str('[') >> space? >> value_list.maybe >> space? >> str(']') >> space?}

  rule(:json)               { value.as('value') >> (space? >> str(',') >> value.as('value')).repeat }
  root(:json)
end

# I've changed your doc to be a list of JSON values
doc = '[

        { "foo":
            {"bar":"foo",
                "bar": [
                    {"foo":"bar", "foo":"bar"}
                ]
            }
        }

    ],

{"foo":"bar"},{"foo":"bar"}'

puts JSONParser.new.parse(doc)[1..-1].map{|value| value["value"]}.join(",")
# => {"foo":"bar"},{"foo":"bar"} 

However, as far as I can tell your document isn't valid JSON, so you can change the grammar above to accept it:

require 'parslet'

class YourFileParser < Parslet::Parser
  rule(:space)              { match('[\s\n]').repeat(1)}
  rule(:space?)             { space.maybe }
  rule(:digit)              { match('[0-9]') }
  rule(:hexdigit)           { match('[0-9a-fA-F]') }

  rule(:number)             { space? >> str('-').maybe >> 
                                (str('0') | (match('[1-9]') >> digit.repeat)) >> 
                                (str('.') >> digit.repeat).maybe >> 
                                ((str('e')| str('E')) >> (str('+')|str('-')).maybe >> digit.repeat ).maybe }

  rule(:escaped_character)  { str('\\') >> (match('["\\\\/bfnrt]') | (str('u') >> hexdigit.repeat(4,4))) }
  rule(:string)             { space? >> str('"') >> (match('[^\"\\\\]') | escaped_character).repeat >> str('"') }
  rule(:value)              { space? >> (string | number | object | array | str('true') | str('false') | str('null')) }

  rule(:pair)               { string >> str(":") >> value }
  rule(:pair_list)          { (pair|value) >> (space? >> str(',') >> (pair|value)).repeat }
  rule(:object)             { str('{') >> space? >> pair_list.maybe >> space? >> str('}') }

  rule(:value_list)         { (pair|value) >> (space? >> str(',') >> (pair|value)).repeat }
  rule(:array)              { space? >> str('[') >> space? >> value_list.maybe >> space? >> str(']') >> space?}

  rule(:yourdoc)           { (pair|value).as('value') >> (space? >> str(',') >> (pair|value).as('value')).repeat }
  root(:yourdoc)
end

doc = '[

        { "foo":
            {"bar":"foo",
                "bar": {
                    ["foo":"bar", "foo":"bar"]
                }
            }
        }

    ],

"foo":"bar","foo":"bar"'

puts YourFileParser.new.parse(doc)[1..-1].map{|value| value["value"]}.join(",")

Upvotes: 0

Tilo

Reputation: 33732

You need multi-line mode:

str.gsub(/\[.*\]/m, '')

Upvotes: 0

Andy Waite

Reputation: 11076

It's difficult to tell what you're trying to achieve, but that looks like JSON to me so it would probably be much easier to parse it and then manipulate it that way.
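
A quick way to find out whether the parse-and-manipulate route is open is to hand the input to the standard library's JSON parser and see whether it is accepted. This is only a minimal sketch: the helper name valid_json? and the variable question_doc are made up for illustration, and the string is the one from the question condensed onto a single line.

```ruby
require 'json'

# Hypothetical helper: true if +text+ parses as JSON, false otherwise.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# The string from the question, condensed onto one line.
question_doc = '[ { "foo": {"bar":"foo", "bar": { ["foo":"bar", "foo":"bar"] } } } ], ' \
               '"foo":"bar","foo":"bar"'

puts valid_json?('{"foo": ["bar"]}')  # a well-formed document, for comparison
puts valid_json?(question_doc)        # the question's string
```

If the check fails, as it should for the string as posted (an object opens with [ where a key is expected), parsing with a stock JSON library is not an option without first fixing the data.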

Upvotes: 0

psyho

Reputation: 7212

Here you go:

string.gsub(/\[.*\]/m, '')

You need to use the m flag for the . to match newline characters. .* is already greedy, so it will match any number of brackets in between.
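
To make the effect of the m flag and of greediness concrete, here is a small sketch on a made-up input (sample is a shortened stand-in for the string in the question, not the original data). It compares the pattern without m, with m, and with a lazy .*? for contrast.

```ruby
# Hypothetical sample: a bracketed block spanning several lines,
# with a nested pair of brackets inside it.
sample = "[\n  { \"foo\": [\"a\", \"b\"] }\n],\n\"foo\":\"bar\""

# Without /m, "." does not match newlines, so only a bracket pair that
# sits entirely on one line can match: just the inner ["a", "b"].
without_m = sample.gsub(/\[.*\]/, '')

# With /m, "." also matches newlines; the greedy ".*" runs from the
# first "[" to the LAST "]" in the whole string.
greedy = sample.gsub(/\[.*\]/m, '')

# A lazy ".*?" stops at the FIRST "]" instead, which cuts the block short.
lazy = sample.gsub(/\[.*?\]/m, '')

p without_m
p greedy
p lazy
```

Only the greedy multi-line version removes the whole block; the lazy variant leaves the tail of the nested structure behind, which is why the plain .* is the right choice here.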

Upvotes: 0

mu is too short

Reputation: 434585

If your data really does look like that and you don't have any brackets in the bit at the end, then:

s.gsub(/\[.*\]/m, '')

If you want to be a little more paranoid, then you can look for ], followed by an end-of-line:

s.gsub(/\[.*\],$/m, ',')

Hard to say any more than that without a specification of your data format.
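
The difference between the two patterns shows up as soon as the trailing part contains brackets of its own. A small sketch on a made-up input (the string s below is hypothetical, not the data from the question):

```ruby
# Hypothetical input whose trailing part also contains brackets.
s = "[\n  {\"a\": [1, 2]}\n],\n\"foo\": [\"bar\"]"

# The plain pattern is greedy up to the very last "]", so it also
# swallows the trailing "foo" entry.
plain = s.gsub(/\[.*\]/m, '')

# The stricter pattern only accepts a "]," that ends a line, so the
# match stops after the first block and the tail survives.
strict = s.gsub(/\[.*\],$/m, ',')

p plain
p strict
```

Note that in Ruby $ matches at the end of any line, not only at the end of the string, which is what lets the second pattern stop at the "]," that closes the first block.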

Upvotes: 1
