Reputation: 3732
I have a string that looks like the one below, and I have to remove everything between the first bracket and the last bracket. All bets are off as to what's in between (including other brackets). What would be the best approach?
'[
{ "foo":
{"bar":"foo",
"bar": {
["foo":"bar", "foo":"bar"]
}
}
}
],
"foo":"bar","foo":"bar"'
Desired result:
',
"foo":"bar","foo":"bar"'
Upvotes: 3
Views: 300
Reputation: 21548
You could use something like Parslet to write a parser. Here's an example I wrote, based on the JSON grammar from http://www.json.org/
require 'parslet'
# This needs a few more 'as' calls to annotate the output
class JSONParser < Parslet::Parser
  rule(:space) { match('[\s\n]').repeat(1) }
  rule(:space?) { space.maybe }
  rule(:digit) { match('[0-9]') }
  rule(:hexdigit) { match('[0-9a-fA-F]') }
  rule(:number) { space? >> str('-').maybe >>
    (str('0') | (match('[1-9]') >> digit.repeat)) >>
    (str('.') >> digit.repeat).maybe >>
    ((str('e') | str('E')) >> (str('+') | str('-')).maybe >> digit.repeat).maybe }
  rule(:escaped_character) { str('\\') >> (match('["\\\\/bfnrt]') | (str('u') >> hexdigit.repeat(4,4))) }
  rule(:string) { space? >> str('"') >> (match('[^\"\\\\]') | escaped_character).repeat >> str('"') }
  rule(:value) { space? >> (string | number | object | array | str('true') | str('false') | str('null')) }
  rule(:pair) { string >> str(":") >> value }
  rule(:pair_list) { pair >> (space? >> str(',') >> pair).repeat }
  rule(:object) { str('{') >> space? >> pair_list.maybe >> space? >> str('}') }
  rule(:value_list) { value >> (space? >> str(',') >> value).repeat }
  rule(:array) { space? >> str('[') >> space? >> value_list.maybe >> space? >> str(']') >> space? }
  rule(:json) { value.as('value') >> (space? >> str(',') >> value.as('value')).repeat }
  root(:json)
end
# I've changed your doc to be a list of JSON values
doc = '[
{ "foo":
{"bar":"foo",
"bar": [
{"foo":"bar", "foo":"bar"}
]
}
}
],
{"foo":"bar"},{"foo":"bar"}'
puts JSONParser.new.parse(doc)[1..-1].map{|value| value["value"]}.join(",")
# => {"foo":"bar"},{"foo":"bar"}
However, as far as I know your document isn't valid JSON, so you can change the grammar above to accept a pair wherever a value is allowed:
require 'parslet'
class YourFileParser < Parslet::Parser
  rule(:space) { match('[\s\n]').repeat(1) }
  rule(:space?) { space.maybe }
  rule(:digit) { match('[0-9]') }
  rule(:hexdigit) { match('[0-9a-fA-F]') }
  rule(:number) { space? >> str('-').maybe >>
    (str('0') | (match('[1-9]') >> digit.repeat)) >>
    (str('.') >> digit.repeat).maybe >>
    ((str('e') | str('E')) >> (str('+') | str('-')).maybe >> digit.repeat).maybe }
  rule(:escaped_character) { str('\\') >> (match('["\\\\/bfnrt]') | (str('u') >> hexdigit.repeat(4,4))) }
  rule(:string) { space? >> str('"') >> (match('[^\"\\\\]') | escaped_character).repeat >> str('"') }
  rule(:value) { space? >> (string | number | object | array | str('true') | str('false') | str('null')) }
  rule(:pair) { string >> str(":") >> value }
  rule(:pair_list) { (pair | value) >> (space? >> str(',') >> (pair | value)).repeat }
  rule(:object) { str('{') >> space? >> pair_list.maybe >> space? >> str('}') }
  rule(:value_list) { (pair | value) >> (space? >> str(',') >> (pair | value)).repeat }
  rule(:array) { space? >> str('[') >> space? >> value_list.maybe >> space? >> str(']') >> space? }
  rule(:yourdoc) { (pair | value).as('value') >> (space? >> str(',') >> (pair | value).as('value')).repeat }
  root(:yourdoc)
end
doc = '[
{ "foo":
{"bar":"foo",
"bar": {
["foo":"bar", "foo":"bar"]
}
}
}
],
"foo":"bar","foo":"bar"'
puts YourFileParser.new.parse(doc)[1..-1].map{|value| value["value"]}.join(",")
Upvotes: 0
Reputation: 11076
It's difficult to tell what you're trying to achieve, but that looks like JSON to me so it would probably be much easier to parse it and then manipulate it that way.
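As a rough sketch of that idea: the sample in the question is not valid JSON as posted, so the snippet below assumes a hypothetical, cleaned-up variant in which the bracketed block and the trailing data sit together in one top-level array. The `doc` string and its contents are made up for illustration; only Ruby's standard `json` library is used.

```ruby
require 'json'

# Hypothetical valid-JSON variant of the input (an assumption, not the
# asker's actual data): a top-level array whose first element is the
# nested bracketed block and whose remaining elements are the data to keep.
doc = '[[{"foo": {"bar": ["foo", "bar"]}}], {"foo": "bar"}]'

values = JSON.parse(doc)   # => Ruby Array of nested Hashes/Arrays
rest   = values.drop(1)    # discard the first element, whatever it contains

puts JSON.generate(rest)
# => [{"foo":"bar"}]
```

Once the data is parsed, nested brackets stop being a concern, because the first element is dropped as a whole regardless of what it contains.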
Upvotes: 0
Reputation: 7212
Here you go:
string.gsub(/\[.*\]/m, '')
You need to use the m flag for the . to match newline characters. .* is already greedy, so it will match any number of brackets in between.
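To see what the m flag changes, here is a small sketch that runs the pattern with and without it on the sample string from the question. The variable names (`input`, `single_line`, `multi_line`) are only illustrative.

```ruby
# The sample string from the question, as a single-quoted Ruby literal.
input = '[
{ "foo":
{"bar":"foo",
"bar": {
["foo":"bar", "foo":"bar"]
}
}
}
],
"foo":"bar","foo":"bar"'

# Without /m, "." does not match newlines, so only a bracket pair that
# opens and closes on the same line can match (the inner one here).
single_line = input.gsub(/\[.*\]/, '')

# With /m, "." also matches newlines; the greedy ".*" runs from the first
# "[" to the last "]" in the string.
multi_line = input.gsub(/\[.*\]/m, '')

puts multi_line
# ,
# "foo":"bar","foo":"bar"
```

The second result matches the output the question asks for, while the first still contains most of the nested block.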
Upvotes: 0
Reputation: 434585
If your data really does look like that and you don't have any brackets in the bit at the end, then:
s.gsub(/\[.*\]/m, '')
If you want to be a little more paranoid, then you can look for ], followed by an end-of-line:
s.gsub(/\[.*\],$/m, ',')
Hard to say any more than that without a specification of your data format.
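The difference between the two patterns shows up once the trailing part contains brackets of its own. The input below is a made-up example (not the asker's data) chosen to illustrate that case.

```ruby
# Hypothetical input: the part after the bracketed block also has brackets.
s = "[\n{\"a\": [1, 2]}\n],\n\"foo\": [3, 4]"

# The plain greedy pattern runs to the very last "]", which here belongs
# to the trailing data, so everything is removed.
plain = s.gsub(/\[.*\]/m, '')

# The stricter pattern only ends the match at a "]," sitting at the end of
# a line, so the trailing "[3, 4]" survives. The comma is put back by the
# replacement string.
strict = s.gsub(/\[.*\],$/m, ',')

puts plain.inspect   # => ""
puts strict.inspect  # => ",\n\"foo\": [3, 4]"
```

This still relies on the closing "]," being the last thing on its line, which is an assumption about the data format rather than something the question guarantees.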
Upvotes: 1