roman

Reputation: 5210

How to parse a directory path containing whitespace and escaped symbols using Treetop?

I need to parse some strings that contain paths to directories. The problem is that a path may contain escaped whitespace and other escaped symbols. For example:

"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"

Note that there is an escaped space before P\&G/.

Here is my Treetop grammar (the character class in alpha_digit_special starts with a space):

rule alpha_digit_special
  [ a-zA-Z0-9.+&\\]
end

rule path_without_quotes
  ([/] alpha_digit_special*)+ 
end

rule quot_mark
  ["]
end

rule path_with_quotes
  quot_mark path_without_quotes quot_mark
end

rule path
  path_with_quotes / path_without_quotes
end

I get nil after parsing this string. How can I specify the rule so that the string may contain escaped whitespace?

Upvotes: 0

Views: 206

Answers (2)

cliffordheath

Reputation: 2606

You cannot use alpha_digit_special* to handle backslash-escaped spaces. Instead, you must use a repetition of character units, where a character unit is either a backslash followed by one character (an escaped pair) or a single non-backslash character. Something like this should work:

rule alpha_digit_special
  # "_" added so that names such as dir_1 in the example path match
  [a-zA-Z0-9_.+&\\]
end

rule path_character
  '\\' (alpha_digit_special / ' ')
  /
  alpha_digit_special
end

rule path_without_quotes
  ([/] path_character* )+ 
end

Note that the above won't accept a backslash followed by a character that is neither a space nor in the alpha_digit_special set. I think you can see how to change that, though.
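
As a rough cross-check of the character-unit idea outside Treetop, here is a minimal sketch using plain Ruby regular expressions. It is not Treetop code. The constant names and valid_path? are made up for illustration, the underscore is added to the plain character set on the assumption that names like dir_1 should match, and any single character is allowed after a backslash, which is one possible way to lift the restriction mentioned above.

```ruby
# Hypothetical names throughout; a regex sketch of the idea, not a Treetop grammar.

# A single unescaped character allowed inside a path segment.
# The underscore is an assumption, added because the example path contains dir_1.
PLAIN_CHAR = /[a-zA-Z0-9_.+&]/

# A backslash followed by any one character (for example "\ " or "\&").
ESCAPED_PAIR = /\\./

# One character unit: try the escaped pair first, then a plain character.
PATH_CHARACTER = /#{ESCAPED_PAIR}|#{PLAIN_CHAR}/

# One or more segments, each a slash followed by zero or more character units.
PATH_WITHOUT_QUOTES = /(?:\/(?:#{PATH_CHARACTER})*)+/

# The whole input must be a path, either wrapped in double quotes or bare.
PATH = /\A(?:"#{PATH_WITHOUT_QUOTES}"|#{PATH_WITHOUT_QUOTES})\z/

# Returns true when the whole string is a valid quoted or unquoted path.
def valid_path?(input)
  PATH.match?(input)
end

# Single-quoted Ruby strings keep the backslashes literally.
puts valid_path?('"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"') # true
puts valid_path?('/dest_dir P')                                      # false
```

The second call fails because the space is not preceded by a backslash, which is the behaviour the path_character rule is meant to give.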

Upvotes: 1

deadrunk

Reputation: 14125

Did you try \s?

test = "dest_dir P&G" 
test.match(/[a-zA-Z0-9_\s\&]+/)
 => #<MatchData "dest_dir P&G">

Upvotes: 0
