Reputation: 5210
I need to parse some strings which contain paths to directories. The problem is that they contain escaped whitespace and other escaped symbols. For example:
"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"
Note that there is a space before P\&G/.
Here is my Treetop grammar (the character class in alpha_digit_special starts with a space):
rule alpha_digit_special
  [ a-zA-Z0-9.+&\\]
end
rule path_without_quotes
  ([/] alpha_digit_special*)+
end
rule quot_mark
  ["]
end
rule path_with_quotes
  quot_mark path_without_quotes quot_mark
end
rule path
  path_with_quotes / path_without_quotes
end
I get nil after parsing this string. So how can I specify the rule so that the string may contain escaped whitespace?
Upvotes: 0
Views: 206
Reputation: 2606
You cannot use alpha_digit_special* to handle backslash-escaped spaces. Instead, you must use a repetition of character units, where a character unit is either a backslash-escaped character pair or a single non-backslash character. Something like this should work:
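rule alpha_digit_special
  # '_' is included so that names like dir_1 in the example path match
  [a-zA-Z0-9_.+&\\]
end
rule path_character
  # an escaped pair is tried first, then a single plain character
  '\\' (alpha_digit_special / ' ')
  /
  alpha_digit_special
end
rule path_without_quotes
  ([/] path_character* )+
end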
Note that the above won't accept a backslash followed by a character that is neither a space nor in the alpha_digit_special set. I think you can see how to change that, though.
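The same "character unit" idea can be tried outside Treetop with a plain Ruby regular expression. This is only a sketch, not a replacement for the grammar: the names PATH_CHAR, UNQUOTED, PATH and valid_path? are made up for illustration, the escaped pair is loosened to a backslash followed by any character, and the plain-character set includes `_` on the assumption that directory names like dir_1 should be accepted.

```ruby
# A path character is either a backslash followed by any one character
# (an escaped pair), or a single plain character that is not a backslash.
PATH_CHAR = /\\.|[a-zA-Z0-9_.+&]/

# One or more segments, each a slash followed by zero or more path characters.
UNQUOTED = /(?:\/(?:#{PATH_CHAR.source})*)+/

# The whole string must be a path, optionally wrapped in double quotes.
PATH = /\A(?:"#{UNQUOTED.source}"|#{UNQUOTED.source})\z/

def valid_path?(str)
  PATH.match?(str)
end

# Single-quoted literals keep the backslashes in the string.
sample = '/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/'
puts valid_path?(sample)          # escaped space and ampersand are accepted
puts valid_path?('/dest_dir P')   # an unescaped space is rejected
```

Trying the escaped pair before the plain character mirrors the ordered choice in path_character.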
Upvotes: 1
Reputation: 14125
Did you try \s?
test = "dest_dir P&G"
test.match(/[a-zA-Z0-9_\s\&]+/)
=> #<MatchData "dest_dir P&G">
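One caveat, sketched under the assumption that the input still carries its literal backslashes, as in the question: the character class above contains no backslash, so the match stops at the first escape. Adding `\\` to the class lets the whole name through. The variable names here are only for illustration.

```ruby
# Single quotes keep the backslashes in the string.
escaped = 'dest_dir\ P\&G'

without_backslash = /[a-zA-Z0-9_\s&]+/
with_backslash    = /[a-zA-Z0-9_\s&\\]+/

short = escaped.match(without_backslash)[0]  # stops before the first backslash
full  = escaped.match(with_backslash)[0]     # covers the whole string

puts short
puts full
```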
Upvotes: 0