Mike Farmer

Reputation: 2992

How do I parse YAML with nil values?

I apologize for the very specific issue I'm posting here, but I hope it will help others who run across it. I have a string that is formatted like this:

[[,action1,,],[action2],[]]

I would like to translate this into valid YAML so that it can be parsed. The result would look like this:

[['','action1','',''],['action2'],['']]

I've tried a bunch of regular expressions to accomplish this, but I'm at a complete loss. I'm OK with running multiple expressions if needed. For example (Ruby):

puts s.gsub!(/,/,"','")  # => [[','action1','',']','[action2]','[]]
puts s.gsub!(/\[',/, "['',") # => [['','action1','',']','[action2]','[]]

That's getting there, but I have a feeling I'm starting to go down a rat-hole with this approach. Is there a better way to accomplish this?

Thanks for the help!

Upvotes: 3

Views: 2388

Answers (3)

Brad Gilbert

Reputation: 34120

It would be easier to just parse it, then output valid YAML.


Since I don't know Ruby, here is an example in Perl.


Since you only want a subset of YAML that looks similar to JSON, I used the JSON module.

I've been wanting an excuse to use Regexp::Grammars, so I used it to parse the data.

I guarantee it will work, no matter how deep the arrays are.

#! /usr/bin/env perl
use strict;
#use warnings;
use 5.010;
#use YAML;
use JSON;
use Regexp::Grammars;


my $str = '[[,action1,,],[action2],[],[,],[,[],]]';

my $parser = qr{
  <match=Array>

  <token: Text>
    [^,\[\]]*

  <token: Element>
  (?:
    <.Text>
  |
    <MATCH=Array>
  )

  <token: Array>
  \[
     (?:
       (?{ $MATCH = [qw'']; })
     |
       <[MATCH=Element]>   ** (,)
     )
  \]
}x;


if( $str =~ $parser ){
  say to_json $/{match};
}else{
  die $@ if $@;
}

Which outputs:

[["","action1","",""],["action2"],[],["",""],["",[],""]]

If you really want YAML, just uncomment "use YAML;" and replace to_json() with Dump(), which gives:

---
-
  - ''
  - action1
  - ''
  - ''
-
  - action2
- []
-
  - ''
  - ''
-
  - ''
  - []
  - ''
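
For readers who want the same parse-then-dump idea in Ruby, here is a minimal sketch of a recursive-descent parser built on the standard library's StringScanner. It is not a port of the Regexp::Grammars code, only an illustration of the approach. The method names parse_list, parse_array and parse_element are made up for this example. Like the Perl grammar above, it assumes that "[]" is an empty array and that an empty field becomes an empty string.

```ruby
require 'strscan'
require 'json'
require 'yaml'

# Hypothetical helper: parses one "[...]" group into a Ruby Array.
def parse_array(scanner)
  scanner.scan(/\[/) or raise ArgumentError, "expected '[' at #{scanner.pos}"
  # Assumption: "[]" means an empty array, as in the Perl output above.
  return [] if scanner.scan(/\]/)

  elements = []
  loop do
    elements << parse_element(scanner)
    break unless scanner.scan(/,/)
  end
  scanner.scan(/\]/) or raise ArgumentError, "expected ']' at #{scanner.pos}"
  elements
end

# Hypothetical helper: an element is either a nested array or plain text.
def parse_element(scanner)
  if scanner.check(/\[/)
    parse_array(scanner)
  else
    # Always matches; an empty field yields the empty string "".
    scanner.scan(/[^,\[\]]*/)
  end
end

# Hypothetical entry point: parses the whole string or raises ArgumentError.
def parse_list(str)
  scanner = StringScanner.new(str)
  result = parse_array(scanner)
  raise ArgumentError, "trailing input at #{scanner.pos}" unless scanner.eos?
  result
end

data = parse_list('[[,action1,,],[action2],[],[,],[,[],]]')
puts data.to_json  # [["","action1","",""],["action2"],[],["",""],["",[],""]]
puts data.to_yaml  # block-style YAML for the same structure
```

Because the parser recurses on every "[", nesting depth is limited only by the Ruby call stack, and malformed input raises an error instead of silently producing broken YAML.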

Upvotes: 3

Alan Moore

Reputation: 75232

Try this:

s.gsub(/([\[,])(?=[,\]])/, "\\1''")
 .gsub(/([\[,])(?=[^'\[])|([^\]'])(?=[,\]])/, "\\+'");

EDIT: I'm not sure about the replacement syntax. That's supposed to be group #1 in the first gsub, and the highest-numbered participating group -- $+ -- in the second.
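
If the "\\+" replacement syntax turns out not to be available, one way to sidestep the question is the block form of gsub, where the capture groups are available as $1 and $2. This is only a sketch using the same two patterns; the method name quote_fields is made up for the example.

```ruby
# Hypothetical helper wrapping the two substitutions from the answer above.
def quote_fields(s)
  # Pass 1: insert '' wherever "[" or "," is directly followed by "," or "]".
  s.gsub(/([\[,])(?=[,\]])/) { "#{$1}''" }
   # Pass 2: add the opening or closing quote around non-empty fields.
   # Only one of the two groups participates in any given match.
   .gsub(/([\[,])(?=[^'\[])|([^\]'])(?=[,\]])/) { "#{$1 || $2}'" }
end

puts quote_fields("[[,action1,,],[action2],[]]")
# [['','action1','',''],['action2'],['']]
```

Here `$1 || $2` plays the role of the highest-numbered participating group, because whichever group did not take part in the match is nil.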

Upvotes: 1

Inshallah

Reputation: 4814

This does the job for the empty fields (Ruby 1.9):

s.gsub(/(?<=[\[,])(?=[,\]])/, "''")

Or for Ruby 1.8, which doesn't support zero-width look-behind:

s.gsub(/([\[,])(?=[,\]])/, "\\1''")

Quoting non-empty fields can be done with one of these:

s.gsub(/(?<=[\[,])\b|\b(?=[,\]])/, "'")
s.gsub(/(\w+)/, "'\\1'")

In the above I'm making use of zero-width positive look-behind and zero-width positive look-ahead assertions (the '(?<=' and '(?=' constructs). They match a position in the string without consuming any characters.

I've looked for some Ruby-specific documentation but could not find anything that explains these features in particular. Instead, please let me refer you to perlre.
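
Putting the pieces together, a sketch of the full pipeline on Ruby 1.9 or later might look like the following. The method name to_flow_yaml is made up for the example, and the "\w+" step assumes that field values consist only of word characters (letters, digits, underscore).

```ruby
require 'yaml'

# Hypothetical helper combining the two substitutions shown above.
def to_flow_yaml(s)
  # Step 1: insert '' at each empty position that sits between
  # a "[" or "," on the left and a "," or "]" on the right.
  filled = s.gsub(/(?<=[\[,])(?=[,\]])/, "''")
  # Step 2: wrap every run of word characters in single quotes.
  filled.gsub(/(\w+)/, "'\\1'")
end

quoted = to_flow_yaml("[[,action1,,],[action2],[]]")
puts quoted
# [['','action1','',''],['action2'],['']]

p YAML.load(quoted)
# [["", "action1", "", ""], ["action2"], [""]]
```

Fields containing spaces or punctuation would need a different pattern in step 2, since "\w+" would quote each word separately.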

Upvotes: 4
