Mini John

Reputation: 7941

CRUD text using Regex - Ruby

I'm looking for a way to parse and update a document. A very good example is user.js scripts.

Example Case:

A user uploads a user.js script to userscripts.org. The file must start with a header of specific variables for the browser, e.g.:

// ==UserScript==
// @name        Fancy Title
// @description Fancey Description
// @namespace   http://example.com
// @icon        http://example.com/icon.png
// @updateURL   http://example.com/user.js
// @downloadURL http://example.com/user.js
// @homepageURL http://example.com
// @require     https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js
// @include     http*://example.com
// @include     http://example.com/scripts/*
// @include     http://example.com/tags/*
// @grant       GM_getValue
// @grant       GM_setValue
// @grant       GM_listValues
// @version     1.0
// ==/UserScript==

What would be a good way to check those variables and modify them or add them to the document? Basically, I want to sync the header with my model in both directions: @testing.title => @name Fancy Title and vice versa.

Say the meta head doesn't contain the variables @updateURL and @downloadURL; in that case I would add them.

My first guess was to regex-scan the document with (@\w+), which gets me all the @ variables, but from there I'm lost :)

Can I solve this with plain Ruby, or is there a handy gem available?

Edit:

Sam pointed out \/\/\s*@(\w+)\s+(.*), which captures exactly the variables I need:

the identifier (name, from @name) and the value (Fancy Title).
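
As a minimal sketch of what that regex returns, String#scan with two capture groups yields one [identifier, value] pair per header line. The shortened header and the names `header`, `META_LINE` and `pairs` are only illustrative:

```ruby
# Shortened sample header (illustrative data).
header = <<~EOS
  // ==UserScript==
  // @name        Fancy Title
  // @include     http://example.com/scripts/*
  // @include     http://example.com/tags/*
  // @version     1.0
  // ==/UserScript==
EOS

# Sam's regex: group 1 is the identifier without "@", group 2 is the value.
META_LINE = /\/\/\s*@(\w+)\s+(.*)/

# With capture groups, scan returns an array of [group1, group2] arrays.
pairs = header.scan(META_LINE)

pairs.each { |name, value| puts "#{name} => #{value}" }
```

Note that the marker lines (==UserScript==) are skipped because they contain no @, and that repeated identifiers such as @include show up as separate pairs.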

How do I read, set or update them, though?


@MrYoshiji provided me with a very nice regex meta reader:

# Grab the whole block between the ==UserScript== markers as one string
# ("" if the file has no header).
raw_metas = file_content[/\/\/\s*==UserScript==.*?\/\/\s*==\/UserScript==/m].to_s
metas = {}
raw_metas.split(/\r\n|\n|\r/).each do |line_with_meta|
  attribute_name = line_with_meta[/@\w+/]
  next unless attribute_name # skip the ==UserScript== marker lines
  key = attribute_name.delete('@').to_sym
  value = line_with_meta.sub(/\A\/\/\s*#{attribute_name}/, '').strip
  # Repeated keys (@include, @grant) are collected into an array.
  # key? works in plain Ruby; present? would require ActiveSupport.
  metas[key] = metas.key?(key) ? [metas[key], value].flatten : value
end

But I'm completely lost on how to set this up to interact with my model's attributes.


What meta data I need to change

These attributes (:description etc.) are stored in my model, and I need to pass them into the header:

// @name        => @model.name
// @description => @model.description
// @namespace   => Application root_path

// @updateURL   => @model show_view url
// @downloadURL => @model show_view url
// @homepageURL => Application root_path

// @include     => Custom url (passed by me)
// @include     => Custom url (passed by me)
// @include     => Custom url (passed by me)

// @version     => @model.version
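
One way to express the mapping above in plain Ruby is a hash of overrides that gets merged over whatever was parsed from the uploaded header. This is only a sketch: the `Script` struct stands in for the real model, and `meta_overrides`, the URLs and the `parsed` hash are hypothetical names and data, not part of any existing API.

```ruby
# Hypothetical stand-in for the real model.
Script = Struct.new(:name, :description, :version)

# Builds the header values that should win over the uploaded ones.
# root_url, show_url and includes are supplied by the caller.
def meta_overrides(script, root_url:, show_url:, includes:)
  {
    name:        script.name,
    description: script.description,
    namespace:   root_url,
    updateURL:   show_url,
    downloadURL: show_url,
    homepageURL: root_url,
    include:     includes,
    version:     script.version
  }
end

script = Script.new("Fancy Title", "Fancy Description", "1.1")

overrides = meta_overrides(
  script,
  root_url: "http://example.com",
  show_url: "http://example.com/scripts/1",
  includes: ["http://example.com/scripts/*", "http://example.com/tags/*"]
)

# Metas parsed from an uploaded file (illustrative data).
parsed = { name: "Old Title", grant: ["GM_getValue", "GM_setValue"], version: "1.0" }

# Override keys replace parsed ones; keys only in the header (@grant) are kept,
# and missing keys (@updateURL, @downloadURL) are added.
merged = parsed.merge(overrides)
```

Going the other way (header to model) would be the same idea in reverse: read `parsed[:name]` and friends and assign them to the model's attributes.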

Upvotes: 2

Views: 364

Answers (1)

zx81

Reputation: 41838

[EDIT: In a chat, you mentioned that your input may be on a single line. This second demo shows a regex to deal with that, and also the general procedure to rebuild the string.]

This code stores the names and values in two hashes, replaces the @version with 2.0, then outputs them (see the output at the bottom of the online demo):

subject = <<-eos
// @name        Fancy Title
// @description Fancey Description
// @namespace   http://example.com
// @icon        http://example.com/icon.png
// @updateURL   http://example.com/user.js
// @downloadURL http://example.com/user.js
// @homepageURL http://example.com
// @require     https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js
// @include     http*://example.com
// @include     http://example.com/scripts/*
// @include     http://example.com/tags/*
// @grant       GM_getValue
// @grant       GM_setValue
// @grant       GM_listValues
// @version     1.0
eos

regex = /\/\/ (@\w+)\s*([^\n]*)/

# put captures in two hashes
tokens = Hash.new
values = Hash.new
counter = 0
subject.scan(regex) {|m|
    tokens[counter] = $1
    values[counter] = $2
    counter += 1
}

# find hash key for @version (Hash#key; the older Hash#index was removed in Ruby 3.0)
versionkey = tokens.key("@version")
# change version to 2.0
values[versionkey] = "2.0"

# print names and values
i=0
while i < counter  do
   puts "#{tokens[i]} : #{values[i]}"
   i +=1
end

The key is that token names are captured to Group 1, and token values are captured to Group 2 (see the regex explanation below). We build hashes with the values in these two groups.

To manipulate values, you have several options:

  1. Use regex with gsub to replace lines in your string (not recommended)

  2. Directly manipulate values in the hashes to your heart's content, as shown in the demo, where @version is changed to 2.0, then rebuild the string if needed. That's what I would do.
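
To make option 2 concrete, here is a rough sketch of the update-then-rebuild step. It keeps the captures as an ordered list of [token, value] pairs instead of two hashes so that repeated tokens like @include survive. The shortened `subject`, the placeholder URL and the column-aligned output format are my assumptions, not something your script format requires.

```ruby
subject = <<~EOS
  // ==UserScript==
  // @name        Fancy Title
  // @include     http://example.com/scripts/*
  // @include     http://example.com/tags/*
  // @version     1.0
  // ==/UserScript==
  alert("hello");
EOS

# Ordered list of [token, value] pairs, same regex as in the demo.
pairs = subject.scan(/\/\/ (@\w+)\s*([^\n]*)/)

# Update an existing value in place.
pairs.each { |pair| pair[1] = "2.0" if pair[0] == "@version" }

# Add tokens that are missing (the URL is a placeholder).
%w[@updateURL @downloadURL].each do |token|
  next if pairs.any? { |t, _| t == token }
  pairs << [token, "http://example.com/user.js"]
end

# Rebuild the header with aligned columns.
width  = pairs.map { |t, _| t.length }.max
header = ["// ==UserScript=="] +
         pairs.map { |t, v| "// #{t.ljust(width)} #{v}" } +
         ["// ==/UserScript=="]

# Swap the old header block for the new one; the script body is untouched.
# The block form of sub avoids backslash handling in the replacement text.
block  = /\/\/ ==UserScript==.*?\/\/ ==\/UserScript==/m
result = subject.sub(block) { header.join("\n") }

puts result
```

The block form of sub is used deliberately: with a plain replacement string, sequences like \1 in a value would be treated as backreferences.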

Explain Regex

//                       # '// '
(                        # group and capture to \1:
  @                      #   '@'
  \w+                    #   word characters (a-z, A-Z, 0-9, _) (1 or
                         #   more times (matching the most amount
                         #   possible))
)                        # end of \1
\s*                      # whitespace (\n, \r, \t, \f, and " ") (0 or
                         # more times (matching the most amount
                         # possible))
(                        # group and capture to \2:
  [^\n]*                 #   any character except: '\n' (newline) (0
                         #   or more times (matching the most amount
                         #   possible))
)                        # end of \2
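
If it helps, the same pattern can be written in Ruby's extended (x) mode so the explanation lives next to the regex. This is just a sketch: the constant `META` and the group names `token` and `value` are my own choices, and named groups are used in place of \1 and \2.

```ruby
# %r{...}x: whitespace in the pattern is ignored and "#" starts a comment,
# so the literal space after "//" has to be written as [ ].
META = %r{
  //[ ]              # the literal text "// "
  (?<token>@\w+)     # "@" followed by one or more word characters
  \s*                # optional whitespace
  (?<value>[^\n]*)   # everything up to the end of the line
}x

m = META.match("// @version     1.0")
puts m[:token]  # prints @version
puts m[:value]  # prints 1.0
```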

Upvotes: 1
