RyanScottLewis

Reputation: 14046

Ruby regex for finding comments?

I've been at this all day and I can't figure it out. I have some Ruby code in a string (below) and would like to match only the lines with code on them, along with the first comment for that code if one exists.

# Some ignored comment.
1 + 1 # Simple math (this comment would be collected) # ignored 
# ignored

user = User.new
user.name = "Ryan" # Setting an attribute # Another ignored comment

And this would capture:

    1. "1 + 1"
    2. "Simple math"
    1. "user = User.new"
    2. nil
    1. "user.name = "Ryan""
    2. "Setting an attribute"

I'm using /^\x20*(.+)\x20*(#\x20*.+\x20*){1}$/ to match against each line but it doesn't seem to work for all code.

Upvotes: 3

Views: 2402

Answers (2)

wyattisimo

Reputation: 2486

Kobi's answer partially works, but does not match lines of code that lack a comment at the end.

It will also fail when it encounters string interpolation, e.g.:

str = "My name is #{first_name} #{last_name}" # first comment

...will be erroneously matched as: str = "My name is #{first_name} (the # of the first interpolation is treated as the start of the comment).
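
To see the failure concretely, here is a minimal Ruby sketch that runs Kobi's pattern against that line. The constant and variable names are only illustrative.

```ruby
# Kobi's pattern, copied verbatim from the other answer.
KOBI_PATTERN = /^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)/

# Single quotes so Ruby does not try to interpolate #{...} here.
line = 'str = "My name is #{first_name} #{last_name}" # first comment'

match = KOBI_PATTERN.match(line)
matched_text = match[0]      # everything the pattern consumed
captured_comment = match[1]  # what the pattern believes is the comment

puts matched_text.inspect
puts captured_comment.inspect
```

The captured "comment" is the inside of the first interpolation rather than the real trailing comment, because the pattern stops at the first # regardless of quoting.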

You need a more comprehensive regex. Here's one idea:

/^[\t ]*([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*)(#([^#\r\n]*))?/
  • ^[\t ]* - Leading whitespace.
  • ([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*) - Matches a line of code.
    Breakdown:
    • [^#"'\r\n] - the first character in a line of code, and...
    • "(\\"|[^"])*" - a double-quoted string, or...
    • '(\\'|[^'])*' - a single-quoted string, or...
    • [^#\n\r] - any other character outside a quoted string that is not a # or line ending.
  • (#([^#\r\n]*))? - Matches first comment at the end of a line of code, if any.

Due to the more complex logic, this will capture 6 subpatterns for each match. Subpattern 1 is the code, subpattern 6 is the comment, and you can ignore the others.

Given the following block of code:

# Some ignored comment.
1 + 1 # Simple math (this comment would be collected) # ignored 
# ignored

user = User.new
user.name = "Ryan #{last_name}" # Setting an attribute # Another ignored comment

The above regex would produce the following (I excluded subpatterns 2, 3, 4, 5 for brevity):


  1. 1. 1 + 1
     6. Simple math (this comment would be collected)

  2. 1. user = User.new
     6. (nothing captured)

  3. 1. user.name = "Ryan #{last_name}"
     6. Setting an attribute

Demo: http://rubular.com/r/yKxEazjNPC
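
For use from Ruby rather than Rubular, something like the following sketch should work. It assumes the whole source is scanned as one string with String#scan, and that comment-only lines are not indented, as in the sample (an indented comment-only line would otherwise match with blank code). The names CODE_AND_COMMENT, source and pairs are illustrative, and the strip calls are an addition, since subpatterns 1 and 6 keep the whitespace around the #.

```ruby
# The regex from this answer, unchanged. It has 6 capture groups;
# group 1 is the code and group 6 is the first comment (or nil).
CODE_AND_COMMENT = /^[\t ]*([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*)(#([^#\r\n]*))?/

# Non-interpolating heredoc so that #{last_name} stays literal text.
source = <<~'RUBY'
  # Some ignored comment.
  1 + 1 # Simple math (this comment would be collected) # ignored
  # ignored

  user = User.new
  user.name = "Ryan #{last_name}" # Setting an attribute # Another ignored comment
RUBY

# String#scan returns one array of groups per match.
pairs = source.scan(CODE_AND_COMMENT).map do |groups|
  code = groups[0].strip       # subpattern 1
  comment = groups[5]&.strip   # subpattern 6, nil when there is no comment
  [code, comment]
end

pairs.each { |code, comment| puts "#{code.inspect} => #{comment.inspect}" }
```

With the sample above, pairs should contain three entries, the second one with a nil comment.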

Upvotes: 5

Kobi

Reputation: 138117

While the underlying problem is quite difficult, you can find what you need here using the pattern:

^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)

Which reads:

  • [\t ]* - leading spaces.
  • [^\s#] - one actual character. This should match the code.
  • [^#\n\r]* - Characters until the # sign. Anything besides hash or newlines.
  • #([^#\n\r]*) - The "first" comment, captured in group 1.

Working example: http://rubular.com/r/wNJTMDV9Bw
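
A minimal sketch of using this pattern from Ruby, assuming the code is held in a single string and scanned with String#scan. The names FIRST_COMMENT, source and comments are illustrative, and the strip call is an addition to trim the spaces that group 1 keeps around the comment text.

```ruby
# The pattern from this answer; group 1 is the first comment on a code line.
FIRST_COMMENT = /^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)/

# Non-interpolating heredoc holding the sample from the question.
source = <<~'RUBY'
  # Some ignored comment.
  1 + 1 # Simple math (this comment would be collected) # ignored
  # ignored

  user = User.new
  user.name = "Ryan" # Setting an attribute # Another ignored comment
RUBY

# With a single capture group, scan yields one-element arrays; flatten them.
comments = source.scan(FIRST_COMMENT).flatten.map(&:strip)

puts comments.inspect
```

Only lines that contain both code and a comment produce a match here, so the user = User.new line is skipped.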

Upvotes: 2
