Ossir

Reputation: 3145

How to strip single-line comments in obj-c properly

I know there are a lot of resources with regexes for this, but I could not find the one I need. My problem: I want to remove single-line comments (//) from obj-c sources without breaking the code around them. For instance, with the regex @"//.*" I can remove all comments, but it also corrupts string literals such as:

@"bsdv//sdfsdf"

I played with non-capturing parentheses (?:(\"*\")*+), but without success. Also I found this expression for Python:

r'(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)'

It should cover my case, but I haven't figured out how to make it work with obj-c.
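For reference, here is a minimal sketch of how an expression of that shape is usually applied in Python: string literals are captured in one group and put back, while comments fall into a second group and are dropped. This is an adaptation, not the expression quoted above: the string alternatives handle escaped quotes, and block comments are kept rather than removed so that a // inside them is left alone. The names PATTERN and strip_line_comments are hypothetical.

```python
import re

# Pieces of the pattern; names are illustrative only.
STRING = r'"(?:\\.|[^"\\\r\n])*"'   # double-quoted literal (also covers @"...")
CHAR = r"'(?:\\.|[^'\\\r\n])*'"     # single-quoted character literal
BLOCK = r"/\*[\s\S]*?\*/"           # block comment, kept untouched in this sketch
LINE = r"//[^\r\n]*"                # single-line comment, the only thing removed

# Group 1: text to keep as-is. Group 2: text to drop.
PATTERN = re.compile(f"({STRING}|{CHAR}|{BLOCK})|({LINE})")


def strip_line_comments(source: str) -> str:
    """Remove // comments while leaving string literals and block comments intact."""
    def keep_or_drop(match: re.Match) -> str:
        # If group 1 matched, we are inside a literal or block comment: keep it.
        return match.group(1) if match.group(1) is not None else ""
    return PATTERN.sub(keep_or_drop, source)
```

Because the literal is matched first at each position, a // inside @"bsdv//sdfsdf" is consumed as part of the string and never reaches the comment alternative. Whitespace before a removed comment is left in place.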

Please help me build a proper regex.

UPDATE: Yeah, that's a tough one. I know there are a lot of caveats other than the one I described. I would appreciate it if someone posted a regex that fixes only my issue. Anyway, I'm going to post my own solution, without regex, soon. I hope it will be helpful for anyone else struggling with this problem.

Upvotes: 0

Views: 187

Answers (1)

Stephan

Reputation: 43053

Try this regex:

(?:^|.*;(?!.*")|#(?:define|endif|ifn?def|import|undef|...).*)\s*(//[^\r\n]+$)

Demo

http://regex101.com/r/jT4xC8

Discussion

Besides all the warnings expressed in the comments, I assume that a single-line comment can appear in two distinct cases:

  • Case 1: Alone on its line, optionally preceded by blank chars.
  • Case 2: Not alone on its line: preceded by other chars, with optional blank chars before the comment.

In the first case, we match the beginning of the line (^ with the /m flag). Then we match zero or more blank chars (\s*) and finally the single-line comment: //[^\r\n]+$.

In the second case, the other chars on the line form statements, and every statement ends with a semicolon ;. So we match up to the last statement's semicolon, provided no double quote follows it on the line: .*;(?!.*"). Then we match the single-line comment. The other chars can also be a preprocessor directive, which is introduced by a hash sign #.

One important point: I assume the code passed to the regex compiles.
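The two cases can be tried out with a short Python sketch. It is only an illustration under assumptions: the name strip_with_answer_regex is hypothetical, the directive list contains just the directives spelled out in the regex (the elided ones are not filled in), and the comment is removed by cutting the match off where the capturing group starts.

```python
import re

# Only the directives named in the regex; the "..." part is deliberately left out.
DIRECTIVES = r"define|endif|ifn?def|import|undef"

PATTERN = re.compile(
    r'(?:^|.*;(?!.*")|#(?:' + DIRECTIVES + r').*)\s*(//[^\r\n]+$)',
    re.MULTILINE,  # lets ^ and $ match at every line boundary
)


def strip_with_answer_regex(source: str) -> str:
    """Drop the captured comment (group 1) and keep everything matched before it."""
    def drop_comment(match: re.Match) -> str:
        # Group 1 always sits at the end of the match, so keep the prefix only.
        return match.string[match.start():match.start(1)]
    return PATTERN.sub(drop_comment, source)


sample = (
    'NSString *s = @"bsdv//sdfsdf";\n'
    'int x = 1; // counter\n'
    '  // standalone\n'
    '#import "Foo.h" // header\n'
)
print(strip_with_answer_regex(sample))
```

The string literal on the first line survives because no alternative can start a match at the // inside it. One limit worth knowing: the (?!.*") lookahead also rejects a trailing comment that itself contains a double quote, so such a comment is left in place.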

There is more

Don't forget to add any other preprocessor directives that apply in your case. Check this SO answer: https://stackoverflow.com/a/18014883/363573

Upvotes: 2
