Ryan Poolos

Reputation: 18551

In what ways can I improve this regular expression?

I have written this regex that works, but honestly, it’s like 75% guesswork.

The goal is this: I have lots of imports in Xcode, like so:

#import <UIKit/UIKit.h>
#import "NSString+MultilineFontSize.h"

and I only want to return the categories that contain +. There are also lots of lines of code throughout the source which include + in other contexts.

Right now, this returns all of the proper lines throughout the Xcode project. But if there is one thing I’ve learned from googling and searching Stack Overflow for regex tutorials, it is that there are LOTS of different ways to do things. I’d love to see all of the different ways you guys can come up with that make it either more efficient or more bulletproof regarding potential spoofs or misses.

^\#import+.[\"]*+.(?:(?!\+).)*+.*[\"]

Thanks in advance for all of your help.

Update

Also I suppose I’ll accept the answer of whoever does this with the shortest string, without missing any possible spoofs. But again, thanks to everyone who participates in this learning experience.

Resources from answers

This is an awesome resource for practicing regex from Dan Rasmussen: RegExr

Upvotes: 2

Views: 118

Answers (2)

alinsoar

Reputation: 15793

sed -n 's:^#import \(.*[+].*\):\1:p' FILE

will display

"NSString+MultilineFontSize.h"

for your sample.

Upvotes: 0

dlras2

Reputation: 8486

The first thing I notice is that your + characters are misplaced: t+. matches t one or more times, followed by any single character (.). I'm assuming you wanted to match the end of import, followed by one or more of any character: import.+
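To make the difference concrete, here is a small sketch using Python's re module (Python is my choice for illustration; the question itself is about Xcode's search). The test strings are made up, apart from the import line taken from the question.

```python
import re

# "t+." means: one or more "t", then exactly one arbitrary character.
misplaced = re.compile(r"import+.")
# ".+" means: one or more arbitrary characters.
intended = re.compile(r"import.+")

line = '#import "NSString+MultilineFontSize.h"'

# The misplaced quantifier stops right after the space following "import".
misplaced_match = misplaced.search(line).group(0)
# The intended form runs to the end of the line.
intended_match = intended.search(line).group(0)
# The misplaced form also accepts repeated t's (made-up input).
repeated_t_match = misplaced.search("#importttt!").group(0)

print(repr(misplaced_match))
print(repr(intended_match))
print(repr(repeated_t_match))
```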

Secondly, # doesn't need to be escaped.

Here's what I came up with: ^#import\s+(.*\+.*)$

\s+ matches one or more whitespace characters, so you're guaranteed that the line actually starts with #import and not #importbutnotreally or anything else.

I'm not familiar with Xcode syntax, but the following part of the expression, (.*\+.*), simply matches any string with a + character somewhere in it. This means invalid imports may be matched, but I'm working under the assumption you're trying to match valid code. If not, this will need to be modified to validate the import syntax as well.
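As a rough illustration of that trade-off, the sketch below runs the expression over a few lines in Python. LOOSE is the pattern from this answer. STRICT is a hypothetical tighter variant that is not part of the original answer; it assumes category headers are always imported with double quotes, which may not hold for every project. Apart from the two imports from the question, the sample lines are invented.

```python
import re

# Pattern from the answer: any #import line with a "+" somewhere after it.
LOOSE = re.compile(r'^#import\s+(.*\+.*)$')
# Hypothetical stricter variant (an assumption): the "+" must sit inside
# a double-quoted file name, with nothing but whitespace after it.
STRICT = re.compile(r'^#import\s+"([^"+]+\+[^"]+)"\s*$')

lines = [
    '#import <UIKit/UIKit.h>',
    '#import "NSString+MultilineFontSize.h"',
    'int total = a + b;',
    '#importbutnotreally "Foo+Bar.h"',
    '#import "Foo.h" // a + b',
]

def captures(pattern, candidates):
    """Return group 1 for every line the pattern matches."""
    found = []
    for candidate in candidates:
        match = pattern.match(candidate)
        if match:
            found.append(match.group(1))
    return found

loose_hits = captures(LOOSE, lines)
strict_hits = captures(STRICT, lines)

print(loose_hits)
print(strict_hits)
```

The loose pattern also picks up the last line, where the + only appears in a trailing comment; the stricter variant skips it, at the cost of a longer expression.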


P.S. To test your expression, try RegExr. You can hover over characters to check what they do.

Upvotes: 3
