A Jackson

Reputation: 2856

What does this Regular expression do?

I'm in the process of converting a program from Perl to Java. I have come across the line

my ($title) = ($info{$host} =~ /^\s*\(([^\)]+)\)\s*$/);

I'm not very good with regular expressions, but from what I can tell this matches the string $info{$host} against the regular expression ^\s*\(([^\)]+)\)\s*$ and assigns the match to $title.

My problem is that I have no clue what the regular expression is doing and what it will match. Any help would be appreciated.

Thanks

Upvotes: 2

Views: 306

Answers (4)

Sinan Ünür

Reputation: 118128

my ($title) = ($info{$host} =~ /^\s*\(([^\)]+)\)\s*$/);

First, m// in list context returns the captured matches. my ($title) puts the right hand side in list context. Second, $info{$host} is matched against the following pattern:

/^ \s* \( ( [^\)]+) \) \s* $/x

Yes, I used the x flag so I could insert some spaces. ^\s* skips any leading whitespace. Then we have an escaped parenthesis (therefore no capture group is created). Then we have a capture group containing [^\)]+. That character class can be better written as [^)] because the right parenthesis is not special in a character class; it means anything but a right parenthesis.

If the string consists of an opening parenthesis, one or more characters other than a closing parenthesis, and then a closing parenthesis, optionally surrounded on either side by whitespace, the characters between the parentheses are captured and put into $title. If the match fails, $title is undef.
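For the Java side of the conversion, the list-context capture could be sketched roughly as follows. This is only a minimal sketch: the class name TitleExtractor, the method extractTitle and the sample strings are made up for illustration, and Pattern.COMMENTS is used here as Java's closest counterpart to Perl's x flag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are not from the original program.
class TitleExtractor {
    // Pattern.COMMENTS lets whitespace in the pattern be ignored, much like Perl's /x.
    private static final Pattern TITLE = Pattern.compile(
            "^ \\s* \\( ( [^)]+ ) \\) \\s* $", Pattern.COMMENTS);

    // Returns the text between the parentheses, or null when the input
    // does not match (the counterpart of $title being undef in Perl).
    static String extractTitle(String info) {
        Matcher m = TITLE.matcher(info);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample inputs are invented for this example.
        System.out.println(extractTitle("  (Main server)  "));
        System.out.println(extractTitle("no parentheses"));
    }
}
```

Returning null mirrors the Perl behaviour only loosely; depending on the surrounding code, an Optional<String> or an empty string may be a better fit.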

Upvotes: 0

James Anderson

Reputation: 27478

Ok step by step

/ - quote the regex

^ - the beginning of the string

\s* - zero or more of any spacelike character

\( - an actual ( character

( - begin a capture group

[^)]+ - one or more characters that are not ) (the leading ^ negates the character class)

) - end the capture group

\) - an actual ) character

\s* - zero or more space-like characters

$ - the end of the string

/ - close the regex quote

So as far as I can work out we are looking for strings like " (abc) " or "(^)": text inside a single pair of parentheses, optionally surrounded by whitespace.

Upvotes: 1

Alex

Reputation:

It will match optional leading whitespace, followed by a left paren, followed by at least one character of text not including a right paren, followed by a right paren, followed by optional trailing whitespace and nothing else.

Matches:

      (some stuff)  

Fails:

 (some stuff

     some stuff)

   (some stuff)  asadsad
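The cases above can be checked from Java with a small sketch like the one below. The class name ParenMatchDemo and the helper matches are made up for this example; the pattern is the one from the question with the backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class ParenMatchDemo {
    private static final Pattern P = Pattern.compile("^\\s*\\(([^)]+)\\)\\s*$");

    // True when the whole string is one parenthesized group
    // with nothing but optional whitespace around it.
    static boolean matches(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "      (some stuff)  ",     // expected to match
            " (some stuff",             // missing closing paren
            "     some stuff)",         // missing opening paren
            "   (some stuff)  asadsad"  // extra text after the closing paren
        };
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + matches(s));
        }
    }
}
```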

Upvotes: 4

Konrad Rudolph

Reputation: 545588

The regular expression matches a string that contains exactly one pair of matching parentheses (actually, one opening and one closing parenthesis; any number of further opening parentheses may occur between them, but no further closing ones).

The string may begin and end with whitespace characters, but no others. Inside the parentheses, however, arbitrary characters other than a closing parenthesis may occur (at least one).

The following strings should match it:

 (abc)
 (()
   (ab)

By the way, you may simply use the regular expression as-is in Java (after escaping the backslashes), using the Pattern class.
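"Escaping the backslashes" here means doubling each backslash inside the Java string literal. A rough sketch, assuming a made-up class EscapedPatternDemo and helper capture, that also runs the example strings above plus one extra non-matching input of my own:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class EscapedPatternDemo {
    // Perl:  /^\s*\(([^\)]+)\)\s*$/
    // Java:  the same pattern, with every backslash doubled in the literal.
    static final Pattern P = Pattern.compile("^\\s*\\(([^\\)]+)\\)\\s*$");

    // Returns the captured group, or null when the string does not match.
    static String capture(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The last input, "(a)(b)", is an added example with two closing parens.
        for (String s : new String[] {" (abc)", " (()", "   (ab)", "(a)(b)"}) {
            System.out.println("\"" + s + "\" -> " + capture(s));
        }
    }
}
```

For " (()" the captured text is the inner opening parenthesis, which illustrates that further opening parentheses are allowed inside the pair while a second closing parenthesis is not.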

Upvotes: 4
