Reputation: 28591

Regex to match entries between slashes, but not slashes - including empty entries

I'm trying to use a regexp to match entries between the slashes in the text below:

311102Z/5663.00N/00813.02E/GPS//03/-/

For this example, the results should be a series of matches which have the content:

311102Z
5663.00N
00813.02E
GPS
(an empty string)
03
-

It is important that we catch the empty entry and return an empty match. Unfortunately, for various reasons, we can't use grouping here, or match the slashes themselves and split on those.

I have the following regex as something that is almost working: (.*?)(?=/). An interactive display of this regex can be seen here. It matches all the entries fine, but has extra empty matches at the end of each entry.

I tried replacing the * with a +, but of course that meant that it didn't match the blank entry.

Does anyone have any ideas for making it match the way I want, i.e. without these extra empty matches, but with the empty entry in the position where there are no characters between the slashes?

If it matters for compatibility, I'm using this regex in Python.

Upvotes: 2

Answers (5)

Cary Swoveland

Reputation: 110755

One more (Python):

(?<=/)(?=/)|[^/]+

Start your engine!

(?<=/)  : use a positive lookbehind to assert match is preceded by '/'
(?=/)   : use a positive lookahead to assert match is followed by '/'
|       : or
[^/]+   : match 1+ characters other than '/'

Change [^/]+ to [^/\n]+ to prevent matches from spanning line terminators.
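
As a quick check of the pattern above, the following sketch runs it through re.findall on the sample from the question, then compares [^/]+ with [^/\n]+ on a two-line string. The variable names and the two-line input are illustrative assumptions, not part of the original answer.

```python
import re

# Sample string from the question.
sample = "311102Z/5663.00N/00813.02E/GPS//03/-/"
entries = re.findall(r"(?<=/)(?=/)|[^/]+", sample)
print(entries)

# Hypothetical two-line input, used only to show the effect of excluding "\n".
multiline = "a/b\nc//d/"

# [^/]+ also consumes the newline, so "b\nc" comes back as one entry.
spanning = re.findall(r"(?<=/)(?=/)|[^/]+", multiline)

# [^/\n]+ stops at the newline, so "b" and "c" are separate entries.
per_line = re.findall(r"(?<=/)(?=/)|[^/\n]+", multiline)

print(spanning)
print(per_line)
```

Note that this pattern does not require a slash after a non-empty entry, so an entry after the final slash would also be returned if the input had one.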

Upvotes: 1

anubhava

Reputation: 786091

You may use this regex with lookahead and lookbehind assertions:

(?:(?<=/)|^)[^/]*(?=/)

RegEx Demo

Code:

>>> import re
>>> s = '311002Z/3623.00N/00412.02E/GPS//03/-/'
>>> print(re.findall(r'(?:(?<=/)|^)[^/]*(?=/)', s))
['311002Z', '3623.00N', '00412.02E', 'GPS', '', '03', '-']

RegEx Details:

(?:(?<=/)|^): Assert that we are either at the start of the string or immediately after a /
[^/]*: Match 0 or more of any character that is not /
(?=/): Lookahead to assert that we have a / ahead
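
To see why the ^ alternative matters, here is a small sketch that compares the pattern with and without it. The shortened input string and the variable names are assumptions made for illustration only.

```python
import re

# Hypothetical shortened input: one entry, one empty entry, one more entry.
sample = "a//b/"

# With the ^ alternative, the first entry (which has no "/" before it) is found.
with_anchor = re.findall(r"(?:(?<=/)|^)[^/]*(?=/)", sample)

# With only the lookbehind, a match must follow a "/", so the first entry is lost.
without_anchor = re.findall(r"(?<=/)[^/]*(?=/)", sample)

print(with_anchor)
print(without_anchor)
```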

Upvotes: 4

Ruzihm

Reputation: 20269

You can use re.split for this (it works like str.split, but splits on a regex), then drop the last item, which is the empty string after the trailing slash:

>>> import re
>>> foo = "311102Z/5663.00N/00813.02E/GPS//03/-/"

>>> re.split("/", foo)[:-1]
['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']
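
Because the separator here is a literal slash, plain str.split should behave the same way. The sketch below shows where the extra final item comes from and drops it only when the input actually ends with a slash. The names parts and entries, and the endswith guard, are illustrative additions rather than part of the answer.

```python
foo = "311102Z/5663.00N/00813.02E/GPS//03/-/"

# Splitting on "/" leaves an empty string after the trailing slash.
parts = foo.split("/")
print(parts[-1] == "")

# Drop that final item only if the input really ends with a slash.
entries = parts[:-1] if foo.endswith("/") else parts
print(entries)
```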

Upvotes: 0

The fourth bird

Reputation: 163632

As an alternative, you could use an alternation: match one or more characters other than /, asserting a / on the right, or match the empty position between two forward slashes.

[^/]+(?=/)|(?<=/)(?=/)

Explanation

[^/]+(?=/) Match 1+ times any char except / and assert a / at the right
| Or
(?<=/)(?=/) Get the position between 2 forward slashes

Regex demo | Python demo

Example code

import re
 
s="311102Z/5663.00N/00813.02E/GPS//03/-/"
pattern = r"[^/]+(?=/)|(?<=/)(?=/)"
print(re.findall(pattern, s))

Output

['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']
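
To make the zero-width match visible, the same pattern can be run through re.finditer, which exposes the position of each match. This is only a sketch, and the spans list is a name introduced here for illustration.

```python
import re

s = "311102Z/5663.00N/00813.02E/GPS//03/-/"
pattern = r"[^/]+(?=/)|(?<=/)(?=/)"

# Collect (start, end) positions together with the matched text.
spans = [(m.span(), m.group()) for m in re.finditer(pattern, s)]

for span, text in spans:
    # The empty entry has equal start and end: it sits between the two slashes.
    print(span, repr(text))
```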

Upvotes: 3

Peter H.

Reputation: 2164

Then I would suggest:

import re
 
entry = "311102Z/5663.00N/00813.02E/GPS//03/-/" 
  
match = re.findall("([^/]*)/", entry)  
print(match)

which returns

['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']
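
This works because re.findall returns the contents of the capture group when the pattern has exactly one group, so the slash is consumed but not returned. The sketch below makes that explicit with re.finditer. The shortened input and the names whole and captured are assumptions for illustration.

```python
import re

# Hypothetical shortened input taken from the tail of the question's sample.
entry = "GPS//03/-/"
pattern = r"([^/]*)/"

# group(0) is the whole match and includes the trailing slash.
whole = [m.group(0) for m in re.finditer(pattern, entry)]

# group(1) is the captured part only, which is what re.findall returns here.
captured = [m.group(1) for m in re.finditer(pattern, entry)]

print(whole)
print(captured)
```

Since the approach depends on a capture group, it may not fit the question's constraint that grouping cannot be used.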

Upvotes: 0
