Reputation: 28511
I'm trying to use a regexp to match entries between the slashes in the text below:
311102Z/5663.00N/00813.02E/GPS//03/-/
For this example, the results should be a series of matches which have the content:
311102Z
5663.00N
00813.02E
GPS
03
-
It is important that we catch the empty entry and return an empty match. Unfortunately, for various reasons, we can't use grouping here, or match the slashes themselves and split on those.
I have the following regex as something that is almost working: (.*?)(?=/). An interactive display of this regex can be seen here. It matches all the entries fine, but has extra empty matches at the end of each entry.
I tried replacing the * with a +, but of course that meant that it didn't match the blank entry.
Does anyone have any ideas about what I could do to make it match the way I want, i.e. without these extra empty matches, but with an empty match in the position where there are no characters between the slashes?
If it matters for compatibility, I'm using this regex in Python.
Upvotes: 2
Views: 234
Reputation: 110675
One more (Python):
(?<=/)(?=/)|[^/]+
(?<=/) : use a positive lookbehind to assert match is preceded by '/'
(?=/) : use a positive lookahead to assert match is followed by '/'
| : or
[^/]+ : match 1+ characters other than '/'
Change [^/]+ to [^/\n]+ to prevent matches from spanning line terminators.
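A minimal sketch of how this pattern could be tried with re.findall, since the pattern has no capturing groups and findall therefore returns the whole matches. The two-line sample string is a made-up input, used only to show what the [^/\n]+ variant changes.

```python
import re

# Pattern from this answer: either the empty position between two slashes,
# or a run of one or more non-slash characters.
PATTERN = re.compile(r"(?<=/)(?=/)|[^/]+")
# Variant that also excludes newlines, for multi-line input.
PATTERN_NO_NEWLINE = re.compile(r"(?<=/)(?=/)|[^/\n]+")

record = "311102Z/5663.00N/00813.02E/GPS//03/-/"
print(PATTERN.findall(record))
# ['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']

# Hypothetical two-line input (not from the question).
two_lines = "A//B/\nC/D/"
print(PATTERN.findall(two_lines))             # ['A', '', 'B', '\nC', 'D']
print(PATTERN_NO_NEWLINE.findall(two_lines))  # ['A', '', 'B', 'C', 'D']
```

With the first pattern the newline is swallowed into the entry that follows it; the second keeps each line's entries clean.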
Upvotes: 1
Reputation: 784958
You may use this regex with lookahead and lookbehind assertions:
(?:(?<=/)|^)[^/]*(?=/)
Code:
>>> import re
>>> s = '311002Z/3623.00N/00412.02E/GPS//03/-/'
>>> print (re.findall(r'(?:(?<=/)|^)[^/]*(?=/)', s))
['311002Z', '3623.00N', '00412.02E', 'GPS', '', '03', '-']
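If the position of each entry matters as well, the same pattern could be run through re.finditer instead. This is only a sketch on the sample string above; the names pattern and spans are illustrative.

```python
import re

s = '311002Z/3623.00N/00412.02E/GPS//03/-/'
pattern = re.compile(r'(?:(?<=/)|^)[^/]*(?=/)')

# finditer yields match objects, so both the text and the span are available.
for m in pattern.finditer(s):
    print(m.span(), repr(m.group(0)))

# Collect the spans; the empty entry shows up as a zero-width match.
spans = [m.span() for m in pattern.finditer(s)]
print(spans[4])  # (31, 31)
```

The empty entry is reported as a zero-width match sitting between the two adjacent slashes, so it keeps its place in the sequence.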
RegEx Details:
(?:(?<=/)|^) : Lookbehind to assert that we have either start or / at previous position
[^/]* : Match 0 or more of any character that is not /
(?=/) : Lookahead to assert that we have a / ahead
Upvotes: 4
Reputation: 20249
You can use re.split for this (same as str.split, only using a regex), then remove the last item:
>>> import re
>>> foo = "311102Z/5663.00N/00813.02E/GPS//03/-/"
>>> re.split("/", foo)[:-1]
['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']
Upvotes: 0
Reputation: 163207
As an alternative, you could match 1+ times any char except /, asserting a / on the right.
Or get the position between 2 forward slashes.
[^/]+(?=/)|(?<=/)(?=/)
Explanation
[^/]+(?=/) : Match 1+ times any char except / and assert a / at the right
| : Or
(?<=/)(?=/) : Get the position between 2 forward slashes
Example code
import re
s="311102Z/5663.00N/00813.02E/GPS//03/-/"
pattern = r"[^/]+(?=/)|(?<=/)(?=/)"
print(re.findall(pattern, s))
Output
['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']
Upvotes: 3
Reputation: 2164
Then I would suggest:
import re
entry = "311102Z/5663.00N/00813.02E/GPS//03/-/"
match = re.findall("([^/]*)/", entry)
print(match)
which returns
['311102Z', '5663.00N', '00813.02E', 'GPS', '', '03', '-']
Upvotes: 0