Reputation: 11
I want to write a regex that selects only the actual GPS coordinates (not the ranges) from the input below.
This regex returns what I want, but it also includes the words; I want only the numbers:
(actual (lat|lon) (\d+(.\d{1,6})))|((\d+(.\d{1,6})) (lat|lon))
So I want to exclude:
(actual (lat|lon) | (lat|lon))
How do I do that?
Input:
49.212087 latitude, 16.626133 longitude
lat range: 49.000000 to 50.000000 actual lat 49.212059 lon range: 16.000000 to 17.000000 actual lon 16.626276
49.21199 latitude, 16.626446 longitude
lat range: 49.000000 to 50.000000 actual lat 49.212073 lon range: 16.000000 to 17.000000 actual lon 16.626333
Upvotes: 0
Views: 68
Reputation: 31656
"This regex returns what I want BUT including the words, I want only numbers"
In regular expressions there is a difference between a match, a capture, and basic grouping. Your pattern both matches and captures because of the ( ) constructs.
Keep these items in mind:
Groups[0] is always the whole match.
Groups[1-N] are the individual captures, present when a ( ) construct is specified.
Groups[0] is the one to use when you want just the full match.
([\d.]+)\s(\D+)
Using this pattern on the first line of your data, you get these two matches:
Match #0
[0]: 49.212087 latitude,
["1"] → [1]: 49.212087
["2"] → [2]: latitude,
Match #1
[0]: 16.626133 longitude
["1"] → [1]: 16.626133
["2"] → [2]: longitude
Named Captures
If you use named captures, (?<{name here}> ), you can access the data via named groups such as mymatch.Groups["Data"].Value, or still by index via mymatch.Groups[1].Value.
(?<Data>[\d.]+)\s(?<What>\D+)
This pattern produces the following matches and group captures, which are indexable by integer and also by the quoted strings "Data" and "What":
Match #0
[0]: 49.212087 latitude,
["Data"] → [1]: 49.212087
["What"] → [2]: latitude,
Match #1
[0]: 16.626133 longitude
["Data"] → [1]: 16.626133
["What"] → [2]: longitude
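To make the match-versus-capture distinction concrete, here is a minimal C# sketch that runs the named-capture pattern above over the first input line and reads the whole match, then the number and the label by group name. The class and method names (CaptureDemo, DescribeMatches) are made up for illustration, and the trimming of spaces and commas is an addition for readable output, not part of the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns one "whole | data | what" row per match
    // so the full match and the individual captures can be compared.
    public static List<string> DescribeMatches(string input)
    {
        var rows = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<Data>[\d.]+)\s(?<What>\D+)"))
        {
            string whole = m.Groups[0].Value.Trim();              // full match, same as m.Value
            string data = m.Groups["Data"].Value;                 // the number only
            string what = m.Groups["What"].Value.Trim(' ', ',');  // the label, tidied up
            rows.Add($"{whole} | {data} | {what}");
        }
        return rows;
    }

    public static void Main()
    {
        foreach (var row in DescribeMatches("49.212087 latitude, 16.626133 longitude"))
            Console.WriteLine(row);
        // => 49.212087 latitude, | 49.212087 | latitude
        //    16.626133 longitude | 16.626133 | longitude
    }
}
```

If only the number is needed, reading Groups["Data"] (or Groups[1]) is enough and the rest of the match can be ignored.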
Upvotes: 0
Reputation: 626936
You have too many unnecessary groups. Also, since you actually need 2 groups to match the same type of value, you may use a named capturing group, and grab all your required matches with a regex like
actual (?:lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (?:lat|lon)
See the regex demo. If you pass the RegexOptions.ExplicitCapture flag, you can write capturing groups as if they were non-capturing ones (only the named capturing groups will keep their submatches). See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "lat range: 49.000000 to 50.000000 actual lat 49.212059 lon range: 16.000000 to 17.000000 actual lon 16.626276";
var pattern = @"actual (lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (lat|lon)";
// ExplicitCapture makes the plain (lat|lon) groups non-capturing; only "val" is kept.
var results = Regex.Matches(s, pattern, RegexOptions.ExplicitCapture)
    .Cast<Match>()
    .Select(m => m.Groups["val"].Value)
    .ToList();
Console.WriteLine(string.Join("\n", results));
// => 49.212059
//    16.626276
If you put the (lon|lat) alternation into a named capturing group, you will be able to get a dictionary as a result:
// s is the same input string as in the previous snippet.
var pattern = @"actual (?<type>lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (?<type>lat|lon)";
// Note: ToDictionary throws if the same key (e.g. "lat") is matched more than once.
var results = Regex.Matches(s, pattern)
    .Cast<Match>()
    .ToDictionary(
        m => m.Groups["type"].Value,
        m => m.Groups["val"].Value);
foreach (var kv in results)
    Console.WriteLine("'{0}': '{1}'", kv.Key, kv.Value);
// => 'lat': '49.212059'
//    'lon': '16.626276'
See another C# demo.
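The dictionary approach fits a single line of input, because ToDictionary rejects duplicate keys and the full input from the question contains several "lat" and "lon" values. As a rough sketch of one way around that, the same pattern can be collected into a list of (type, value) pairs instead. The names GpsExtractDemo and Extract are hypothetical, and the list-of-tuples shape is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GpsExtractDemo
{
    private const string Pattern =
        @"actual (?<type>lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (?<type>lat|lon)";

    // Hypothetical helper: returns (type, value) pairs in input order.
    // A list keeps repeated "lat"/"lon" entries that a dictionary would reject.
    public static List<(string Type, string Value)> Extract(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => (Type: m.Groups["type"].Value, Value: m.Groups["val"].Value))
            .ToList();
    }

    public static void Main()
    {
        // The four input lines from the question, joined with newlines.
        var input = string.Join("\n", new[]
        {
            "49.212087 latitude, 16.626133 longitude",
            "lat range: 49.000000 to 50.000000 actual lat 49.212059 lon range: 16.000000 to 17.000000 actual lon 16.626276",
            "49.21199 latitude, 16.626446 longitude",
            "lat range: 49.000000 to 50.000000 actual lat 49.212073 lon range: 16.000000 to 17.000000 actual lon 16.626333"
        });

        foreach (var (type, value) in Extract(input))
            Console.WriteLine($"{type}: {value}");
    }
}
```

On the question's input this should yield eight pairs, alternating lat and lon, with none of the range bounds included.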
Upvotes: 1
Reputation: 153
If I've understood your query correctly, this regex should work for you:
(?<=(actual (lat|lon) ))(\d+(\.\d{1,6}))|(?<!((lat|lon) range: ))(\d+(\.\d{1,6}))(?=( (lat|lon)))
See also my test results on Regexstorm
You can learn more about lookbehinds and lookaheads in this topic: Regex lookahead, lookbehind and atomic groups
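For readers new to lookarounds, here is a small sketch of the idea: the words are checked by a lookbehind or a lookahead but never consumed, so the match itself is the bare number. The pattern below is a simplified variant written for this illustration (non-capturing groups, escaped dot, no negative lookbehind), not the exact regex above, and LookaroundDemo and ExtractNumbers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Simplified variant (an assumption, not the answer's exact pattern):
    // a number preceded by "actual lat " / "actual lon ",
    // or a number followed by " lat" / " lon".
    private const string Pattern =
        @"(?<=actual (?:lat|lon) )\d+\.\d{1,6}|\d+\.\d{1,6}(?= (?:lat|lon))";

    // Hypothetical helper: m.Value is already the number, no groups needed.
    public static List<string> ExtractNumbers(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        var input = "lat range: 49.000000 to 50.000000 actual lat 49.212059 " +
                    "lon range: 16.000000 to 17.000000 actual lon 16.626276";
        Console.WriteLine(string.Join("\n", ExtractNumbers(input)));
        // => 49.212059
        //    16.626276
    }
}
```

The trade-off compared with named groups is that the result is read from the match directly, at the cost of a pattern that is somewhat harder to read.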
Upvotes: 0
Reputation: 8736
Here is a working regexp (link to test):
((?<=actual\s(lat|lon)\s)(\d+(\.\d{1,6})))|((\d+(\.\d{1,6}))(?=\s(lat|lon)))
You can find more information on how it works at http://codeasp.net/blogs/microsoft-net/293/c-regex-extract-the-text-between-square-brackets-without-returning-the-brackets-themselves
Upvotes: 0