Martin

Reputation: 11

How to select only numbers with specific string in between

I want to write a regex that selects only the actual GPS coordinates (not the ranges) from the input below.

This regex returns what I want BUT including the words, I want only numbers:

(actual (lat|lon) (\d+(.\d{1,6})))|((\d+(.\d{1,6})) (lat|lon))

So I want to exclude:

(actual (lat|lon) | (lat|lon))

How do I do that?

Input:

49.212087 latitude, 16.626133 longitude

lat range: 49.000000 to 50.000000 actual lat 49.212059 lon range: 16.000000 to 17.000000 actual lon 16.626276

49.21199 latitude, 16.626446 longitude

lat range: 49.000000 to 50.000000 actual lat 49.212073 lon range: 16.000000 to 17.000000 actual lon 16.626333

Upvotes: 0

Views: 68

Answers (4)

ΩmegaMan

Reputation: 31656

This regex returns what I want BUT including the words, I want only numbers:

In regular expressions, a match, a capture, and plain grouping are three different things. Your pattern both matches and captures because every ( ) construct is a capturing group.

Keep these items in mind.

  • Groups[0] is always the whole match.
  • Groups[1] through Groups[N] are the individual captures, one per ( ) construct.
  • Extract your data (the numbers you mentioned) only from capture groups with an index greater than 0. Use Groups[0] only when you want the full match.

([\d.]+)\s(\D+)

Using this pattern on the first line of your data, you get these two matches:

Match #0
          [0]:  49.212087 latitude, 
  ["1"] → [1]:  49.212087
  ["2"] → [2]:  latitude, 

Match #1
          [0]:  16.626133 longitude
  ["1"] → [1]:  16.626133
  ["2"] → [2]:  longitude
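As a minimal sketch of reading only the capture group, the snippet below runs the pattern above against the first line of the question's input and keeps Groups[1] from each match. The variable names are illustrative, and it assumes a .NET project where top-level statements are available.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// First line of the sample input from the question.
var input = "49.212087 latitude, 16.626133 longitude";

// Group 1 captures the number, group 2 the non-digit text that follows it.
var matches = Regex.Matches(input, @"([\d.]+)\s(\D+)");

// Groups[0] would be the whole match; Groups[1] is the number on its own.
var numbers = matches.Cast<Match>()
                     .Select(m => m.Groups[1].Value)
                     .ToList();

Console.WriteLine(string.Join("\n", numbers));
// => 49.212087
//    16.626133
```

Note that this pattern does not filter out the range values on the other input lines; it only demonstrates how to pull the number out of a match.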

Named Captures

If you use named captures, (?<name>...), you can access the data by group name, such as mymatch.Groups["Data"].Value, or by index, such as mymatch.Groups[1].Value.


(?<Data>[\d.]+)\s(?<What>\D+)

This pattern produces the following matches and group captures, which can be indexed by int and also by the quoted strings "Data" and "What":

Match #0
             [0]:  49.212087 latitude, 
  ["Data"] → [1]:  49.212087
  ["What"] → [2]:  latitude, 

Match #1
             [0]:  16.626133 longitude
  ["Data"] → [1]:  16.626133
  ["What"] → [2]:  longitude
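The following sketch shows both ways of reaching the same named capture, again on the first line of the question's input. The variable names are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var input = "49.212087 latitude, 16.626133 longitude";
var values = new List<string>();

foreach (Match m in Regex.Matches(input, @"(?<Data>[\d.]+)\s(?<What>\D+)"))
{
    // A named group can be read by name or by its numeric index.
    string byName = m.Groups["Data"].Value;
    string byIndex = m.Groups[1].Value;

    values.Add(byName);
    Console.WriteLine($"{byName} (same value by index: {byName == byIndex})");
}
```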

Upvotes: 0

Wiktor Stribiżew

Reputation: 626936

You have too many unnecessary groups. Also, since you actually need 2 groups to match the same type of value, you may use a named capturing group, and grab all your required matches with a regex like

actual (?:lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (?:lat|lon)

See the regex demo. If you use a RegexOptions.ExplicitCapture flag, you can use capturing groups as non-capturing ones (only the named capturing groups will keep their submatches). See the C# demo:

var s = "lat range: 49.000000 to 50.000000 actual lat 49.212059 lon range: 16.000000 to 17.000000 actual lon 16.626276";
var pattern = @"actual (lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (lat|lon)";
var results = Regex.Matches(s, pattern, RegexOptions.ExplicitCapture) // unnamed (lat|lon) groups are not captured
        .Cast<Match>()
        .Select(m => m.Groups["val"].Value)
        .ToList();
Console.WriteLine(string.Join("\n", results));
// => 49.212059
//    16.626276

If you put the (lon|lat) into a named capturing group, you will be able to get a dictionary as a result:

var pattern = @"actual (?<type>lat|lon) (?<val>\d+\.\d{1,6})|(?<val>\d+\.\d{1,6}) (?<type>lat|lon)";
var results = Regex.Matches(s, pattern)
     .Cast<Match>()
     .ToDictionary(
            m => m.Groups["type"].Value,
            m => m.Groups["val"].Value);
foreach (var kv in results)
    Console.WriteLine("'{0}': '{1}'", kv.Key, kv.Value);
// => 'lat': '49.212059'
//    'lon': '16.626276'

See another C# demo.

Upvotes: 1

Michael Lopez

Reputation: 153

If I've understood your question correctly, this regex should work for you:

(?<=(actual (lat|lon) ))(\d+(\.\d{1,6}))|(?<!((lat|lon) range: ))(\d+(\.\d{1,6}))(?=( (lat|lon)))

See also my test results on Regexstorm

You can learn more about lookbehinds and lookaheads in this topic: Regex lookahead, lookbehind and atomic groups
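To illustrate the lookaround idea, here is a minimal sketch using a simplified variant of the pattern (non-capturing groups and an escaped dot), not the exact regex from this answer. Because lookbehinds and lookaheads consume no characters, the whole match (m.Value) is already the bare number. The sample string joins the first two lines of the question's input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "49.212087 latitude, 16.626133 longitude\n" +
            "lat range: 49.000000 to 50.000000 actual lat 49.212059 " +
            "lon range: 16.000000 to 17.000000 actual lon 16.626276";

// Either the number is preceded by "actual lat " / "actual lon " (lookbehind),
// or it is followed by " lat" / " lon" (lookahead). Range values satisfy neither.
var pattern = @"(?<=actual (?:lat|lon) )\d+\.\d{1,6}|\d+\.\d{1,6}(?= (?:lat|lon))";

var coords = Regex.Matches(input, pattern)
                  .Cast<Match>()
                  .Select(m => m.Value)
                  .ToList();

Console.WriteLine(string.Join("\n", coords));
// => 49.212087
//    16.626133
//    49.212059
//    16.626276
```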

Upvotes: 0

Victor Leontyev

Reputation: 8736

Here is a working regex (link to test):

((?<=actual\s(lat|lon)\s)(\d+(\.\d{1,6})))|((\d+(\.\d{1,6}))(?=\s(lat|lon)))

You can find more information on how it works at http://codeasp.net/blogs/microsoft-net/293/c-regex-extract-the-text-between-square-brackets-without-returning-the-brackets-themselves

Upvotes: 0
