Naman

Reputation: 2669

Writing a regex to match all 4-digit numbers between patterns

I am trying to write a regex to find a pattern in a string. The string contains the word 'LAT_LON', then some non-word characters, then several 4-digit numbers, followed by some letters or the end of the string.

Example 1:

SOME EXAMPLE STRING 12334...
LAT_LON .... 1234 5678 9012 1234 
1234 1234 

Example 2:
SOME EXAMPLE STRING 1234...
LAT_LON ... 1234   5678 9012 1234 
1234 1234 SOMETHING_ELSE

In both examples I need the six 4-digit numbers that come after 'LAT_LON' and before any other letters.

EDIT: I am working in Python, although the language doesn't matter much to me. I am fairly new to the regex world, so I have just been trying random things, with nothing conclusive so far.

Upvotes: 0

Views: 587

Answers (2)

user557597

One way is to capture the whole run of numbers, then split it on whitespace.
LAT_LON[^\da-zA-Z]*(\d{4}(?:\s+\d{4})*)

Then split capture group 1 on whitespace.
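A minimal Python sketch of those two steps (the asker mentioned Python), using re.search and str.split. The function name lat_lon_numbers is just an illustrative choice, and the sample strings reproduce the question's examples with the line break written as \n:

```python
import re

# Pattern from the answer: 'LAT_LON', then any run of characters that are
# neither digits nor letters, then whitespace-separated 4-digit numbers (group 1).
PATTERN = re.compile(r"LAT_LON[^\da-zA-Z]*(\d{4}(?:\s+\d{4})*)")

def lat_lon_numbers(text):
    """Return the 4-digit numbers that follow 'LAT_LON', or [] if absent."""
    match = PATTERN.search(text)
    if match is None:
        return []
    # str.split() with no argument splits on any run of whitespace,
    # including the newline between the two lines of numbers.
    return match.group(1).split()

example1 = "SOME EXAMPLE STRING 12334...\nLAT_LON .... 1234 5678 9012 1234 \n1234 1234 "
example2 = "SOME EXAMPLE STRING 1234...\nLAT_LON ... 1234   5678 9012 1234 \n1234 1234 SOMETHING_ELSE"

print(lat_lon_numbers(example1))  # ['1234', '5678', '9012', '1234', '1234', '1234']
print(lat_lon_numbers(example2))  # ['1234', '5678', '9012', '1234', '1234', '1234']
```

In the second example the repetition stops at " SOMETHING_ELSE" because \s+ can match the space but \d{4} cannot match "SOME".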

 LAT_LON [^\da-zA-Z]* 
 (                             # (1 start)
      \d{4} 
      (?:
           \s+ 
           \d{4} 
      )*
 )                             # (1 end)

Here is a more verbose formatted version.
(Regexes constructed by RegexFormat 6)

 LAT_LON                # Exact 'LAT_LON'

 [^\da-zA-Z]*           # Optional chars, 0 to many times
                        # neither digit nor letter (upper or lower case)

 (                      # (1 start), Capture all 4 digit numbers
      \d{4}                  # Single 4 digit number

      (?:                    # Cluster group
           \s+                    # Whitespace(s)
           \d{4}                  # Single 4 digit number
      )*                     # End Cluster, do 0 to many times
 )                      # (1 end)
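Since the asker is on Python, the verbose listing can be used almost as-is: with the re.VERBOSE flag, whitespace outside character classes and everything after an unescaped # are ignored. A sketch, assuming Python 3, with the comments lightly shortened:

```python
import re

# The verbose form compiled with re.VERBOSE, so layout and '#' comments
# are ignored by the regex engine.
LAT_LON_VERBOSE = re.compile(
    r"""
    LAT_LON              # Exact 'LAT_LON'
    [^\da-zA-Z]*         # Optional chars: neither digit nor letter
    (                    # (1 start) capture all 4-digit numbers
        \d{4}            # Single 4-digit number
        (?:              # Cluster group
            \s+          # Whitespace(s)
            \d{4}        # Single 4-digit number
        )*               # End cluster, repeat 0 to many times
    )                    # (1 end)
    """,
    re.VERBOSE,
)

text = "LAT_LON ... 1234   5678 9012 1234 \n1234 1234 SOMETHING_ELSE"
match = LAT_LON_VERBOSE.search(text)
# Split capture group 1 on whitespace, as in the compact version.
numbers = match.group(1).split() if match else []
print(numbers)  # ['1234', '5678', '9012', '1234', '1234', '1234']
```

It behaves the same as the one-line pattern; the verbose layout is mainly easier to maintain.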

Upvotes: 4

ShellFish

Reputation: 4551

Let me try it another way, just to add some variety to the answers. I'm going to use awk for the job.

awk '/LAT_LON/ { f = 1 } f { printf "%s ", gensub(/[^0-9 ]/, "", "g", $0) } f && !/LAT_LON/ && /[A-Za-z]/ { f = 0 }' /path/to/input/file

Optionally, pipe the result through tr -s ' ' to squeeze the repeated spaces in the output.

This code searches for the line containing LAT_LON, then processes each line from there until a non-number is found. On those lines, gensub removes every character that is not a digit or a space.

Note that the regex is fairly simple because all the irrelevant parts have already been filtered out: removing every non-numeric character does the job here. If you want to mess around with awk, go ahead; in my opinion that's the best way to learn. In particular, gawk supports an enhanced regex language (gensub, used above, is itself a gawk extension)!
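For comparison, a rough Python analogue of the same line-oriented idea. It is my own sketch, not a translation of the awk command: it starts at the LAT_LON line (keeping only what follows the marker), stops at the first letter after it, and strips everything except digits and whitespace. Like the awk filter, it does not check that each group is exactly 4 digits. The helper name numbers_after_marker is hypothetical.

```python
import re

# Same idea as gensub's [^0-9 ], widened to keep any whitespace (e.g. tabs).
NON_DIGIT_OR_SPACE = re.compile(r"[^0-9\s]")
LETTER = re.compile(r"[A-Za-z]")

def numbers_after_marker(lines, marker="LAT_LON"):
    """Hypothetical helper: collect digit groups from the marker line
    up to the first letter that appears after the marker."""
    collected = []
    active = False
    for line in lines:
        if not active:
            if marker not in line:
                continue
            active = True
            # Keep only what follows the marker on its own line.
            line = line.split(marker, 1)[1]
        letter = LETTER.search(line)
        if letter:
            # A letter ends the run; keep the part of the line before it.
            line = line[:letter.start()]
        collected.extend(NON_DIGIT_OR_SPACE.sub("", line).split())
        if letter:
            break
    return collected

sample = """SOME EXAMPLE STRING 1234...
LAT_LON ... 1234   5678 9012 1234
1234 1234 SOMETHING_ELSE
MORE TEXT 9999"""

print(numbers_after_marker(sample.splitlines()))
# ['1234', '5678', '9012', '1234', '1234', '1234']
```

The trailing "MORE TEXT 9999" line is never read, because the letter in "SOMETHING_ELSE" ends the run.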

Upvotes: 3
