Tony Nesavich

Reputation: 23

Regex to find everything in between

I have the following regex, which works as long as the input has no leading \d,"There is 1 interface on the system: line and no trailing ",2017-01-...

Here is the regex:

(?m)(?<_KEY_1>\w+[^:]+?):\s(?<_VAL_1>[^\r\n]+)$

Here is a sample of what I am trying to parse:

1,"There is 1 interface on the system:
    Name               : Mobile Broadband Connection
    Description        : Qualcomm Gobi 2000 HS-USB Mobile Broadband Device 250F
    GUID               : {1234567-12CD-1BC1-A012-C1A1234CBE12}
    Physical Address   : 00:a0:c6:00:00:00
    State              : Connected
    Device type        : Mobile Broadband device is embedded in the system
    Cellular class     : CDMA
    Device Id          : A1000001234f67
    Manufacturer       : Qualcomm Incorporated
    Model              : Qualcomm Gobi 2000
    Firmware Version   : 09010091
    Provider Name      : Verizon Wireless
    Roaming            : Not roaming
    Signal             : 67%",2017-01-20T16:00:07.000-0700

I am trying to extract each field name together with its value (for example, Cellular class would equal CDMA), but only for the fields that come after the opening line:

1,"There is 1 interface on the system:  (where the leading 1 increments: 1, 2, 3, 4, and so on)

and before the trailing ",2017-01....

Any help is much appreciated!

Upvotes: 0

Views: 89

Answers (3)

SamWhan

Reputation: 8332

You haven't responded to my comments or to any of the answers, but here is my suggestion. Try:

^\s*(?<_KEY_1>[\w\s]+?)\s*:\s*(?<_VAL_1>[^\r\n"]+).*$

See it here at regex101.
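For anyone who wants to try this outside regex101, here is a minimal sketch in Python. It assumes Python's re module, which needs the (?P<name>...) spelling for named groups, so only the group syntax differs from the pattern above. The sample is a shortened copy of the record in the question, and extract_fields is a made-up helper name for illustration.

```python
import re

# Shortened copy of the record from the question (four of the fields).
SAMPLE = (
    '1,"There is 1 interface on the system:\n'
    '    Name               : Mobile Broadband Connection\n'
    '    Physical Address   : 00:a0:c6:00:00:00\n'
    '    Cellular class     : CDMA\n'
    '    Signal             : 67%",2017-01-20T16:00:07.000-0700'
)

# Same pattern as above; Python spells named groups (?P<name>...).
PATTERN = re.compile(
    r'^\s*(?P<_KEY_1>[\w\s]+?)\s*:\s*(?P<_VAL_1>[^\r\n"]+).*$',
    re.MULTILINE,
)


def extract_fields(text):
    """Hypothetical helper: collect key/value pairs from one record."""
    return {
        m.group('_KEY_1'): m.group('_VAL_1').strip()
        for m in PATTERN.finditer(text)
    }


fields = extract_fields(SAMPLE)
print(fields['Cellular class'])
```

The header line is skipped because the comma in 1,"There cannot be part of the key, and the value stops at the closing quote, so the timestamp never ends up in Signal.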

Upvotes: 0

Casimir et Hippolyte

Reputation: 89567

Your example string seems to be a record from a CSV file. This is how I would accomplish the task with Python (2.7 or 3.x):

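import csv

with open('file.csv', 'r') as fh:
    # csv.reader takes care of the quoted, multi-line second column
    reader = csv.reader(fh)
    results = []

    for fields in reader:
        # fields[1] is the quoted block; its first line is the "There is ..." header
        lines = fields[1].splitlines()
        # split on the first colon only, so values such as 00:a0:c6:00:00:00
        # stay intact, then strip the padding around key and value
        keyvals = [list(map(str.strip, line.split(':', 1))) for line in lines[1:]]
        results.append(keyvals)

    print(results)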

It can be done in a similar way with other languages.
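If you would rather look values up by field name, the same idea can be sketched with a dict per record. This is only an illustration under some assumptions: Python 3, rows with exactly the three columns shown in the question (counter, quoted block, timestamp), and an in-memory io.StringIO standing in for the real file. The name parse_records and the 'index', 'timestamp' and 'fields' keys are made up for the example.

```python
import csv
import io

# Shortened copy of the record from the question.
SAMPLE = (
    '1,"There is 1 interface on the system:\n'
    '    Name               : Mobile Broadband Connection\n'
    '    Physical Address   : 00:a0:c6:00:00:00\n'
    '    Cellular class     : CDMA\n'
    '    Signal             : 67%",2017-01-20T16:00:07.000-0700\n'
)


def parse_records(handle):
    """Hypothetical helper: one dict per CSV row, fields keyed by name."""
    records = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # assumption: skip blank or malformed rows
        index, body, timestamp = row[0], row[1], row[2]
        fields = {}
        # Skip the "There is ..." header line, then split on the first colon.
        for line in body.splitlines()[1:]:
            key, sep, value = line.partition(':')
            if sep:  # ignore lines that contain no colon at all
                fields[key.strip()] = value.strip()
        records.append({'index': index, 'timestamp': timestamp, 'fields': fields})
    return records


records = parse_records(io.StringIO(SAMPLE))
print(records[0]['fields']['Cellular class'])
```

Keeping the counter and the timestamp next to the parsed fields makes it easy to tell the records apart later.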

Upvotes: 0

trincot

Reputation: 350310

You could use look-aheads to ensure that each value you match does not include a " and is followed, further on in the text, by a ",\d sequence. Together these ensure that you only match between double quotes, where the closing quote is the one that starts the ",\d pattern:

/^\h*(?<_KEY_1>[\w\h]+?)\h*:\h*(?<_VAL_1>[^\r\n"]+)(?="|$)(?=[^"]*",\d)/gm

See it on regex101

NB: I put the g and m modifiers at the end, but if your environment requires them at the start with (?m) notation, that will work too of course.
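In case your environment is Python rather than PCRE, here is a rough adaptation. Python's re module does not support \h, so this sketch substitutes [ \t], and it uses (?P<name>...) for the named groups; otherwise the pattern is the one above. The stray last line in the sample is invented, to show that a key/value line outside the quotes is rejected by the second look-ahead.

```python
import re

# Shortened record from the question, plus an invented line outside the quotes.
SAMPLE = (
    '1,"There is 1 interface on the system:\n'
    '    Name               : Mobile Broadband Connection\n'
    '    Cellular class     : CDMA\n'
    '    Signal             : 67%",2017-01-20T16:00:07.000-0700\n'
    'Stray key : stray value'
)

# [ \t] replaces \h, which Python's re module does not provide.
PATTERN = re.compile(
    r'^[ \t]*(?P<_KEY_1>[\w \t]+?)[ \t]*:[ \t]*'
    r'(?P<_VAL_1>[^\r\n"]+)'
    r'(?="|$)'           # the value ends at a quote or at the end of the line
    r'(?=[^"]*",\d)',    # and the next quote in the text is followed by ,digit
    re.MULTILINE,
)

pairs = [(m.group('_KEY_1'), m.group('_VAL_1')) for m in PATTERN.finditer(SAMPLE)]
for key, value in pairs:
    print(key, '=', value)
```

Only Name, Cellular class and Signal are reported; the stray line fails because no ",\d sequence follows it.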

Upvotes: 1
