Reputation: 23
I have the following regex, which works as long as the text has no leading \d,"There is 1 interface on the system:
and no trailing ",2017-01-...
Here is the regex:
(?m)(?<_KEY_1>\w+[^:]+?):\s(?<_VAL_1>[^\r\n]+)$
Here is a sample of what I am trying to parse:
1,"There is 1 interface on the system:
Name : Mobile Broadband Connection
Description : Qualcomm Gobi 2000 HS-USB Mobile Broadband Device 250F
GUID : {1234567-12CD-1BC1-A012-C1A1234CBE12}
Physical Address : 00:a0:c6:00:00:00
State : Connected
Device type : Mobile Broadband device is embedded in the system
Cellular class : CDMA
Device Id : A1000001234f67
Manufacturer : Qualcomm Incorporated
Model : Qualcomm Gobi 2000
Firmware Version : 09010091
Provider Name : Verizon Wireless
Roaming : Not roaming
Signal : 67%",2017-01-20T16:00:07.000-0700
I am trying to extract field names and values (for example, Cellular class would equal CDMA), but only for the fields that come after:
1,"There is 1 interface on the system: (where the leading 1 increments: 1, 2, 3, 4 and so on)
and before the trailing ",2017-01....
Any help is much appreciated!
Upvotes: 0
Views: 89
Reputation: 8332
You haven't responded to my comments or any of the answers, but here is my answer - try
^\s*(?<_KEY_1>[\w\s]+?)\s*:\s*(?<_VAL_1>[^\r\n"]+).*$
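For anyone who wants to try this outside a regex tester, here is a minimal Python sketch of the same pattern. Two things are my assumptions and not part of the answer above: Python needs the (?P<name>...) spelling for named groups, so I renamed the groups to key and val, and the sample is a shortened copy of the record from the question.

```python
import re

# Shortened copy of the record from the question: one CSV row whose
# second field is quoted and spans several lines.
sample = '''1,"There is 1 interface on the system:
Name : Mobile Broadband Connection
GUID : {1234567-12CD-1BC1-A012-C1A1234CBE12}
Physical Address : 00:a0:c6:00:00:00
Cellular class : CDMA
Signal : 67%",2017-01-20T16:00:07.000-0700
'''

# Same pattern as above, with Python-style named groups (key/val are
# illustrative names). re.MULTILINE plays the role of (?m).
pattern = re.compile(
    r'^\s*(?P<key>[\w\s]+?)\s*:\s*(?P<val>[^\r\n"]+).*$',
    re.MULTILINE,
)

# Build a key -> value mapping from every line the pattern matches.
fields = {m.group('key'): m.group('val') for m in pattern.finditer(sample)}

for key, val in fields.items():
    print(key, '=', val)
```

The header line is skipped because it starts with 1," and the comma cannot be part of the key, and the value on the last line stops at the closing double quote.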
Upvotes: 0
Reputation: 89567
Your example string seems to be a record from a CSV file. This is how I would accomplish the task with Python (2.7 or 3.x):
import csv

with open('file.csv', 'r') as fh:
    reader = csv.reader(fh)
    results = []
    for fields in reader:
        # fields[1] is the quoted multi-line field of the record
        lines = fields[1].splitlines()
        # skip the "There is 1 interface..." header, then split each line at the first colon
        keyvals = [list(map(str.strip, line.split(':', 1))) for line in lines[1:]]
        results.append(keyvals)

print(results)
It can be done in a similar way with other languages.
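As a self-contained variation on the snippet above (Python 3 only), the sketch below reads a shortened copy of the sample from an in-memory string instead of file.csv and stores each record as a dictionary. The use of io.StringIO, the dictionary layout, and the name records are my own choices for illustration.

```python
import csv
import io

# Shortened copy of the record from the question, held in memory so the
# example runs without a file on disk.
sample = '''1,"There is 1 interface on the system:
Name : Mobile Broadband Connection
Physical Address : 00:a0:c6:00:00:00
Cellular class : CDMA
Signal : 67%",2017-01-20T16:00:07.000-0700
'''

records = []
for row in csv.reader(io.StringIO(sample)):
    # row[0] is the counter, row[1] the quoted multi-line field,
    # row[2] the timestamp.
    lines = row[1].splitlines()
    record = {}
    for line in lines[1:]:          # skip the "There is 1 interface..." header
        if ':' not in line:
            continue                # ignore lines without a key/value separator
        key, value = line.split(':', 1)   # split at the first colon only
        record[key.strip()] = value.strip()
    records.append(record)

print(records[0]['Cellular class'])
```

Splitting at the first colon only is what keeps a value such as 00:a0:c6:00:00:00 in one piece.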
Upvotes: 0
Reputation: 350310
You could use look-ahead to ensure that the strings you match come before a ",\d sequence, and do not include a " character. The latter would ensure you will only match between double quotes, of which the second has the pattern ",\d:
/^\h*(?<_KEY_1>[\w\h]+?)\h*:\h*(?<_VAL_1>[^\r\n"]+)(?="|$)(?=[^"]*",\d)/gm
See it on regex101
NB: I put the g and m modifiers at the end, but if your environment requires them at the start with (?m) notation, that will work too of course.
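If you want to check the look-ahead idea in Python, the following is a rough translation and not the exact pattern above: Python's re module has no \h, so I substituted [ \t], and the named groups use the (?P<name>...) spelling with the illustrative names key and val. The extra "Note" line after the record is a made-up line added only to show that text outside the quotes is not matched.

```python
import re

# Shortened copy of the record from the question, followed by a made-up
# key/value line that lies outside the quoted field.
sample = '''1,"There is 1 interface on the system:
Name : Mobile Broadband Connection
Physical Address : 00:a0:c6:00:00:00
Cellular class : CDMA
Signal : 67%",2017-01-20T16:00:07.000-0700
Note : this line sits outside the quoted field
'''

pattern = re.compile(
    r'^[ \t]*(?P<key>[\w \t]+?)[ \t]*:[ \t]*'   # key, then the separating colon
    r'(?P<val>[^\r\n"]+)'                       # value: rest of the line, no quote
    r'(?="|$)'                                  # value ends at a quote or line end
    r'(?=[^"]*",\d)',                           # the next quote must start ",<digit>
    re.MULTILINE,
)

# finditer stands in for the g modifier.
fields = {m.group('key'): m.group('val') for m in pattern.finditer(sample)}

for key, val in fields.items():
    print(key, '=', val)
```

The last look-ahead is what rejects the "Note" line: no double quote follows it, so the ",\d check cannot succeed there.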
Upvotes: 1