Reputation: 151
I'm writing my first Python re code and I have a question about regular expressions. I have a variable that contains:
a = "'PoE Port Info','1 up medium Auto Class Searching 0 0.0 0 0.0','10 up low User defined 4(W) Searching - 0.0 0 0.0'"
I need to extract:
'1 up medium Auto Class Searching 0 0.0 0 0.0'
and convert the string to a list of strings with the whitespace removed:
['1','up','medium','Auto Class','Searching','0','0.0','0','0.0']
Similarly, for
'10 up low User defined 4(W) Searching - 0.0 0 0.0'
remove the whitespace and convert it to a list:
['10','up','low','User defined','4(W)','Searching','-','0.0','0','0.0']
None of the remaining data in that string should match.
My code:
import re

a = "'PoE Port Info','1 up medium Auto Class Searching 0 0.0 0 0.0','10 up low User defined 4(W) Searching - 0.0 0 0.0'"
b = []
# Split the string on commas
a = a.split(",")
print(a)
for item in a:
    c = re.findall(r"[0-9]+[\s+[a-z]+]*[0-9]+", item, re.I)
    if c:
        # Collapse runs of whitespace into single spaces
        temp = re.sub(r'[\s]+', ' ', c[0])
        #print(temp)
        b.append(temp.split(" "))
print(b)
This code runs, but I'm having trouble with the regular expression. My current output:
[['1', 'up', 'medium', 'Auto', 'Class', 'Searching', '0'], ['10', 'up', 'low', 'User', 'defined', '4']]
Could someone please help me?
How should I write the regex?
Upvotes: 1
Views: 644
Reputation: 626747
If you know there is no need to handle nested or escaped single quotes, you can extract all the fields in one go with a single regex. The difficulty is the fourth field, which may contain any number of arbitrary characters, so you need to spell out the patterns for the other fields.
Here is an example regex:
import re
a = "'PoE Port Info','1 up medium Auto Class Searching 0 0.0 0 0.0','10 up low User defined 4(W) Searching - 0.0 0 0.0'"
res = re.findall(r"(?!^)'(\d+)\s+(\w+)\s+(\w+)\s+([^']*?)\s+(\S+)\s+(\S+)\s+([\d.]+)\s+(\d+)\s+([\d.]+)'", a)
print(res)
# => [
# ('1', 'up', 'medium', 'Auto Class', 'Searching', '0', '0.0', '0', '0.0'),
# ('10', 'up', 'low', 'User defined 4(W)', 'Searching', '-', '0.0', '0', '0.0')
# ]
See the regex demo and the Python demo
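If lists are needed rather than tuples, as in the question, the tuples returned by re.findall can be converted directly. This is only a minimal sketch, assuming the same input string and pattern as above; the names PATTERN and rows are illustrative.

```python
import re

a = "'PoE Port Info','1 up medium Auto Class Searching 0 0.0 0 0.0','10 up low User defined 4(W) Searching - 0.0 0 0.0'"

# Same pattern as above, compiled once so it can be reused
PATTERN = re.compile(
    r"(?!^)'(\d+)\s+(\w+)\s+(\w+)\s+([^']*?)\s+(\S+)\s+(\S+)\s+([\d.]+)\s+(\d+)\s+([\d.]+)'"
)

# findall returns one tuple of groups per matched field; turn each into a list
rows = [list(groups) for groups in PATTERN.findall(a)]

for row in rows:
    print(row)
```

Note that the fourth group still keeps 'User defined 4(W)' together, exactly as in the output shown above.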
Details
(?!^)' - a ' not at the start of the string
(\d+) - Group 1: one or more digits
\s+ - 1+ whitespaces
(\w+)\s+ - Group 2: one or more word chars and then 1+ whitespaces
(\w+)\s+ - Group 3: one or more word chars and then 1+ whitespaces
([^']*?)\s+ - Group 4: any 0 or more chars other than ', as few as possible, then 1+ whitespaces
(\S+)\s+ - Group 5: any 1+ non-whitespaces, then 1+ whitespaces
(\S+)\s+ - Group 6: any 1+ non-whitespaces, then 1+ whitespaces
([\d.]+)\s+ - Group 7: any 1+ digits/dots, then 1+ whitespaces
(\d+)\s+ - Group 8: any 1+ digits, then 1+ whitespaces
([\d.]+) - Group 9: any 1+ digits/dots
' - a ' char.
Upvotes: 1
Reputation: 1368
I suggest you use an online regex editor such as regex101 and test your string there. It is much easier to build a valid pattern that way, and the match information window shows which group captured what.
I would probably write the pattern with one capture group per value you need. Then there is no need for split or other operations afterwards.
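As a rough sketch of the one-group-per-value idea, the pattern can be written with named groups and re.VERBOSE so each group is labeled in place. The group names below (port, state, priority and so on) are hypothetical labels chosen for illustration; the question does not say what the columns mean.

```python
import re

a = "'PoE Port Info','1 up medium Auto Class Searching 0 0.0 0 0.0','10 up low User defined 4(W) Searching - 0.0 0 0.0'"

# One named group per value. The names are assumptions, not taken from the source.
ROW = re.compile(
    r"""
    (?!^)'                    # opening quote, not at the start of the string
    (?P<port>\d+)\s+          # leading number
    (?P<state>\w+)\s+         # e.g. up
    (?P<priority>\w+)\s+      # e.g. medium, low
    (?P<cls>[^']*?)\s+        # free-text field, matched as lazily as possible
    (?P<status>\S+)\s+        # e.g. Searching
    (?P<f6>\S+)\s+            # number or dash
    (?P<f7>[\d.]+)\s+
    (?P<f8>\d+)\s+
    (?P<f9>[\d.]+)
    '                         # closing quote
    """,
    re.VERBOSE,
)

# finditer yields match objects; groupdict maps each group name to its text
records = [m.groupdict() for m in ROW.finditer(a)]

for record in records:
    print(record)
```

With named groups, each value can be read as record["port"] or record["cls"] without any splitting afterwards.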
Upvotes: 1