sakow0

Reputation: 33

Match line with specific string to extract values Python Regex

I am having trouble finding the correct regex for this task; please excuse my beginner skills. I only want to get the id value from a line where "available":true, not "available":false. I can already get the ids of all lines with re.findall('"id":(\d{13})', line, re.DOTALL) (the {13} matches exactly 13 digits, because there are other ids in the data with fewer than 13 digits, which I don't need).

{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},

Therefore the end result needs to be ['1651572973431','1351572943231']

I appreciate the help, thanks.

Upvotes: 0

Views: 85

Answers (3)

user557597

Reputation:

This matches what you want:

(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)

https://regex101.com/r/FseimH/1

Expanded

 (?<= "id": )
 \d{13} 
 (?=
      (?: ," [^"]* ": [^,]*? )*?
      ,"available":true
 )

Explained

 (?<= "id": )                        # Lookbehind assertion for id
 \d{13}                              # Consume 13 digit id
 (?=                                 # Lookahead assertion
      (?:                                 # Optional sequence
           ,                                   # comma
           " [^"]* "                           # quoted string
           :                                   # colon
           [^,]*?                              # optional non-comma's
      )*?                                 # End sequence, do 0 to many times - 
      ,"available":true                   # until we find  available = true
 )
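
As a rough sketch of how this pattern could be used from Python, the snippet below runs it with re.findall over the sample data from the question. The variable names (text, pattern, available_ids) are only illustrative, and the pattern assumes every key/value pair between "id" and "available" has the simple ,"key":value shape shown in the sample, with no commas inside the values.

```python
import re

# Sample text copied from the question (one JSON-like object per line).
text = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''

# The pattern from this answer, compiled once so it can be reused.
pattern = re.compile(
    r'(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)'
)

# The pattern has no capturing groups, so findall returns the full matches,
# i.e. the 13-digit ids themselves, as strings.
available_ids = pattern.findall(text)
print(available_ids)
```

Because the lookahead only steps over ,"key":value pairs, it cannot run past the end of one object into the next, so an id is never paired with the "available" flag of a later line.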

Upvotes: 1

Mark

Reputation: 92440

This might not be a good answer; it depends on exactly what you have. It looks like you have a list of strings and you want the ids from some of them. If that's the case, it's going to be much cleaner and easier to read if you parse the JSON rather than writing a byzantine regex. For example:

import json

# lines is a list of strings:

lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]

# Parse each line as JSON, then use regular Python to get what you want:
ids = [obj['id'] for obj in map(json.loads, lines) if obj['available']]
print(ids)

Result:

[1351572943231, 1651572973431]

If the code you posted is one long string, you can wrap it in [] and then parse it as an array with the same result:

import json

line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'

lines = json.loads('[' + line + ']')
[line['id'] for line in lines if line['available']]
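
Two details of the question differ from the snippets above: the sample data ends every line with a comma (including the last one), and the expected output is a list of strings rather than integers. A minimal sketch that accounts for both follows, assuming the input is otherwise valid JSON objects separated by commas. The helper name available_ids and the variable raw are made up for this illustration.

```python
import json

# Sample text copied from the question; note the trailing comma on every line.
raw = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''


def available_ids(text):
    """Return the ids (as strings) of objects whose "available" is true."""
    # Strip surrounding whitespace and the final trailing comma, then wrap
    # the comma-separated objects in [] so they form a valid JSON array.
    body = text.strip().rstrip(',')
    records = json.loads('[' + body + ']')
    # str() converts the integer ids to the string form the question asks for.
    return [str(rec['id']) for rec in records if rec.get('available') is True]


print(available_ids(raw))
```

Using rec.get('available') instead of rec['available'] means an object that lacks the key is skipped rather than raising a KeyError.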

Upvotes: 2

Emma

Reputation: 27723

Here, we can simply use the "id" as a left boundary, and collect the desired numbers in a capturing group:

"id":([0-9]+)


Then, we can continue to add boundaries to it. For example, if exactly 13 digits are desired, we can simply use:

\"id\":([0-9]{13})
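
The expressions above capture every 13-digit id, whatever its "available" value. One possible way to carry the same boundary idea further is to add "available":true as a right boundary and forbid braces in between, so that a match cannot spill into the next object. This is only a sketch that assumes flat objects without nested braces, as in the question's sample. The pattern and variable names are illustrative and not part of the original answer.

```python
import re

# Sample text copied from the question.
text = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''

# Left boundary: "id":   captured group: 13 digits
# [^{}]*? lazily skips anything except braces, so it stays inside one object.
# Right boundary: "available":true
pattern = re.compile(r'"id":([0-9]{13})[^{}]*?"available":true')

# With one capturing group, findall returns just the captured ids (strings).
ids = pattern.findall(text)
print(ids)
```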

Upvotes: 1
