Reputation: 33
I am having trouble finding the correct regex for this task; please excuse my beginner skills. I want to get the id value only from lines that contain "available":true, not "available":false. I can already get the ids of all lines with re.findall('"id":(\d{13})', line, re.DOTALL)
(the {13} matches exactly 13 digits, as there are other ids in the code with fewer than 13 digits which I don't need).
{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
So the end result needs to be ['1651572973431','1351572943231']
I appreciate the help, thanks.
Upvotes: 0
Views: 85
Reputation:
This works to match what you want
(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)
https://regex101.com/r/FseimH/1
Expanded
(?<= "id": )
\d{13}
(?=
(?: ," [^"]* ": [^,]*? )*?
,"available":true
)
Explained
(?<= "id": ) # Lookbehind assertion for id
\d{13} # Consume 13 digit id
(?= # Lookahead assertion
(?: # Optional sequence
, # comma
" [^"]* " # quoted string
: # colon
[^,]*? # value: zero or more non-comma characters (lazy)
)*? # End sequence, repeat 0 or more times (lazy)
,"available":true # until we find "available":true
)
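To show how the pattern above could be used from Python, here is a minimal sketch. The names AVAILABLE_ID, sample and available_ids are made up for illustration, and the sample text is simply the four lines from the question joined into one string.

```python
import re

# The pattern from this answer, compiled once. The lookbehind anchors on "id":
# and the lookahead skips ,"key":value pairs until it reaches ,"available":true.
AVAILABLE_ID = re.compile(r'(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)')

# Sample input copied from the question (one object per line, trailing commas kept).
sample = '''\
{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''

# There are no capturing groups, so findall returns the full matches (the ids).
available_ids = AVAILABLE_ID.findall(sample)
print(available_ids)  # ['1351572943231', '1651572973431']
```

Keep in mind that the lookahead only looks forward, so this assumes "available" appears after "id" within the same object, and that none of the values in between contain a comma.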
Upvotes: 1
Reputation: 92440
This might not be a good answer; it depends on exactly what you have. It looks like you have a list of strings and you want the ids from some of them. If that's the case, it's going to be much cleaner and easier to read if you parse the JSON rather than writing a byzantine regex. For example:
import json
# lines is a list of strings:
lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]
# Parse each string as JSON, then use regular Python to filter:
ids = [item['id'] for item in map(json.loads, lines) if item['available']]
print(ids)
Result (the ids are integers, because JSON numbers are parsed as int):
[1351572943231, 1651572973431]
If the code you posted is one long string, you can wrap it in [] and then parse it as an array with the same result (remove any trailing comma first, since JSON does not allow one):
import json
line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'
# Wrapping the comma-separated objects in [] makes the string a valid JSON array.
items = json.loads('[' + line + ']')
print([item['id'] for item in items if item['available']])
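If the input looks exactly like the text in the question (one object per line, each followed by a trailing comma) and the ids are wanted as strings, something along these lines might work. This is only a sketch: the function name available_ids and the variable raw are hypothetical, and it assumes every non-empty line is a complete JSON object.

```python
import json

# Text as shown in the question: one JSON object per line, each ending in a comma.
raw = '''\
{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''


def available_ids(text):
    """Return the ids (as strings) of the objects whose "available" is true."""
    ids = []
    for line in text.splitlines():
        # Drop surrounding whitespace and the trailing comma, which JSON rejects.
        line = line.strip().rstrip(',')
        if not line:
            continue
        record = json.loads(line)
        if record.get('available') is True:
            ids.append(str(record['id']))
    return ids


print(available_ids(raw))  # ['1351572943231', '1651572973431']
```

Parsing first and filtering afterwards means the order of the keys inside each object no longer matters.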
Upvotes: 2
Reputation: 27723
Here, we can simply use the "id" as a left boundary, and collect the desired numbers in a capturing group:
"id":([0-9]+)
Then we can continue to add boundaries to it. For example, if exactly 13 digits are desired, we can simply use:
"id":([0-9]{13})
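The capturing-group pattern on its own matches every 13-digit id, so to keep only the available ones it could be combined with a plain substring check. The sketch below assumes one object per line, as in the question; ID_RE and ids_from_available_lines are illustrative names, and the sample lines are shortened versions of the question's data.

```python
import re

# Capturing-group pattern from this answer: the group holds the 13-digit id.
ID_RE = re.compile(r'"id":([0-9]{13})')


def ids_from_available_lines(lines):
    """Collect ids only from lines that contain "available":true."""
    found = []
    for line in lines:
        if '"available":true' in line:
            # With one capturing group, findall returns the group contents.
            found.extend(ID_RE.findall(line))
    return found


# Shortened sample lines (fewer keys than in the question, same shape).
lines = [
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678"},',
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678"},',
]
print(ids_from_available_lines(lines))  # ['1351572943231']
```

This keeps the regex simple at the cost of relying on the line structure; if several objects share a line, the substring check is no longer enough.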
Upvotes: 1