Extracting numbers up to a certain paragraph using a multi-condition regex in Python

Question

I am new to regex. I am trying to extract data from log files, and each file has text like this:

crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

I want to extract only the numeric values up to the word cached. For this I am building a different pattern for each number and then collecting the values in a list using finditer. My code so far:

[x.group() for x in re.finditer(r"(\d{2}:\d{2}:\d{2})|(\d+\.\d+?)%id", text)]

This is only a fragment of the regex: I have to specify a pattern for every number, including the strings before and after it. Is there a more efficient way to get this output?

desired_values=[00:00:00, 200, 23:35, 0, 0.04, 0.05, 0.02,
               300, 2, 298, 0, 0,
               12.0, 2.5, 0.0, 89.2, 0.0, 0.1, 0.0,
               123456, 1234567, 989991, 11156793,
               456, 30897564, 785431, 23445897]

I then insert these values into a database, which is why they should be in a list.

Wiktor Stribiżew · Accepted Answer

You may use

r'(?s)(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)(?=.*\bcached\b)'

See the regex demo
Details

(?<!\d) - no digit immediately to the left is allowed

(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+) - either of:

    \d{2}:\d{2}(?::\d{2})? - 2 digits, :, 2 digits and then an optional sequence of : and 2 digits
    | - or
    \d*\.?\d+ - 0+ digits, an optional . and then 1+ digits

(?!\d) - no digit immediately to the right is allowed
(?=.*\bcached\b) - there must be a whole word cached somewhere to the right of the current location. Because of (?s), . also matches line breaks, so the word may be on a later line.
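
The parts listed above can also be written out with re.VERBOSE so that each one carries its own comment. This is only a sketch of the same pattern in a different layout; the name NUMBER_BEFORE_CACHED and the short sample string are made up for illustration.

```python
import re

# The pattern from the answer, laid out with re.VERBOSE.
# NUMBER_BEFORE_CACHED is a hypothetical name chosen for this example.
NUMBER_BEFORE_CACHED = re.compile(
    r"""
    (?<!\d)                      # no digit immediately to the left
    (?:
        \d{2}:\d{2}(?::\d{2})?   # 2 digits, :, 2 digits, optional : and 2 digits
      |                          # or
        \d*\.?\d+                # 0+ digits, an optional dot, then 1+ digits
    )
    (?!\d)                       # no digit immediately to the right
    (?=.*\bcached\b)             # the word cached must follow somewhere
    """,
    re.VERBOSE | re.DOTALL,      # DOTALL plays the role of the inline (?s)
)

# Made-up sample: the trailing 99 comes after cached and should be skipped.
sample = "up 200 days, 23:35, load average: 0.04\nSwap: 456K total, 7k cached\nPID 99"
print(NUMBER_BEFORE_CACHED.findall(sample))
# ['200', '23:35', '0.04', '456', '7']
```

The 99 at the end is not returned because no cached follows it, which is the job of the final lookahead.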

Python demo:
import re
text = r"""crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
 
PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"""
print(re.findall(r'(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)(?=.*\bcached\b)', text, re.S))

Output:
['00:00:00', '200', '23:35', '0', '0.04', '0.05', '0.02', '300', '2', '298', '0', '0', '12.0', '2.5', '0.0', '89.2', '0.0', '0.1', '0.0', '123456', '1234567', '989991', '11156793', '456', '30897564', '785431', '23445897']
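
As a possible variation (not part of the answer above): the lookahead (?=.*\bcached\b) is checked again for every candidate number, so on large logs it may be cheaper to locate the last cached once and search only the text before it. The following is a minimal sketch under that assumption; the helper name numbers_before_cached and the sample string are hypothetical.

```python
import re

# Same number pattern as in the answer, but without the trailing lookahead.
NUMBER = re.compile(r"(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)")
CACHED = re.compile(r"\bcached\b")


def numbers_before_cached(text):
    """Return number-like tokens found before the last whole word 'cached'.

    Hypothetical helper; returns an empty list when 'cached' is absent,
    which mirrors what the lookahead version does.
    """
    end = None
    for match in CACHED.finditer(text):
        end = match.start()          # remember the last occurrence
    if end is None:
        return []
    # The endpos argument of Pattern.findall limits the search to text[:end].
    return NUMBER.findall(text, 0, end)


# Made-up sample in the same shape as the log excerpt.
sample = (
    "Mem: 123456K total, 11156793k buffers\n"
    "Swap: 456K total, 23445897k cached\n"
    "\n"
    "PID 1234 root"
)
print(numbers_before_cached(sample))
# ['123456', '11156793', '456', '23445897']
```

The values are still strings, as in the output above; converting them before a database insert is a separate step.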
