Reputation: 179
I am new to regex. I am trying to extract data from log files, and each file has text like this:
crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
I am extracting only the numeric values up to the word cached. For this I am building a different pattern for each number and then collecting the values in a list using finditer. My code so far:
[x.group() for x in re.finditer(r"(\d{2}:\d{2}:\d{2})|(\d+\.\d+?)%id", text)]
This is only a fragment of the regex, where I have to specify a pattern for every number, including its prefix and suffix strings. Is there a more efficient way to get the output?
desired_values=[00:00:00, 200, 23:35, 0, 0.04, 0.05, 0.02 ,
300, 2, 298, 0, 0,
12.0, 2.5, 0.0, 89.2, 0.0, 0.1, 0.0,
123456, 1234567, 989991, 11156793,
9234456, 30897564, 785431, 23445897]
I then insert these values into a database, which is why they should be in a list.
Upvotes: 1
Views: 110
Reputation: 627087
You may use
r'(?s)(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)(?=.*\bcached\b)'
See the regex demo
Details
(?<!\d) - no digit immediately to the left is allowed
(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+) - either of
    \d{2}:\d{2}(?::\d{2})? - 2 digits, :, 2 digits and then an optional sequence of : and 2 digits
    | - or
    \d*\.?\d+ - 0+ digits, an optional . and then 1+ digits
(?!\d) - no digit immediately to the right is allowed
(?=.*\bcached\b) - there must be a word cached somewhere to the right of the current location.

import re
text = r"""crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"""
print( re.findall(r'(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)(?=.*\bcached\b)', text, re.S) )
Output:
['00:00:00', '200', '23:35', '0', '0.04', '0.05', '0.02', '300', '2', '298', '0', '0', '12.0', '2.5', '0.0', '89.2', '0.0', '0.1', '0.0', '123456', '1234567', '989991', '11156793', '456', '30897564', '785431', '23445897']
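If the lookahead feels heavy, since it rescans the rest of the text for cached after every candidate number, one possible alternative is to cut the text at cached first and then run the same number pattern without the lookahead. The sketch below is only an illustration under some assumptions: the helper name values_before_cached is made up here, and it cuts at the first occurrence of cached, whereas the lookahead version accepts numbers before the last occurrence. For the sample log, which contains cached once, both give the same list.

```python
import re

text = """crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"""

# Same number pattern as above, minus the (?=.*\bcached\b) lookahead.
NUMBER = re.compile(r'(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)')

def values_before_cached(log_text):
    """Hypothetical helper: return numeric tokens found before the word 'cached'."""
    # Assumption: cut at the FIRST whole-word occurrence of 'cached'.
    marker = re.search(r'\bcached\b', log_text)
    # Without the marker nothing qualifies, mirroring the lookahead version.
    head = log_text[:marker.start()] if marker else ""
    return NUMBER.findall(head)

values = values_before_cached(text)
print(values)
```

The values come back as strings, so they may need converting (for example with int or float) before the database insert, depending on the column types.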
Upvotes: 1