tandem

Reputation: 2238

find the entire string based on only the substring from a string

This might be a duplicate question, but after searching for a bit I couldn't find an answer, so I am posting it. How do I find the entire string based on only a substring of it?

import re

test = 'INFO: 106.00s - SearchDriver: GET CThru=27.027 OThru=25.566 CErr=0.000 CResp=0.013 OResp=0.011 CSD=0.015 OSD=0.010 C90%Resp=0.025 O90%Resp=0.025'

To get the value of CThru (27.027), I'm trying this:

re.findall("CThru=*", test)

but it returns only

['CThru=']

Upvotes: 0

Views: 67

Answers (3)

Martijn Pieters

Reputation: 1124998

The * quantifier is always applied to the thing it is placed after; <regex thing>* means that <regex thing> should be matched zero or more times.

For your attempt, <regex thing> is the = character, so =* means: zero or more equals characters. And indeed, 'CThru=' contains one such equals character, and no more. The * won't match anything else! This differs from glob syntax, commonly used when listing files, where just the * character, on its own, is used to match zero or more filename characters. Regular expressions are not glob patterns.
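
As a rough illustration of that difference, the sketch below runs the question's pattern through re and then uses the same text as a glob pattern with the standard-library fnmatch module. The shortened test line and the second sample string are made up for the demonstration.

```python
import fnmatch
import re

test = 'INFO: 106.00s - SearchDriver: GET CThru=27.027 OThru=25.566'

# Regex: "=*" is "zero or more '=' characters", nothing else.
regex_result = re.findall("CThru=*", test)
print(regex_result)   # ['CThru=']

# The same pattern also matches a bare key or a key with several "=".
variants = re.findall("CThru=*", "CThru CThru== CThru=1")
print(variants)       # ['CThru', 'CThru==', 'CThru=']

# Glob: here "*" on its own stands for any run of characters.
glob_result = fnmatch.fnmatch("CThru=27.027", "CThru=*")
print(glob_result)    # True
```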

If you wanted to get the value following the = character, you need to put in a pattern (a regex thing) to match characters in the value text. Since values are always characters that are not spaces (a space separates the key=value pairs), you could use the [^ ] set to say not a space and add + to that to make sure there is at least one character. [^...] is a negative set, a regex 'thing' that'll match any character in the text that is not in the set, so [^ ] matches any character that is not a space. The + quantifier means one or more characters, so we want 1 or more characters that are not spaces. * and + are greedy, meaning that the regex matcher will use as many characters as it can take to satisfy that pattern.

If you put (...) parentheses around that part, you tell the regex engine to capture that part and put it in a group, and re.findall() will return everything in group 1 if there is just that group. So just the values after CThru= are returned:

re.findall("CThru=([^ ]+)", test)

This will return any kind of text that is not spaces, as a list:

>>> import re
>>> test = 'INFO: 106.00s - SearchDriver: GET CThru=27.027 OThru=25.566 CErr=0.000 CResp=0.013 OResp=0.011 CSD=0.015 OSD=0.010 C90%Resp=0.025 O90%Resp=0.025'
>>> re.findall("CThru=([^ ]+)", test)
['27.027']

If there is only ever going to be one such key-value pair, you may as well use re.search(), and ask for group 1 if that gives you a result that is not None:

match = re.search("CThru=([^ ]+)", test)
if match:
    value = match.group(1)
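
If you need this for several keys, the same re.search() approach could be wrapped in a small helper. This is only a sketch: get_value is a hypothetical name, not something from the re module, and the test line is shortened.

```python
import re

def get_value(line, key):
    """Return the text after 'key=' up to the next space, or None."""
    # re.escape guards against keys that contain regex metacharacters.
    match = re.search(re.escape(key) + "=([^ ]+)", line)
    return match.group(1) if match else None

test = 'INFO: 106.00s - SearchDriver: GET CThru=27.027 OThru=25.566 C90%Resp=0.025'

print(get_value(test, "CThru"))     # 27.027
print(get_value(test, "C90%Resp"))  # 0.025
print(get_value(test, "Missing"))   # None
```

Note that the key is matched as a substring, so asking for "Thru" would also match inside "CThru".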

Upvotes: 2

Nordle

Reputation: 2981

The asterisk on the end applies only to the = character (0 or more matches of =), so the pattern doesn't search for anything after it.

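Something along the lines of re.findall(r"CThru=\d*\.\d*", test) should work, as long as CThru= is always followed by a float. Note the raw string (r"..."), which keeps Python from treating \d as a string escape, and that the result includes the key: ['CThru=27.027'].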
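
A minimal sketch of that pattern in use, on a shortened test line. The capture group in the second call is an addition to the pattern above, included on the assumption that you only want the number itself.

```python
import re

test = 'INFO: 106.00s - SearchDriver: GET CThru=27.027 OThru=25.566 CErr=0.000'

# Without a group, findall returns the whole match, key included.
whole = re.findall(r"CThru=\d*\.\d*", test)
print(whole)    # ['CThru=27.027']

# A capture group (an addition to the pattern above) keeps only the number.
numbers = [float(v) for v in re.findall(r"CThru=(\d*\.\d*)", test)]
print(numbers)  # [27.027]
```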

Upvotes: 0

Corentin Limier

Reputation: 5016

re.findall(r"CThru=[^\s]*", test)

works well.

You need something before the *.

re.findall("CThru=.*", test)

will catch from CThru to the end of the string, for example.
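
To compare the two patterns side by side, here is a short sketch on a shortened version of the test line (both patterns written as raw strings):

```python
import re

test = 'INFO: 106.00s - SearchDriver: GET CThru=27.027 OThru=25.566 CErr=0.000'

# [^\s]* stops at the first whitespace character.
stops_at_space = re.findall(r"CThru=[^\s]*", test)
print(stops_at_space)  # ['CThru=27.027']

# .* is greedy and runs to the end of the line.
to_end = re.findall(r"CThru=.*", test)
print(to_end)          # ['CThru=27.027 OThru=25.566 CErr=0.000']
```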

Upvotes: 0
