Slimane MEHARZI

Reputation: 364

Special re.sub python3

I would like to do a special kind of substitution with re.sub. Given this input:

string = "\"hope\" and \"love\" or \"passion\" and (\"luck\" or \"money\") "
word_list = ['hope', 'love', 'passion', 'money', 'luck']

the desired output is

'0 and 1 or 2 and (4 or 3)'

I tried:

print(re.sub("\"([^\"]*)\"", stri.index(r'\g<1>') , string))

but it doesn't work.

Upvotes: 0

Views: 514

Answers (3)

RomanPerekhrest

Reputation: 92904

Use re.sub with a replacement function as the second argument:

import re

string = "\"hope\" and \"love\" or \"passion\" and (\"luck\" or \"money\") "
word_list = ['hope', 'love', 'passion', 'money', 'luck']

print(re.sub("\"([^\"]*)\"", lambda m:
    str(word_list.index(m.group(1))) if m.group(1) in word_list else m.group(1), string))

The output:

0 and 1 or 2 and (4 or 3) 

(Keep in mind that there may be matches that are not in word_list, e.g. ... (\"luck\" or \"money\") or \"compassion\". The lambda above returns such words unchanged, without their quotes.)

re.sub(pattern, repl, string, count=0, flags=0)

... If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
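As a minimal sketch of the same idea with a named function instead of a lambda: the helper name to_index and the shortened test string are made up for illustration, and returning m.group(0) (the whole match, quotes included) for unknown words is an assumption about what you would want, not what the lambda above does.

```python
import re

word_list = ['hope', 'love', 'passion', 'money', 'luck']

def to_index(match):
    """Return the word's position in word_list, or the original quoted text."""
    word = match.group(1)          # text between the double quotes
    if word in word_list:
        return str(word_list.index(word))
    return match.group(0)          # unknown word: keep it, quotes included

text = '"hope" and ("luck" or "compassion")'
result = re.sub(r'"([^"]*)"', to_index, text)
print(result)  # 0 and (4 or "compassion")
```

re.sub calls to_index once per match, so the replacement can depend on the matched text, which a plain replacement string such as r'\g<1>' cannot do.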

Upvotes: 1

Kasravnd

Reputation: 107347

If you don't need the word list, you can number the matches in order of appearance: create an itertools.count counter and pass sub() a function as its second argument that calls next() on the counter for each match.

In [10]: from itertools import count

In [11]: c = count()

In [12]: re.sub(r'"([^"]+)"', lambda x: str(next(c)), string)
Out[12]: '0 and 1 or 2 and (3 or 4) '
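One thing to watch with this approach is that the counter keeps its state between calls. A possible way around that, sketched here with a hypothetical wrapper named number_matches, is to create a fresh counter inside a function:

```python
import re
from itertools import count

def number_matches(text):
    """Replace each double-quoted word with its order of appearance."""
    counter = count()  # fresh counter per call, so numbering restarts at 0
    return re.sub(r'"([^"]+)"', lambda m: str(next(counter)), text)

print(number_matches('"hope" and ("luck" or "money")'))  # 0 and (1 or 2)
print(number_matches('"hope" and ("luck" or "money")'))  # 0 and (1 or 2) again
```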

If you want the indices to be based on each word's position in word_list, an efficient approach is to build a dictionary that maps each word to its index (as a string), then look the word up inside the sub() replacement function:

In [29]: word_dict = {w: str(i) for i, w in enumerate(word_list)}

In [30]: re.sub(r'"([^"]+)"', lambda x: word_dict[x.group(1)], string)
Out[30]: '0 and 1 or 2 and (4 or 3) '

Note that you could use the list.index method to find each word's index. But list.index is a linear search, O(n), so it is less efficient than a dictionary lookup, which is O(1) on average.
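Note also that plain dictionary indexing raises KeyError when a quoted word is not in the dictionary. A hedged variant, assuming you want unknown words left untouched, uses dict.get with the whole match as the fallback (the function name replace_known is made up for this sketch):

```python
import re

word_list = ['hope', 'love', 'passion', 'money', 'luck']
# Map each word to its index, stored as a string because re.sub needs str.
word_dict = {w: str(i) for i, w in enumerate(word_list)}

def replace_known(text):
    """Replace quoted words found in word_dict; leave other matches as they are."""
    return re.sub(
        r'"([^"]+)"',
        # m.group(0) is the full match, quotes included, used as the default
        lambda m: word_dict.get(m.group(1), m.group(0)),
        text,
    )

print(replace_known('"money" or "compassion"'))  # 3 or "compassion"
```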

Upvotes: 0

Moinuddin Quadri

Reputation: 48120

Alternatively (without re), you can iterate over word_list with enumerate and replace each quoted word in the string using str.replace():

my_string = "\"hope\" and \"love\" or \"passion\" and (\"luck\" or \"money\") "
word_list = ['hope', 'love', 'passion', 'money', 'luck']

for i, word in enumerate(word_list):
    my_string = my_string.replace('"{}"'.format(word), str(i))

The final value held by my_string will be:

'0 and 1 or 2 and (4 or 3) '

Upvotes: 0
