Reputation: 141
I have a long list of entries in a file in the following format:
<space><space><number><space>"<word/phrase/sentence>"
e.g.
12345 = "Section 3 is ready for review"
24680 = "Bob to review Chapter 4"
I need to find a way of inserting additional text at the beginning of the word/phrase/sentence, but only if it doesn't start with one of several key words.
Additional text: 'Complete: '
List of key words: key_words_list = ['Section', 'Page', Heading']
e.g.
12345 = "Section 3 is ready for review"
(no changes needed - sentence starts with 'Section' which is in the list)
24680 = "Complete: Bob to review Chapter 4"
('Complete: ' added to start of sentence because first word wasn't in list)
This could be done with a lot of string splitting and if
statements but regex seems like it should be a more concise and much neater solution. I have the following that doesn't take account of the list:
for line in lines:
line = re.sub('(^\s\s[0-9]+\s=\s")', r'\1Complete: ', line)
I also have some code that manages to identify the lines that require changes:
print([w for w in re.findall('^\s\s[0-9]+\s=\s"([\w+=?\s?,?.?]+)"', line) if w not in key_words_list])
Is regex the best option for what I need and if so, what am I missing?
Example inputs:
12345 = "Section 3 is ready for review"
24680 = "Bob to review Chapter 4"
Example outputs:
12345 = "Section 3 is ready for review"
24680 = "Complete: Bob to review Chapter 4"
Upvotes: 2
Views: 54
Reputation: 627507
You can use a regex like
^\s{2}[0-9]+\s=\s"(?!(?:Section|Page|Heading)\b)
See the regex demo. Details:
^
- start of string\s{2}
- two whitespaces[0-9]+
- one or more digits\s=\s
- a =
enclosed with a single whitespace on both ends"
- a "
char(?!(?:Section|Page|Heading)\b)
- a negative lookahead that fails the match if there is Section
, Page
or Heading
whole word immediately to the right of the current location.See the Python demo:
import re
texts = [' 12345 = "Section 3 is ready for review"', ' 24680 = "Bob to review Chapter 4"']
add = 'Complete: '
key_words_list = ['Section', 'Page', 'Heading']
pattern = re.compile(fr'^\s{{2}}[0-9]+\s=\s"(?!(?:{"|".join(key_words_list)})\b)')
for text in texts:
print(pattern.sub(fr'\g<0>{add}', text))
# => 12345 = "Section 3 is ready for review"
# 24680 = "Complete: Bob to review Chapter 4"
Upvotes: 1