Reputation: 89
I am trying to use a regex to look at a specific part of a string and take what is between two markers, but I can't get the pattern right.
My biggest issue is forming the regex pattern itself. I've tried a bunch of variations close to the example listed below, so it should be close.
import re
toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "
# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text).lower())
# Gets rid of whitespace in case they move the []/[x] around
result = result.replace(" ", "")
if any(x in result for x in toFind):
    print("Exists")
else:
    print("Doesn't Exist")
Happy Path: I take the string (text) and use a regex to get the substring between Link Created and Research Done.
Then I make the result lowercase and strip the whitespace in case they move the []/[x]s around. Then it checks the string (result) for '[]' or '[x]' and prints the outcome.
Actual Output: At the moment all I keep getting is None because the regex syntax is off...
Upvotes: 2
Views: 2986
Reputation: 15120
Seems like regex is overkill for this particular job unless I am missing something (also not clear to me why you need the step that removes the whitespace from the substring). You could just split on "Link Created" and then split the following string on "Research Done".
text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "
s = text.split("Link Created")[1].split("Research Done")[0].lower()
if "[]" in s or "[x]" in s:
    print("Exists")
else:
    print("Doesn't Exist")
# Exists
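One caveat with the chained split: if "Link Created" is missing, the [1] index raises an IndexError. A minimal sketch of a guarded variant using str.partition follows; the helper name between is my own invention for illustration, not something from the question.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None.

    Hypothetical helper: str.partition never raises, it just reports
    an empty separator when the marker is absent.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return middle


text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "
cell = between(text, "Link Created", "Research Done")
if cell is None:
    print("Markers not found")
elif "[]" in cell.lower() or "[x]" in cell.lower():
    print("Exists")
else:
    print("Doesn't Exist")
```

Whether you want None, an empty string, or an exception when a marker is missing depends on how the table is produced, so treat the return convention as an assumption.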
Upvotes: 1
Reputation: 12201
If you want the dot (.) to match newlines, you have to use the re.S option.
Also, it would seem a better idea to check whether the regex matched before proceeding with further calls. Your call to lower() gave me an error because the regex didn't match, so calling result.group(0).lower() only when result evaluates as true is safer.
import re
toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "
# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text, re.S))
if result:
    # Gets rid of whitespace in case they move the []/[x] around
    result = result.group(0).lower().replace(" ", "")
    if any(x in result for x in toFind):
        print("Exists")
    else:
        print("Doesn't Exist")
else:
    print("re did not match")
PS: all the re options are documented in the re module documentation. Search for re.DOTALL for the details on re.S (they're synonyms). If you want to combine options, use bitwise OR. E.g., re.S|re.I will have the dot (.) match newlines and do case-insensitive matching.
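To illustrate combining the two flags, here is a small sketch. The sample text is made up for the demonstration (I changed the case of the markers on purpose); the pattern is the one from the question.

```python
import re

# Hypothetical variant of the table in which the marker case differs.
text = "|link created | [] |\n|RESEARCH DONE | [X] "
pattern = r"(?<=Link Created)(.+?)(?=Research Done)"

# Without flags: "." stops at the newline and the case differs, so no match.
print(re.search(pattern, text))

# re.S lets "." cross the newline; re.I ignores case in the literal words.
match = re.search(pattern, text, re.S | re.I)
print(repr(match.group(0)) if match else None)
```

Either flag on its own is not enough for this sample: re.S alone still fails on the case mismatch, and re.I alone still fails at the newline.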
Upvotes: 1
Reputation: 3288
I believe it's the \n newline characters giving issues. You can get around this using [\s\S]+ as such:
import re
toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "
# New regex to match the text between the two markers
result = re.search(r"Link Created([\s\S]+)Research Done", text).group(1)
# Remove all whitespace (spaces, tabs, newlines) and column separators,
# then lowercase so that "[X]" also matches "[x]"
result = re.sub(r"[\s|]+", "", result).lower()
if any(x in result for x in toFind):
    print("Exists")
else:
    print("Doesn't Exist")
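One thing to keep in mind: [\s\S]+ is greedy, so if "Research Done" can appear more than once it captures up to the last occurrence. A rough sketch of the difference, using a made-up text with a repeated end marker (not the text from the question):

```python
import re

# Hypothetical text in which the end marker appears twice.
text = "Link Created | [] |\n|Research Done | [X] |\n|Research Done again | [] "

greedy = re.search(r"Link Created([\s\S]+)Research Done", text)
lazy = re.search(r"Link Created([\s\S]+?)Research Done", text)

# Check both matches before calling .group(), in case a marker is missing.
if greedy and lazy:
    print(repr(greedy.group(1)))  # runs up to the last "Research Done"
    print(repr(lazy.group(1)))    # stops at the first "Research Done"
```

For the single-row case in the question both forms give the same result, so the lazy +? only matters if the table can grow.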
Upvotes: 1