Reputation: 13
I recently started working with python and regular expression. As a first project, I want to read out a pdf file, filter specific text data and recombine in a excel sheet. Hence I ran in a regex problem:
pdf file output format:
...
The text of this line is not always here\n
The community is here to help you with specific coding, algorithm, or language problems.\n
Summarize\n
ask\n
The text of this line is not always here\n
...
I want to search for "ask" and find it by "specific coding" and "\nSummarize\n". The text below "ask" can not be used to find it reliably, because it is always different.
I tried to use (?=...) and (?<=...) for this but I couldn't figure out a suitable solution.
Maybe I'm doing something wrong. Has someone an idea?
Upvotes: 1
Views: 54
Reputation: 163447
If you want to find ask
you could use a capturing group instead of lookarounds. You could match specific coding
followed by the rest of the line .*
If there are empty lines and newlines after it, you could use \s*
to match those.
Then match a newline followed by Summarize
.
Match the empty lines and a newline again and and capture ask
in a capturing group
\bspecific coding\b.*\s*\r?\nSummarize\s*\r?\n(ask)\b
Upvotes: 1