Reputation: 288
I have a very long string that I extracted from an image file. The string can look like this:
...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n...
How do I extract just the 10 characters after the substring "Article-no:"?
I tried a different approach using find and rfind, like this, but it fails every now and then when the start or end string does not match exactly.
s = "... string shown above ..."
start = "Article-no: "
end = "Article description: "
print(s[s.find(start)+len(start):s.rfind(end)])
Upvotes: 1
Views: 1086
Reputation: 1200
For this, a regular expression might come in very handy.
import re
# Create a pattern which matches "Article-no: " literally,
# and then grabs the digits that follow.
pattern = re.compile(r"Article-no: (\d+)")
s = "...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n..."
match = pattern.search(s)
if match:
    print(match.group(1))
This outputs:
123456789
The regular expression used is Article-no: (\d+), which has the following parts:
Article-no: # Match this text literally
( # Open a new group (i.e. group 1)
\d+ # Match 1 or more occurrences of a digit
) # Close group 1
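As a sketch, the same breakdown can live inside the pattern itself by compiling it with re.VERBOSE. One caveat: verbose mode ignores unescaped whitespace, so the literal space after the colon is written as "[ ]" here. The variable name article_no is just for illustration.

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments sit inside it.
# Unescaped whitespace is ignored in verbose mode, hence "[ ]" for the space.
pattern = re.compile(
    r"""
    Article-no:[ ]   # Match this text literally (including the trailing space)
    (                # Open group 1
        \d+          # Match 1 or more digits
    )                # Close group 1
    """,
    re.VERBOSE,
)

s = "...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n..."
match = pattern.search(s)
article_no = match.group(1) if match else None  # None when nothing matches
print(article_no)
```

With the sample string this should print the same 123456789 as the compact pattern.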
The re module searches the string for a place where the pattern matches, and you can then extract the digits from the match.
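If you really do want a fixed number of characters rather than "all the digits", a plain slice after the marker may be enough. Below is a minimal sketch using str.partition. The helper name extract_after and its width parameter are made up for this example. Unlike the find-based slice, it returns None when the marker is missing: str.find returns -1 in that case, which makes the slice silently wrong.

```python
def extract_after(text, marker, width=10):
    """Return up to `width` characters following the first `marker`, or None."""
    # partition splits on the first occurrence; `found` is "" if marker is absent
    before, found, after = text.partition(marker)
    if not found:
        return None
    return after[:width]


s = "...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n..."
print(repr(extract_after(s, "Article-no: ")))
```

Note that the number in the sample string has only 9 digits, so the 10th character returned is the newline that follows it. Call .strip() on the result if that matters, or stick with the \d+ pattern when the length can vary.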
Upvotes: 0