Reputation: 244

Printing a single line in a multi-line string

I managed to use pytesseract to convert an invoice image into text.

The multi-line string looks like this:

Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00

I would like to extract the invoice number, just the number (i.e. 20191220.001), using a substring. I managed to get the start index with index = string.find('Receipt No: '), but when I slice the string to extract the number with print(string[index:]), I get the following result:

20191220.001
Date: 20 December 2019
Invoice amount: $400.00

But I only want to extract the first line. The invoice numbers are not always 12 characters long; they may be longer or shorter depending on the vendor. How do I extract only the invoice number? I'm doing this to automate an accounting process.

Upvotes: 1

Answers (5)

tandat

Reputation: 255

If you only care about the first line, you can use the first occurrence of the line-ending character as the end of your number. Note that the number starts where the substring ("Receipt No: ") ends, whereas find returns the index where the substring starts.

string = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''
sub = 'Receipt No: '
start = string.find(sub) + len(sub)
end = string.find('\n', start)  # search from start, in case the receipt line is not the first one
print(string[start:end])

If you also care about the other lines, you can use split and process each line separately.

lines = string.split('\n')
sub = 'Receipt No: '
index = lines[0].find(sub) + len(sub)
print(lines[0][index:])
# Process line 1
# Process line 2
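
As a rough sketch of the "process each line separately" idea, the lines could be collected into a dictionary keyed by label. The names parse_fields and fields are made up for illustration, and this assumes every line of interest has the form "Label: value".

```python
def parse_fields(text):
    """Map each 'Label: value' line to a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        # partition splits on the first ':' only, so a value that
        # contains a colon of its own stays in one piece
        label, sep, value = line.partition(':')
        if sep:  # skip lines without a colon (e.g. blank lines)
            fields[label.strip()] = value.strip()
    return fields


string = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''

fields = parse_fields(string)
print(fields['Receipt No'])
```

This way the date and the amount are available from the same pass, e.g. fields['Invoice amount'].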

Upvotes: 0

Joe

Reputation: 12417

You can use split:

s = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''

number = s.split('Receipt No: ')[1].split('\n')[0]
print(number)

Output:

20191220.001

Or, if you want to use find, you can do it this way:

index1 = s.find(':')
index2 = s.find('\n')
print(s[index1+1:index2].strip())
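
One caveat: indexing the result of split with [1] raises an IndexError when the label is missing, which can happen with OCR output. A possible variant using str.partition is sketched below; value_after is a hypothetical helper name, not part of any library, and returning None for a missing label is just one way to handle it.

```python
def value_after(text, label):
    """Return the rest of the line following label, or None if label is absent."""
    _, found, rest = text.partition(label)
    if not found:
        return None
    # keep only what precedes the first line break; this also works
    # when the label sits on the last line with no trailing newline
    return rest.partition('\n')[0].strip()


s = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''

print(value_after(s, 'Receipt No: '))  # 20191220.001
print(value_after(s, 'Order No: '))    # None
```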

Upvotes: 1

Monu Nagar

Reputation: 24

You can try the split function.

with open("filename", 'r') as dataload:
    for i in dataload.readlines():
        if "Receipt No:" in i:
            print(i.split(":")[1].strip())

Output:

20191220.001

You can change the string in the condition if "Receipt No:" in i: to match whichever label you need.

Upvotes: 0

awasi

Reputation: 369

Split your string into a list on "\n". Each line of the string becomes a list element, and you can then take the part you want.

string = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

your_list = string.split("\n")
data = your_list[0]

Upvotes: 0

luigigi

Reputation: 4215

Try:

import re
s = """
Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""
p = re.compile(r"Receipt No: (\d+\.\d+)")  # raw string; \. matches a literal dot
result = p.search(s)
index = result.group(1) #'20191220.001'
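
Since the question says invoice numbers vary between vendors, a pattern that expects digits on both sides of a separator may be too strict. A looser sketch follows; it assumes the number contains no whitespace, which is my assumption and not something the question guarantees.

```python
import re

s = """
Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

# ^ with re.MULTILINE anchors at the start of any line;
# [ \t]* skips spaces after the colon without crossing a line break;
# (\S+) captures everything up to the next whitespace
match = re.search(r"^Receipt No:[ \t]*(\S+)", s, re.MULTILINE)
invoice_number = match.group(1) if match else None
print(invoice_number)  # 20191220.001
```

Checking for a failed match before calling group avoids an AttributeError when the OCR text has no receipt line.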

Upvotes: 0
