btava001

Reputation: 131

Loop Through Text File Until I Reach a Certain Block

I am using FPDF to convert a text file to PDF. When I write the text into the PDF, the headers are badly misaligned compared with the original. My workaround is to go line by line and position each header manually. The column headers start at "Account#" and end at the "-------" line. How can I reposition all the headers while leaving the data below them unchanged?

Original Text: https://flic.kr/p/2hw2Zft

PDF : https://flic.kr/p/2hw43hQ

from fpdf import FPDF

pdf = FPDF("L", "mm", "A4")
pdf.add_page()
pdf.set_font('arial', style='', size=10.0)

with open('C:\\Users\\bxt058y\\PycharmProjects\\MSIT501\\SUMB_Statement_29396-76397.txt', 'r') as file:
    lines = file.readlines()

for line in lines:
    pdf.multi_cell(h=5.0, align='L', w=0, txt=line, border=0)
pdf.output('drafttest.pdf', 'F')

header1 = lines[0]
header2 = lines[1]
header3 = lines[2]
header4 = lines[3]
header5_1 = " ".join(lines[4].split()[:4])
print(header5_1)
header5_2 = " ".join(lines[4].split()[4:])
print(header5_2)
header6 = lines[5]
header7 = lines[6]
header8 = lines[7]
header8_1 = " ".join(lines[8].split()[:4])
header8_2 = " ".join(lines[8].split()[4:])
print(header8_2)
header9_1 = " ".join(lines[9].split()[:5])
header9_2 = " ".join(lines[9].split()[5:])



pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header1.strip(), border=0)
pdf.set_x(124)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header2.strip(), border=0)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header3.strip(), border=0)
pdf.set_x(65)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header4, border=0)
pdf.set_x(45)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt=header5_1, border=0)
pdf.set_x(129)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header5_2, border=0)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header6.strip(), border=0)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header7.strip(), border=0)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt=header8_1, border=0)
pdf.set_x(125)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header8_2, border=0)
pdf.cell(ln=0, h=5.0, align='L', w=0, txt=header9_1, border=0)
pdf.set_x(125)
pdf.cell(ln=1, h=5.0, align='L', w=0, txt=header9_2, border=0)

Upvotes: 1

Views: 98

Answers (2)

Jan

Reputation: 43169

Look into regular expressions (and mind the different modifiers, namely singleline, multiline and verbose):

^
Account\#
.+?
(?=^---)

The expression must be applied to the whole string (the entire file content), not line by line. See a demo on regex101.com.
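As a rough illustration of how that pattern might be used from Python, here is a minimal sketch with the `re` module. The flags correspond to the multiline, singleline (`re.DOTALL`) and verbose modifiers mentioned above. The sample text and the names `HEADER_BLOCK` and `header_block` are made up for the example, since the real statement layout is only shown as an image.

```python
import re

# Pattern from the answer, compiled with the three modifiers it mentions.
HEADER_BLOCK = re.compile(
    r"""
    ^              # start of a line (MULTILINE)
    Account\#      # literal "Account#"; the hash is escaped because of VERBOSE
    .+?            # everything that follows, as little as possible (DOTALL)
    (?=^---)       # stop right before a line that begins with "---"
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Hypothetical sample; the actual file contents are an assumption.
sample = (
    "Some Bank Statement\n"
    "Account#   Name      Balance\n"
    "           (first)   (USD)\n"
    "-----------------------------\n"
    "12345      Smith     10.00\n"
)

# Search the whole text at once, not line by line.
match = HEADER_BLOCK.search(sample)
header_block = match.group(0) if match else ""
print(header_block)
```

The lookahead leaves the dashed line out of the match, so `match.end()` marks where the separator and the data rows begin.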

Upvotes: 1

Quastiat

Reputation: 1242

Maybe:

import pandas as pd
data = pd.read_csv('text.txt', header = None)
header = ['Account#', '-----']
header_only = data[data.iloc[:,0].isin(header)]

where header contains the first elements of the header rows you are looking for.
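For comparison, a plain-Python sketch that collects the block the question describes, from the line starting with "Account#" through the line of dashes, without pandas. This swaps the exact-match `isin` test for a prefix check on each line, which is an assumption about how the header rows can be recognised. The function name `split_header_block` and the sample lines are hypothetical.

```python
def split_header_block(lines, start="Account#", end="---"):
    """Split lines into (before, header, after).

    header runs from the first line starting with `start` through the
    first later line starting with `end` (both inclusive).
    """
    before, header, after = [], [], []
    state = "before"
    for line in lines:
        stripped = line.lstrip()
        if state == "before" and stripped.startswith(start):
            state = "header"
        if state == "before":
            before.append(line)
        elif state == "header":
            header.append(line)
            if stripped.startswith(end):
                state = "after"  # the dashed line closes the header block
        else:
            after.append(line)
    return before, header, after


# Hypothetical sample; the real file contents are an assumption.
sample_lines = [
    "Some Bank Statement\n",
    "Account#   Name      Balance\n",
    "-----------------------------\n",
    "12345      Smith     10.00\n",
]
before, header, after = split_header_block(sample_lines)
```

The `header` lines could then be positioned individually with `pdf.set_x` and `pdf.cell`, while the `after` lines are written unchanged with `pdf.multi_cell`.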

Upvotes: 1
