Reputation: 8678
I have a file like this :
SOME_INFO_BEGIN
....
....
SOME_INFO_END
ACTUAL_DETAIL_BEGIN
TEST|1|23|abcd|
TEST|2|5|efgs|
TEST|3|124|zyz|
ACTUAL_DETAIL_END
I only to read the lines between ACTUAL_DETAILS_BEGIN and ACTUAL_DETAILS_END and they will always start with TEST, however i also only to read the line which has 5 in the 3rd column.
The below code works for me except it gets all the 3 lines -
with open(dir+filename, 'r') as filehandle:
filecontent = filehandle.readlines()
ifa = [k for k in filecontent if 'TEST' in k]
df = pd.DataFrame([sub.split("|") for sub in ifa])
df.columns= ['Type','Amt','Desc','Value1','Value2']
df1 = df[['Type','Desc']]
print(df1)
df1.to_excel (dir+"test.xlsx", index = False)
Q1. Is there a better way to code this ? For eg. how is the filehandle closed for excel write?
Q2. How do i only pick up the 2nd row ?
Upvotes: 0
Views: 185
Reputation: 13185
Nothing to test with but you can iterate files and lazily load the lines. Perhaps this is more efficient:
rebuilt = []
with open(dir+filename, 'r') as infile:
for row in infile:
if row[:4] == 'TEST':
rebuild.append(row.split('|'))
df = pd.DataFrame(rebuilt, columns= ['Type','Amt','Desc','Value1','Value2'])
readlines()
is going to load the whole thing into memory, regardless, so you can filter the lines as you read them instead. You're also splitting each line before checking a condition, so it may be faster to check against a list slice.
Upvotes: 1