Tarun

Reputation: 161

Reading a file until a specific character in Python

I am currently working on an application which requires reading all the input from a file until a certain character is encountered.

By using the code:

with open("Questions.txt", 'r') as file:
    c = [x.strip() for x in file.readlines()]

readlines() splits the input at every \n and strip() removes it, so each line ends up as a separate string in the list c.

This means every line becomes its own element of c. What I want instead is for each list element to run up to a special marker, like this:

If the input file has the contents:

1.Hai
2.Bye\-1
3.Hello
4.OAPd\-1

then I want to get the list c=['1.Hai\n2.Bye','3.Hello\n4.OAPd']

Please help me in doing this.

Upvotes: 13

Views: 54786

Answers (2)

Alfe

Reputation: 59426

The easiest way would be to read the file in as a single string and then split it on your separator:

separator = '\\-1'  # the literal characters \-1 from your example (whatever that means)
with open('myFileName') as myFile:
  text = myFile.read()
result = text.split(separator)
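
For the sample in the question, a minimal sketch of this could look as follows. It assumes the separator is the literal text \-1 and that each separator is followed by a newline; the function name split_records is only an illustration and not part of any library.

```python
def split_records(text, separator="\\-1"):
    """Split text on separator and tidy up the pieces."""
    parts = text.split(separator)
    # Every piece after the first begins with the newline that followed
    # the separator, so strip newlines from both ends of each piece.
    cleaned = [part.strip("\n") for part in parts]
    # A separator at the very end of the text leaves an empty last piece.
    return [part for part in cleaned if part]


# Sample contents taken from the question
sample = "1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n"
print(split_records(sample))  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```

Stripping newlines and dropping empty pieces are choices made for this sample; if blank records are meaningful in your file, keep them.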

If your file is very large, holding its complete contents in memory as a single string just to call .split() may not be desirable (and holding the complete contents in the list after the split probably isn't either). In that case you could read it in chunks:

CHUNK_SIZE = 4096  # number of characters to read per call; tune as needed

def each_chunk(stream, separator):
  buffer = ''
  while True:  # until EOF
    chunk = stream.read(CHUNK_SIZE)
    if not chunk:  # EOF?
      yield buffer  # whatever remains after the last separator
      break
    buffer += chunk
    while True:  # until no separator is found
      try:
        part, buffer = buffer.split(separator, 1)
      except ValueError:  # no separator left in the buffer
        break
      else:
        yield part

with open('myFileName') as myFile:
  for chunk in each_chunk(myFile, separator='\\-1\n'):
    print(chunk)  # not holding in memory, but printing chunk by chunk
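
To check that a separator straddling two reads is still found, here is a self-contained sketch that runs the same idea against an in-memory stream with a deliberately tiny chunk size. The name iter_records and the chunk_size parameter are assumptions made for this illustration, and unlike the generator above it skips an empty final piece.

```python
import io


def iter_records(stream, separator, chunk_size=4096):
    """Yield the pieces of stream that lie between occurrences of separator."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF: whatever is left is the final piece
            if buffer:
                yield buffer
            return
        buffer += chunk
        # Emit every complete piece that is currently in the buffer
        while separator in buffer:
            part, buffer = buffer.split(separator, 1)
            yield part


# Sample contents from the question, read 5 characters at a time so that
# the separator is split across reads.
stream = io.StringIO("1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n")
records = list(iter_records(stream, separator="\\-1\n", chunk_size=5))
print(records)  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```

Because the search runs on the accumulated buffer rather than on each chunk, the result does not depend on where the read boundaries fall.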

Upvotes: 22

Stolson

Reputation: 108

I used "*" instead of "\-1" as the separator; I'll let you make the appropriate changes.

s = '1.Hai\n2.Bye*3.Hello\n4.OAPd*'
temp = ''
results = []

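for char in s:
    if char == '*':  # compare by value; "is" tests identity
        results.append(temp)
        temp = ''  # reset to an empty string, not a list
    else:
        temp += char

# keep any text that follows the last separator
if temp:
    results.append(temp)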
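
The loop above only works for a one-character separator. A sketch of the same scanning idea for the multi-character \-1 marker from the question could use str.find; the function name split_on is hypothetical, and the sketch assumes each marker is followed by a newline.

```python
def split_on(text, separator):
    """Collect the pieces of text between occurrences of separator."""
    results = []
    start = 0
    while True:
        index = text.find(separator, start)
        if index == -1:  # no further separator
            break
        results.append(text[start:index])
        start = index + len(separator)  # continue after the separator
    # Keep any text that follows the last separator
    if start < len(text):
        results.append(text[start:])
    return results


s = '1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n'
print(split_on(s, '\\-1\n'))  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```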

Upvotes: -4
