alvas

Reputation: 122022

Checking for gzip or plain text and reading the file without checking extensions - python

I have files that all have exactly the same format once read; the only difference is that some of them may be gzip-compressed and I don't know which ones.

An example file looks like this:

der ||| the ||| 0.3 ||| ||| 
das ||| the ||| 0.4 ||| |||  
das ||| it ||| 0.1 ||| ||| 
das ||| this ||| 0.1 ||| ||| 
die ||| the ||| 0.3 ||| ||| 

When I read it, I currently do this:

import gzip

try:
    with gzip.open(phrasetablefile, 'rb') as fin:
        for line in fin:
            pass  # do something
except OSError:  # raised on the first read if the file is not gzip
    with open(phrasetablefile, 'rb') as fin:
        for line in fin:
            pass  # do something

Is there another way to do this without the ugly code repetition? (Note that "# do something" is a fairly long piece of code.)

Is there a way to do the following?

try: 
    with gzip.open(phrasetablefile, 'rb') as fin:
except:
    with open(phrasetablefile, 'rb') as fin:
        for line in fin:
            # do something

Upvotes: 0

Views: 543

Answers (2)

lexual

Reputation: 48682

If your gzip files have a .gz suffix, you could do something like this:

import gzip

if phrasetablefile.endswith('.gz'):
    opener = gzip.open
else:
    opener = open

with opener(phrasetablefile, 'rb') as fin:
    for line in fin:
        pass  # do something
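Since the question asks to avoid relying on extensions, the same opener idea can be driven by the file's contents instead. A minimal sketch, assuming Python 3: every gzip stream starts with the two magic bytes 0x1f 0x8b (RFC 1952), so reading two bytes is enough to pick the opener. The helper name open_maybe_gzip is hypothetical, not part of the gzip module.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_maybe_gzip(path, mode="rb"):
    """Hypothetical helper: open `path` with gzip.open if it starts with the
    gzip magic number, otherwise with the built-in open."""
    # Peek at the first two bytes without decompressing anything.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    return opener(path, mode)
```

It is used just like the opener above: with open_maybe_gzip(phrasetablefile) as fin, then the usual loop. A plain file that happened to start with those two bytes would be misdetected, which is not a concern for text files like the phrase table in the question, since 0x1f is a control character.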

Upvotes: 0

hitzg

Reputation: 12701

Warning: Untested code

Either move the loop into a function (as @jonrsharpe suggests):

import gzip


def process(fin):
    for line in fin:
        pass  # do something


try:
    with gzip.open(phrasetablefile, 'rb') as fin:
        process(fin)
except OSError:  # not a gzip file: the error surfaces on the first read
    with open(phrasetablefile, 'rb') as fin:
        process(fin)
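Why the loop has to stay inside the try block: gzip.open() does not look at the file header when it is called, so it happily returns a file object for a plain-text file, and the error only appears on the first read. A small sketch that demonstrates this, assuming Python 3 (where gzip.BadGzipFile, added in 3.8, is a subclass of OSError); the function name is made up for illustration:

```python
import gzip
import os
import tempfile


def gzip_error_is_deferred():
    """Show that gzip.open() accepts a plain-text file and only fails on read."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"der ||| the ||| 0.3 ||| |||\n")
        fin = gzip.open(path, "rb")  # no exception here, even though it is not gzip
        try:
            fin.readline()  # the gzip header is only checked on the first read
        except OSError as exc:
            return "opened fine, failed on read: %s" % exc
        finally:
            fin.close()
        return "read succeeded"
    finally:
        os.remove(path)


print(gzip_error_is_deferred())
```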

Or try something like this:

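import gzip

fin = gzip.open(phrasetablefile, 'rb')
try:
    fin.peek(1)  # gzip.open() does not check the header; the first read does
except OSError:  # not a gzip file, so reopen it as plain bytes
    fin.close()
    fin = open(phrasetablefile, 'rb')

with fin:
    for line in fin:
        pass  # do something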
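Another way to keep the long loop body in one place is to wrap the open-and-fall-back logic in a generator. A minimal sketch, assuming Python 3; iter_lines is a hypothetical name. It tries gzip first and falls back to a plain open if the first read raises OSError:

```python
import gzip


def iter_lines(path):
    """Hypothetical helper: yield raw byte lines from `path`, decompressing
    it if it is gzip and reading it as plain bytes otherwise."""
    fin = gzip.open(path, "rb")
    try:
        first = fin.readline()  # raises OSError here if the file is not gzip
    except OSError:
        fin.close()
        fin = open(path, "rb")
        first = fin.readline()
    with fin:  # closes the file when the generator finishes or is closed
        if first:
            yield first
        yield from fin
```

The caller then writes the loop once, as for line in iter_lines(phrasetablefile), followed by the existing body.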

Upvotes: 1
