Sumanth

Reputation: 383

Facing errors when reading huge text in Python

Using Python 3, I need to read email files from a directory and filter the HTML tags out of them.

I have managed to do this to a large extent. When I try to read the content of my output, it gives an error:

for line in output.splitlines():
AttributeError: 'int' object has no attribute 'splitlines'  

for file in glob.glob('spam/*.*'):
    output = os.system("python html2txt.py " + file)
    for line in output.splitlines():
        print(line)

When I print output, it shows the filtered text. Any help is appreciated.

Upvotes: 0

Views: 79

Answers (3)

lee-pai-long

Reputation: 2282

The Python docs for os.system say:

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command. The shell is given by the Windows environment variable COMSPEC: it is usually cmd.exe, which returns the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

So your output variable is an integer, not the result of the file being parsed by the html2txt.py script.
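If html2txt.py does have to run as a separate process, one option is subprocess.run, which can capture what the script prints. This is a minimal sketch, assuming Python 3.7+ and that html2txt.py writes the filtered text to standard output; run_and_capture is a hypothetical helper name, not something from the question.

```python
import glob
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return everything it wrote to stdout as a str."""
    # check=True raises CalledProcessError if the command exits with a non-zero status
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for path in glob.glob("spam/*.*"):
        # Assumption: html2txt.py takes the file path as its only argument
        output = run_and_capture([sys.executable, "html2txt.py", path])
        for line in output.splitlines():
            print(line)
```

Passing the arguments as a list also avoids building a shell command by string concatenation, which breaks on file names that contain spaces.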

Also, why do you run another Python script outside of your current process? Can't you just import whatever class or function does the job from that module?

There is also an email module in the standard library that can help you.
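As a rough illustration of what the email module offers, the sketch below parses a raw message and separates its plain-text and HTML parts. It assumes the message is already available as a str; split_text_parts is a hypothetical name, and real spam files may need extra handling for odd encodings.

```python
from email import message_from_string, policy


def split_text_parts(raw_message):
    """Return (plain_parts, html_parts) as lists of decoded strings."""
    # policy.default gives EmailMessage objects, which provide get_content()
    msg = message_from_string(raw_message, policy=policy.default)
    plain, html = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no text themselves
        content_type = part.get_content_type()
        if content_type == "text/plain":
            plain.append(part.get_content())
        elif content_type == "text/html":
            html.append(part.get_content())
    return plain, html
```

Only the HTML parts would then need their tags filtered; the plain-text parts can be used as they are.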

Upvotes: 0

OLIVER.KOO

Reputation: 5993

The return value of os.system(command) is system-dependent, but it is meant to be the (encoded) exit status of the process, which is represented by an int. From the os.system documentation:

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

On no system does it return a str, and splitlines() is a str method.

You are calling a str method on an int, which is why you get the error:

AttributeError: 'int' object has no attribute 'splitlines'
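A quick way to see this for yourself is the small sketch below. It uses echo only because that command exists in both Unix shells and cmd.exe; the command itself is an arbitrary choice for illustration.

```python
import os

# The command's output goes straight to the terminal; only a status code comes back
status = os.system("echo hello")

print(type(status).__name__)          # int
print(hasattr(status, "splitlines"))  # False
```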

Upvotes: 0

flevinkelming

Reputation: 690

Try this as a replacement for the code you've provided:

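import glob

# Collect every file in the spam directory whose name contains a dot
files = glob.glob('spam/*.*')

for f in files:
    # Open each file and iterate over it line by line
    with open(f) as spam_file:
        for line in spam_file:
            # Each line keeps its trailing newline, so suppress print's own
            print(line, end='')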

If the files are indeed HTML files, I would recommend looking into BeautifulSoup.
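If installing a third-party package is not an option, the standard library's html.parser can do basic tag stripping. To be clear, the sketch below is not BeautifulSoup but a plain stdlib stand-in; TextExtractor and strip_tags are hypothetical names, and the contents of script and style elements are kept as text, which may not be what you want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found outside of tags
        self.chunks.append(data)


def strip_tags(html_text):
    """Return html_text with its markup removed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks)


print(strip_tags("<p>Hello <b>world</b></p>"))  # Hello world
```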

Upvotes: 1
