randeepsp

Reputation: 3842

How to check the character count of a file in Python

I have Python code that reads many files, but some of them are extremely large, which causes errors in other code. I want a way to check the character count of each file so that I can avoid reading the extremely large ones. Thanks.

Upvotes: 7

Views: 4529

Answers (5)

bobince

Reputation: 536389

os.stat(filepath).st_size

Assuming by ‘characters’ you mean bytes. ETA:

I need the total character count, just like what the command 'wc filename' gives me on Unix.

In which mode? wc on its own will give you a line, word and byte count (the byte count being the same as stat), not Unicode characters.

There is a switch -m which will use the locale's current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:

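import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    """Count decoded characters in a file by reading it in chunks."""
    if charset is None:
        charset = sys.getfilesystemencoding()
    # Look up the StreamReader class for this encoding.
    readerclass = codecs.getreader(charset)
    # 'replace' turns undecodable bytes into U+FFFD instead of raising.
    reader = readerclass(open(filepath, 'rb'), 'replace')
    nchar = 0
    try:
        while True:
            chars = reader.read(1024*32)  # arbitrary chunk size
            if not chars:  # empty string means end of file
                break
            nchar += len(chars)
    finally:
        reader.close()  # also closes the underlying file
    return nchar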

On Unix, sys.getfilesystemencoding() normally matches the locale encoding, which roughly reproduces what wc -m does. If you know the encoding yourself (e.g. 'utf-8') then pass that in instead.

I don't think you want to do this.
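On Python 3, the same chunked count can be sketched with the built-in open(), which decodes as it reads. The name count_characters and the 'utf-8' default are assumptions for illustration, not part of the answer above.

```python
def count_characters(filepath, encoding="utf-8", chunk_size=32 * 1024):
    """Count decoded characters (code points) without loading the whole file."""
    nchar = 0
    # newline="" disables newline translation, so "\r\n" counts as two characters.
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    with open(filepath, "r", encoding=encoding, errors="replace", newline="") as f:
        while True:
            chars = f.read(chunk_size)  # reads up to chunk_size characters
            if not chars:  # empty string means end of file
                break
            nchar += len(chars)
    return nchar
```

This still reads every byte of the file, so it is not a cheap way to detect oversized files; a byte-size check is.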

Upvotes: 8

ghostdog74

Reputation: 342373

An alternative way, using an already open file:

import os

# fstat works on the open file descriptor; the with block closes the file
with open("file") as f:
    size = os.fstat(f.fileno()).st_size

Upvotes: 2

Mike

Reputation: 5036

If you want the Unicode character count for a text file given a specific encoding, you will have to read in the entire file to do that.

However, if you want the byte count for a given file, you want os.path.getsize(), which should only need to do a stat on the file as long as your OS has stat() or an equivalent call (all Unixes and Windows do).
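For the original goal of skipping oversized files, a byte-size check before opening is probably enough. A minimal sketch, in which the helper name small_enough and the 10 MiB cutoff are made up for illustration:

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cutoff; tune for your workload

def small_enough(paths, max_bytes=MAX_BYTES):
    """Yield only the paths whose size on disk is at most max_bytes."""
    for path in paths:
        # getsize only stats the file; it never reads the contents.
        if os.path.getsize(path) <= max_bytes:
            yield path
```

You could then loop over small_enough(all_paths) instead of all_paths and read each file as before.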

Upvotes: 7

Sapph

Reputation: 6208

Try

import os
os.path.getsize(filePath)

to get the size of your file, in bytes.

Upvotes: 5

YOU

Reputation: 123841

os.path.getsize(path) 

Return the size, in bytes, of path. Raise os.error if the file does not exist or is inaccessible.
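If some of the paths may be missing or unreadable, the exception can be caught so that one bad file does not stop the whole run. A small sketch, with size_or_none as a hypothetical helper name:

```python
import os

def size_or_none(path):
    """Return the size of path in bytes, or None if it cannot be stat'ed."""
    try:
        return os.path.getsize(path)
    except OSError:  # os.error is an alias of OSError in Python 3
        return None
```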

Upvotes: 4
