Reputation: 3842
I have Python code that reads many files, but some of them are extremely large, which causes errors in other code. I want a way to check the character count of a file so that I can avoid reading the extremely large ones. Thanks.
Upvotes: 7
Views: 4529
Reputation: 536389
os.stat(filepath).st_size
Assuming by ‘characters’ you mean bytes. ETA:
I need the total character count, just like what the command 'wc filename' gives me on Unix
In which mode? wc on its own will give you a line, word and byte count (same as stat), not Unicode characters.
There is a switch -m which will use the locale's current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:
import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Fall back to the encoding Python reports for the filesystem
    if charset is None:
        charset = sys.getfilesystemencoding()
    # Wrap the raw byte stream in a decoding reader; undecodable bytes are replaced
    readerclass = codecs.getreader(charset)
    reader = readerclass(open(filepath, 'rb'), 'replace')
    nchar = 0
    while True:
        chars = reader.read(1024*32)  # arbitrary chunk size
        if chars == '':
            break
        nchar += len(chars)
    reader.close()
    return nchar
sys.getfilesystemencoding() gets the locale encoding, reproducing what wc -m does. If you know the encoding yourself (e.g. 'utf-8') then pass that in instead.
I don't think you want to do this.
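For what it's worth, on Python 3 the same chunked count can be sketched with the built-in open(), which decodes as it reads. The name count_chars and the 'utf-8' default are assumptions for illustration only, not something from the answer above.

```python
def count_chars(filepath, encoding="utf-8", chunk_size=32 * 1024):
    """Count decoded code points in a text file, reading it in chunks."""
    nchar = 0
    # errors="replace" mirrors the 'replace' handler used above;
    # newline="" turns off newline translation, so "\r\n" counts as two characters
    with open(filepath, "r", encoding=encoding, errors="replace", newline="") as f:
        while True:
            chars = f.read(chunk_size)  # up to chunk_size decoded characters
            if not chars:
                break
            nchar += len(chars)
    return nchar
```

Note that this still has to read and decode the entire file, so it is a poor way to detect files that are too large to read.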
Upvotes: 8
Reputation: 342373
An alternative way:
import os
f = open("file")
size = os.fstat(f.fileno()).st_size
f.close()
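One possible use of fstat here is to check the size on the handle you are about to read from, so the check and the read refer to the same open file. A minimal sketch; read_if_small and max_bytes are hypothetical names, not part of the answer:

```python
import os

def read_if_small(path, max_bytes):
    """Return the file's bytes, or None if it is larger than max_bytes."""
    with open(path, "rb") as f:
        # fstat inspects the already-open handle rather than looking the path up again
        if os.fstat(f.fileno()).st_size > max_bytes:
            return None
        return f.read()
```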
Upvotes: 2
Reputation: 5036
If you want the unicode character count for a text file given a specific encoding, you will have to read in the entire file to do that.
However, if you want the byte count for a given file, you want os.path.getsize(), which should only need to do a stat on the file as long as your OS has stat() or an equivalent call (all Unixes and Windows do).
Upvotes: 7
Reputation: 6208
Try
import os
os.path.getsize(filePath)
to get the size of your file, in bytes.
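Applied to the original problem, a size check like this could be used to split a list of paths before reading anything. This is only a sketch: split_by_size and the 10 MiB cutoff are assumptions, so pick a limit that suits your data.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical cutoff (10 MiB)

def split_by_size(paths, max_bytes=MAX_BYTES):
    """Return (small, large): paths at most max_bytes long, and the rest."""
    small, large = [], []
    for path in paths:
        # getsize only stats the file; it never reads the contents
        if os.path.getsize(path) <= max_bytes:
            small.append(path)
        else:
            large.append(path)
    return small, large
```

You would then read only the paths in the first list and log or skip the second.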
Upvotes: 5
Reputation: 123841
os.path.getsize(path)
Return the size, in bytes, of path. Raise os.error if the file does not exist or is inaccessible.
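Since the call raises for missing or unreadable files, a loop over many paths may want to catch that. A small sketch, with size_or_none as a made-up helper name:

```python
import os

def size_or_none(path):
    """Return the size in bytes, or None if the file is missing or inaccessible."""
    try:
        return os.path.getsize(path)
    except OSError:  # os.error is an alias of OSError
        return None
```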
Upvotes: 4