August Jelemson

Reputation: 1218

How can I check if a text file is empty in Haskell?

I have already tried:

main = do
       contents <- getContents
       if null contents then do
       putStrLn "File was empty"
       return()
       else do
       putStrLn "File was not empty"
       return

Upvotes: 0

Views: 1709

Answers (2)

K. A. Buhr

Reputation: 51029

@hnefatl's answer is fine, and is probably what you're looking for to get your original code working. However, in case someone else stumbles across this question while trying to figure out how to check if a file is empty, a more advanced (but preferred) answer follows.

In "real" Haskell code, assuming you're trying to check the size of a plain file, you'd probably use getFileStatus and the helper function fileSize from System.Posix, as discussed in this question: What is the best way to retrieve the size of a file in haskell?

Specifically, a function like the following would work on Linux:

import System.Posix

isEmpty1 :: FilePath -> IO Bool
isEmpty1 fp = do
  stat <- getFileStatus fp
  return (fileSize stat == 0)

If you needed Windows compatibility, you could either install the unix-compat package which provides a cross-platform version of getFileStatus, or you could use hFileSize from System.IO:

import System.IO

isEmpty2 :: FilePath -> IO Bool
isEmpty2 fp = do
  sz <- withFile fp ReadMode hFileSize
  return (sz == 0)
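
Another portable option, not mentioned above, is getFileSize from System.Directory. The sketch below assumes a version of the directory package recent enough to export it (1.2.7.0 or later); the names isEmpty3 and emptyFiles are made up for illustration.

```haskell
import Control.Monad (filterM)
import System.Directory (getFileSize)

-- | True when the file at the given path has a size of zero bytes.
-- Assumption: the path names an existing regular file; otherwise
-- getFileSize throws an IOError.
isEmpty3 :: FilePath -> IO Bool
isEmpty3 fp = do
  sz <- getFileSize fp   -- size in bytes, as an Integer
  return (sz == 0)

-- | Keep only the paths whose files are empty (hypothetical helper).
emptyFiles :: [FilePath] -> IO [FilePath]
emptyFiles = filterM isEmpty3
```

Like the other size-based checks, this never reads any file data.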

The main advantage of isEmpty1 above is that getFileStatus should be very efficient as it uses only one system call (a stat call). It would be the preferred approach if you had to check the size of lots of files. The isEmpty2 solution is okay, too, but it involves (at least) three system calls (open, then fstat, then close) and needs to open a file handle temporarily, which could be an issue if you were checking a lot of files in parallel or something.

Both will perform better than the readFile method as they don't cause a read of any file data. In contrast, readFile needs to read at least one data block off the disk in order to determine that the resulting string is empty.

Thanks to Haskell's lazy I/O, readFile will just do an initial read rather than reading the whole file, but this leads to another odd quirk. Because readFile opens a file and reads its contents lazily, the file is held open until the contents have been completely read, and there's no other way to force the file to close without reading the complete contents! Therefore, unless the file is short enough to be read completely by that initial read, the file handle will be held open indefinitely. So, if you had a program that checked whether a whole bunch of text files were empty, and some of them were larger than a single block, you'd probably end up running out of file handles.

In general, readFile shouldn't be used unless either (a) the whole file contents will be processed; or (b) the program will terminate soon after reading whatever partial contents it needs.
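
If you do want a read-based check without the lingering handle, one possible sketch is to open the file with withFile and ask hIsEOF whether the very first read would hit end-of-file. This is only an illustration, not something from the answers here, and the name isEmptyByRead is invented.

```haskell
import System.IO (IOMode (ReadMode), hIsEOF, withFile)

-- | True when the file has no data to read.
-- withFile closes the handle as soon as the action finishes, so no
-- handle stays open afterwards, regardless of how large the file is.
isEmptyByRead :: FilePath -> IO Bool
isEmptyByRead fp = withFile fp ReadMode hIsEOF
```

This still opens the file, so the size-based checks remain the cheaper choice when you only need to know whether a plain file is empty.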

Upvotes: 0

hnefatl

Reputation: 6037

The issue is actually mostly in your formatting, not in your logic!

Whitespace matters in Haskell; see this wiki page. Following its rules, and fixing the minor typo in the last return statement (you need to return unit, (), like you did earlier), you get:

main = do
   contents <- getContents
   if null contents then do
       putStrLn "File was empty"
       return ()
   else do
       putStrLn "File was not empty"
       return ()

And this should work perfectly. However, it's not really best practice: where you have a do block inside each branch of the if expression, you're constructing a new monadic action, which isn't necessary. Given that putStrLn returns an IO () anyway, you can just do:

main = do
   contents <- getContents
   if null contents then
       putStrLn "File was empty"
   else
       putStrLn "File was not empty"

Bear in mind that the if ... then x else y expression returns a value of the same type as x and y are - here, x and y both have type IO (), so the expression returns an IO (), which is why it can be used inside this do block.
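
Because if is an ordinary expression, the same idea works outside of IO too. As a rough sketch (the names describe and checkStdin are made up for this example), you could let the if choose between two Strings and print the result once:

```haskell
-- | Pick the message with a pure if expression; both branches are
-- Strings, so the whole expression is a String.
describe :: String -> String
describe contents =
  if null contents then "File was empty" else "File was not empty"

-- | Same check as above, still reading from standard input.
checkStdin :: IO ()
checkStdin = do
  contents <- getContents
  putStrLn (describe contents)
```

Defining main = checkStdin should behave the same as the version above; the only difference is that the branching now happens on Strings rather than on IO actions.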


Note that this doesn't check if a file's empty, it checks if the standard input is empty: this is okay if you're piping input from a file into this program's stdin, but if you want to actually read from a file you should look into readFile - and you'll barely need to change this program! Something like the following will do the trick:

main = do
   putStrLn "Enter the file path:"
   path <- getLine
   contents <- readFile path
   if null contents ...

Upvotes: 6
