Reputation: 437
here is my code:(get file line num and word count)
import System.IO
import Data.Maybe
readL::(Int,Int,Int)->IO()
readL (w,l,-1) = do
putStrLn $ "word:" ++(show w )++"\nline:"++(show l)
readL (w,l,0) = do
s<-hIsEOF stdin
if s
then readL (w,l,-1)
else
do
f<-getLine
readL (w+length f,l+1,0)
main = do
hSetBinaryMode stdin True
readL (0,0,0)
when I process a file with size 100m,it just crashes,with error: Stack space overflow: current size 8388608 bytes
Is there something I wrote wrong?
I also have another version here:
import System.IO
import Data.List
main = do
hSetBinaryMode stdin True
interact $ (\(w,l)->"line:"++(show l)++"\nwords:"++(show w)++"\n"). foldl' (\(w,l) r-> (w + length r,l+1) ) (0,0) .lines
this have the same problem too... and with lots of memory,so,anybody can slove this?I'm just a new learner in haskell.
Upvotes: 0
Views: 212
Reputation: 183858
The problem is that neither the w
nor the l
parameter to readL
are evaluated before the end of input is reached. So for an input with many lines, you build huge thunks (((0 + length line1) + length line2) ... + length lastline)
, similar for l
, and for more than half a million lines or so, evaluating that thunk will not fit in the available stack. Additionally, the length f
holds on to the line read until it is evaluated, causing unnecessarily large memory use.
You have to keep the accumulating parameters evaluated in the loop, the easiest way is with bang-patterns
readL !(!w,!l,-1) = ...
or a seq
:
readL (w,l,c)
| w `seq` l `seq` (c == -1) = putStrLn $ "word:" ++(show w )++"\nline:"++(show l)
readL (w,l,0) = do
s<-hIsEOF stdin
if s
then readL (w,l,-1)
else
do
f<-getLine
readL (w+length f,l+1,0)
The foldl'
version has the same problem,
foldl' (\(w,l) r-> (w + length r,l+1) ) (0,0)
only evaluates the accumulator pair to weak head normal form, that is to the outermost constructor, here (,)
. It does not force evaluation of the components. To do that, you can
use a strict pair type for the fold
data P = P !Int !Int
foo = ... . foldl' (\(P w l) r -> P (w + length r) (l+1)) (P 0 0) . lines
or use seq
in the folded function
... . foldl' (\(w,l) r -> w `seq` l `seq` (w + length r, l+1)) . lines
Upvotes: 2