Reputation: 7417
So I have about a 8mb file of each with 6 ints seperated by a space.
my current method for parsing this is:
tuplify6 :: [a] -> (a, a, a, a, a, a)
tuplify6 [l, m, n, o, p, q] = (l, m, n, o, p, q)
toInts :: String -> (Int, Int, Int, Int, Int, Int)
toInts line =
tuplify6 $ map read stringNumbers
where stringNumbers = split " " line
and mapping toInts over
liftM lines . readFile
which will return me a list of tuples. However, When i run this, it takes nearly 25 seconds to load the file and parse it. Any way I can speed this up? The file is just plain text.
Upvotes: 7
Views: 596
Reputation: 183888
You can speed it up by using ByteString
s, e.g.
module Main (main) where
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Char
main :: IO ()
main = do
args <- getArgs
mapM_ doFile args
doFile :: FilePath -> IO ()
doFile file = do
bs <- C.readFile file
let tups = buildTups 0 [] $ C.dropWhile (not . isDigit) bs
print (length tups)
buildTups :: Int -> [Int] -> C.ByteString -> [(Int,Int,Int,Int,Int,Int)]
buildTups 6 acc bs = tuplify6 acc : buildTups 0 [] bs
buildTups k acc bs
| C.null bs = if k == 0 then [] else error ("Bad file format " ++ show k)
| otherwise = case C.readInt bs of
Just (i,rm) -> buildTups (k+1) (i:acc) $ C.dropWhile (not . isDigit) rm
Nothing -> error ("No Int found: " ++ show (C.take 100 bs))
tuplify6:: [a] -> (a, a, a, a, a, a)
tuplify6 [l, m, n, o, p, q] = (l, m, n, o, p, q)
runs pretty fast:
$ time ./fileParse IntList
200000
real 0m0.119s
user 0m0.115s
sys 0m0.003s
for an 8.1 MiB file.
On the other hand, using String
s and your conversion (with a couple of seq
s to force evaluation) also took only 0.66s, so the bulk of the time seems to be spent not parsing, but working with the result.
Oops, missed a seq
so the read
s were not actually evaluated for the String
version. Fixing that, String
+ read
takes about four seconds, a bit above one with the custom Int
parser from @Rotsor's comment
foldl' (\a c -> 10*a + fromEnum c - fromEnum '0') 0
so parsing apparently did take a significant amount of the time.
Upvotes: 8