Jeff Maner

Reputation: 1179

Haskell Parsec Unexpected End of Input

Here's an example of the file I'm trying to parse:

XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :      7
BEG PER: 03/17/2014            CURRENT COMPANY                       03/18/2014
END PER: 03/18/2014       QA PROCESS - REJECT REPORT                   20:28:36

BATCH: 123456789 CONTRIB: 987654321 - ABCDE FGHI-SAN DIEGO
                                                    QUOTE BACK: 1A23B45C79

CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE END DATE ERR
------ -------------------- --- -------------------- -------- -------- ---
12345  1234567890001        AB  ABCDE FGHI PRODUCTS  20140314 20140914 059


XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :      8
BEG PER: 03/17/2014            CURRENT COMPANY                       03/18/2014
END PER: 03/18/2014       QA PROCESS - REJECT REPORT                   20:28:36

BATCH: 234567890 CONTRIB: 987654321 - ABCDE FGHI-SAN DIEGO
                                                    QUOTE BACK: 5F7A657G87

CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE END DATE ERR
------ -------------------- --- -------------------- -------- -------- ---
12346  2345678901           AB  ABCDE FGHI PRODUCTS  20140129 20140729 059
12346  3456789012           AB  ABCDE FGHI PRODUCTS  20140317 20140917 059


XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :      9
BEG PER: 03/17/2014            CURRENT COMPANY                       03/18/2014
END PER: 03/18/2014       QA PROCESS - REJECT REPORT                   20:28:36

BATCH: 345678901 CONTRIB: 987654321 - ABCDE FGHI-SAN DIEGO
                                                    QUOTE BACK: 6K75L8791L

CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE END DATE ERR
------ -------------------- --- -------------------- -------- -------- ---
12346  4567890123           AB  ABCDE FGHI PRODUCTS  20140317 20140917 059
12346  4567890123           AB  ABCDE FGHI PRODUCTS  20140317 20140917 059
 NUMBER OF SETS REJECTED ARE :         13  TOTAL SETS IN BATCH:     16,940

                           *** END OF REPORT ***

And here is a collection of snippets from my module:

module XX00135 (parseFile) where

import Control.Applicative ((<$>), (<*>), (<*))
import Text.ParserCombinators.Parsec hiding (Line)

data Line = Line { code    :: String
                 , account :: String
                 , aType   :: String
                 , company :: String
                 , begDate :: String
                 , endDate :: String
                 , errCode :: String }

data Page = Page { periodBeginning :: String
                 , periodEnd       :: String
                 , reportDate      :: String
                 , batch           :: String
                 , contrib         :: String
                 , quoteBack       :: String
                 , lineList        :: [Line] }

data Report = Report { pages :: [Page] }


parseReportDate :: Parser String
parseReportDate =
  manyTill anyChar (string "CURRENT COMPANY") >> spaces >> count 10 anyChar

headers :: Parser String
headers =
  choice [ try (string "\n")
         , try (string "CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE     END DATE ERR")
         , try (string "------ -------------------- --- -------------------- -------- -------- ---") ]

line :: Parser Line
line =
  Line <$> count  6 anyChar <* space
       <*> count 20 anyChar <* space
       <*> count  3 anyChar <* space
       <*> count 20 anyChar <* space
       <*> count  8 anyChar <* space
       <*> count  8 anyChar <* space
       <*> count  3 anyChar <* newline

page :: Parser Page
page =
  Page <$> (manyTill anyChar (string "BEG PER:")    >> space >> count 10 anyChar)
       <*> parseReportDate
       <*> (manyTill anyChar (string "END PER:")    >> space >> count 10 anyChar)
       <*> (manyTill anyChar (string "BATCH:")      >> space >> count  9 anyChar)
       <*> (space >> string "CONTRIB:"              >> space >> count  9 anyChar)
       <*> (manyTill anyChar (string "QUOTE BACK:") >> space >> count 10 anyChar
       <*   skipMany1 headers)
       <*> (manyTill line (twoNewLines <|> footer))

report :: Parser Report
report = Report <$> manyTill page (try footer)

twoNewLines :: Parser ()
twoNewLines = (count 2 newline) >> return ()

footer :: Parser ()
footer = (space >> string "NUMBER OF SETS REJECTED" >> manyTill anyChar (string "*** END OF REPORT ***") >> optional eof) >> return ()

parseFile :: [(String, String)] -> String -> String
parseFile errors text =
  let rs = case parse (manyTill report eof) "" text of
      ...

The full file has 115 lines. When I cat the file and pipe it into my Haskell program, I get:

(line 116, column 1);
unexpected end of input
expecting "BEG PER:"

I had it working by ignoring the footer and anything that followed. But my full use case is to cat multiple files and pipe the result into my Haskell program, so I cannot throw away the footer and everything after it. My problems began once I tried to skip over the footer instead of discarding the rest of the input. It's probably something simple, and I'm just confused and overlooking something obvious.

Let me know if you need more code. I do a few transformations after parsing, and I didn't want to clutter the code with unnecessary detail.

Thanks!

Upvotes: 0

Views: 811

Answers (2)

Jeff Maner

Reputation: 1179

I've resolved the problem. The code is a little different now, and I'm not sure exactly which change fixed it; I spent a lot of time staring at the code and making small changes here and there. I think, though, that it had to do with cat appending a newline to the file. So I changed footer:

footer = space >> string "NUMBER OF SETS REJECTED"
       >> anyChar `manyTill` (string "*** END OF REPORT ***") >> newline >> string ""

Now footer consumes the extra newline at the end of the file and returns a String (the empty string from string ""), which gives it the same type as count 2 newline. I use footer in eop (end of page):

eop =
  choice [ count 2 newline
         , footer ]

And I use eop in the last line of page:

<*> line `manyTill` eop

report is now:

report = count 2 newline >> Report <$> many page

I also changed page. I think manyTill anyChar was consuming input in unexpected ways. So now I match and throw away the first line of each page explicitly:

page = firstLine >>
  Page <$> (string "BEG PER:" >> space >> count 10 anyChar)
       ...

firstLine =
  string "XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :"
  >> spaces >> many digit >> newline

I think that covers all the important changes I made that eventually made the parse successful. It now parses a single file from the cat command, as well as multiple files concatenated by the cat command. Yay! I love Haskell.
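A trailing newline can also be tolerated without hard-coding exactly one newline in footer. The following is only a minimal sketch on a toy format, not the real report grammar: body, footer, report and reports are hypothetical stand-ins, a report is just digit-only lines followed by the marker END, and spaces after the marker absorbs zero or more trailing newlines. One caveat under these assumptions: spaces would also swallow any leading blank lines of the next concatenated file, which would clash with a report that starts with count 2 newline as above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy report body (hypothetical): one or more digit-only lines.
body :: Parser [String]
body = many1 (many1 digit <* newline)

-- Toy footer marker. `spaces` then skips any trailing whitespace,
-- so the input may end with zero, one, or several newlines.
footer :: Parser ()
footer = string "END" *> spaces

-- One report is a body followed by its footer.
report :: Parser [String]
report = body <* footer

-- Several reports back to back, as `cat a.txt b.txt` would produce,
-- and nothing else before end of input.
reports :: Parser [[String]]
reports = many1 report <* eof
```

With these definitions, parse reports "" "1\n2\nEND\n3\nEND" and parse reports "" "1\n2\nEND\n\n3\nEND\n\n" should both succeed with the same two reports, regardless of how many newlines follow each footer.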

Upvotes: 1

Chris Kuklewicz

Reputation: 8153

It looks like page consumes footer:

  <*> (manyTill line (twoNewLines <|> footer))

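By the time report tries its own footer, the footer has already been consumed, so manyTill attempts another page at end of input and fails expecting "BEG PER:":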

report = Report <$> manyTill page (try footer)

Perhaps you need 'sepBy' to recognize 'twoNewLines' between your pages, with 'page' itself no longer ending in that last manyTill that consumes the terminator.
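To make the point concrete, here is a minimal sketch on a toy format rather than the real report. All names (row, pageBreak, pageBad, pageGood, reportBad, reportGood) are hypothetical stand-ins: a row is a line of digits, pages are separated by two newlines, and the footer is just the line END. The "bad" pair mirrors the shape of the original page and report; the "good" pair uses lookAhead so that the page only peeks at its terminator, and lets sepBy1 and the report consume the separators and the footer. This is one possible arrangement under those assumptions, not necessarily the best fit for the actual file.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy stand-in for `line`: digits followed by a newline.
row :: Parser String
row = many1 digit <* newline

-- Toy stand-ins for `twoNewLines` and `footer`.
-- `try` makes each of them fail without consuming input.
pageBreak, footer :: Parser ()
pageBreak = () <$ try (count 2 newline)
footer    = () <$ try (string "END\n")

-- Shape of the original: the page's manyTill consumes its terminator,
-- including the footer after the last page.
pageBad :: Parser [String]
pageBad = manyTill row (pageBreak <|> footer)

-- The report then looks for a footer that is no longer there.
reportBad :: Parser [[String]]
reportBad = manyTill pageBad footer <* eof

-- Alternative: the page only peeks at the terminator with lookAhead.
pageGood :: Parser [String]
pageGood = manyTill row (lookAhead (pageBreak <|> footer))

-- sepBy1 consumes the page breaks; the report consumes the footer.
reportGood :: Parser [[String]]
reportGood = (pageGood `sepBy1` pageBreak) <* footer <* eof
```

On the input "1\n2\n\n\n3\nEND\n", parse reportGood "" succeeds with [["1","2"],["3"]], while parse reportBad "" fails at end of input, the same kind of failure as in the question.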

Upvotes: 0
