Code-Apprentice

Reputation: 83517

String parsing in Haskell

I am very new to Haskell and am currently trying to solve a problem that requires some string parsing. My input String contains a comma-delimited list of words in quotes. I want to parse this single string into a list of the words as Strings. Where should I start learning about parsing such a String? Is there a particular module and/or set of functions that will be helpful?

p.s. Please don't post a full solution. I am just asking for a pointer to a starting place so I can learn how to do it.

Upvotes: 3

Views: 23471

Answers (6)

Code-Apprentice

Reputation: 83517

I finally decided to roll my own parsing functions since this is such a simple situation. I have learned a lot about Haskell since I first posted this question and want to document my solution here:

import Data.List (sort)
import System.IO

-- Split a string on a delimiter character, dropping the delimiters.
split :: Char -> String -> [String]
split _ "" = []
split c s = firstWord : split c rest
    where firstWord = takeWhile (/=c) s
          rest = drop (length firstWord + 1) s

-- Remove every occurrence of a character from a string.
removeChar :: Char -> String -> String
removeChar _ [] = []
removeChar ch (c:cs)
    | c == ch   = removeChar ch cs
    | otherwise = c : removeChar ch cs

main :: IO ()
main = do
    handle <- openFile "input/names.txt" ReadMode
    contents <- hGetContents handle
    let names = sort (map (removeChar '"') (split ',' contents))
    print names
    hClose handle

Upvotes: 9

Gabriella Gonzalez

Reputation: 35089

The most powerful solution is a parser combinator library. Haskell has several of these; the ones that come to mind first are:

  • parsec: a very good general-purpose parsing library
  • attoparsec: a faster version of parsec, which sacrifices the quality of error messages and some other features for extra speed
  • uu-parsinglib: a very powerful parsing library

The big advantage of parser combinators is that it is very easy to define parsers using do notation (or Applicative style, if you prefer).
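To give a feel for the style, here is a minimal sketch of a do-notation parser for comma-separated quoted words. It uses Text.ParserCombinators.ReadP, a small combinator library that ships with base, rather than any of the libraries listed above, so that it runs without extra packages; parsec code has a similar shape. The names quotedWord, quotedWords and parseQuotedWords are made up for this illustration.

```haskell
import Text.ParserCombinators.ReadP

-- One double-quoted word: a quote, any run of non-quote characters, a quote.
-- (Assumption: no escaped quotes inside a word.)
quotedWord :: ReadP String
quotedWord = do
  _ <- char '"'
  w <- munch (/= '"')
  _ <- char '"'
  return w

-- Comma-separated quoted words, allowing whitespace around each word,
-- and requiring that the whole input is consumed.
quotedWords :: ReadP [String]
quotedWords = sepBy (skipSpaces *> quotedWord <* skipSpaces) (char ',') <* eof

-- Run the parser; succeed only if there is exactly one complete parse.
parseQuotedWords :: String -> Maybe [String]
parseQuotedWords s = case readP_to_S quotedWords s of
  [(ws, "")] -> Just ws
  _          -> Nothing
```

Because a word is defined as everything between two quotes, a comma inside the quotes stays part of the word, which a plain split on ',' would not handle.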

If you just want some quick and simple string manipulation capabilities, then consult the text library (for high-performance packed Unicode text) or Data.List (for ordinary list-encoded strings); both provide the functions you need to manipulate strings.

Upvotes: 7

Ben Millwood

Reputation: 6991

Here's a particularly cheeky way to proceed:

parseCommaSepQuotedWords :: String -> [String]
parseCommaSepQuotedWords s = read ("[" ++ s ++ "]")

This might work but it's very fragile and rather silly. Essentially you are using the fact that the Haskell way of writing lists of strings almost coincides with your way, and hence the built-in Read instance is almost the thing you want. You could use reads for better error-reporting but in reality you probably want to do something else entirely.
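As a rough sketch of the reads variant mentioned above: reads returns a list of (value, remaining input) pairs instead of throwing on bad input, so a failed parse can be turned into Nothing. The function name parseCommaSepQuotedWordsSafe is made up for this example, and the same fragility caveats apply.

```haskell
import Data.Char (isSpace)

-- Like the read-based version, but returns Nothing instead of crashing
-- when the input is not a valid Haskell list of string literals.
parseCommaSepQuotedWordsSafe :: String -> Maybe [String]
parseCommaSepQuotedWordsSafe s =
  case reads ("[" ++ s ++ "]") of
    -- Accept a single parse whose leftover input is only whitespace.
    [(ws, rest)] | all isSpace rest -> Just ws
    _                               -> Nothing
```

Text.Read also exports readMaybe, which wraps up the same pattern if you do not need to inspect the leftover input yourself.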

In general, parsec is really worth taking a look at - it's a joy to use, and one of the things that originally really got me excited about Haskell. But if you want a homegrown solution, I often write simple things using case statements on the result of span and break. Suppose you are looking for the next semicolon in the input. Then break (== ';') inp will return (before, after), where:

  • before is the content of inp up to (and not including) the first semicolon (or all of it if there is none)
  • after is the rest of the string:
    • if after is not empty, the first element is a semicolon
    • regardless of what else happens, before ++ after == inp

So to parse a list of statements separated by semicolons, I might do this:

parseStmts :: String -> Maybe [Stmt]
parseStmts inp = case break (== ';') inp of
  (before, _ : after) -> -- ...
    -- ^ before is the first statement
    --     ^ ignore the semicolon
    --           ^ after is the rest of the string
  (_, []) -> -- inp doesn't contain any semicolons
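One way the skeleton above could be filled in, purely as a sketch: Stmt and parseStmt here are placeholders invented for the example (a statement is just its raw text, and blank statements are rejected), and the input is assumed to be statements separated, not terminated, by semicolons.

```haskell
import Data.Char (isSpace)

-- Hypothetical statement type: wraps the raw text of one statement.
newtype Stmt = Stmt String deriving (Eq, Show)

-- Hypothetical single-statement parser: rejects blank statements.
parseStmt :: String -> Maybe Stmt
parseStmt s
  | all isSpace s = Nothing
  | otherwise     = Just (Stmt s)

parseStmts :: String -> Maybe [Stmt]
parseStmts inp = case break (== ';') inp of
  -- A semicolon was found: parse what came before it, then recurse on the rest.
  (before, _ : after) -> (:) <$> parseStmt before <*> parseStmts after
  -- No semicolon left: the remaining input is the final statement.
  (lastStmt, [])      -> (: []) <$> parseStmt lastStmt
```

With these choices a trailing semicolon or an empty statement makes the whole parse fail, since parseStmt rejects the resulting empty string.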

Upvotes: 3

Jonke

Reputation: 6533

Use parsec for anything that is 'real work'.

For an introduction, read https://therning.org/magnus/archives/tag/parsec

Upvotes: 1

John F. Miller

Reputation: 27217

In the interest of having a complete answer for those who happen upon this question, Data.Text has some good functions as well.
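For instance, a minimal sketch using split, strip and dropAround from Data.Text (the function name quotedWordsText is made up here, and it assumes no commas appear inside the quotes):

```haskell
import qualified Data.Text as T

-- Split on commas, trim surrounding whitespace from each field,
-- then drop the quote characters from both ends of each field.
quotedWordsText :: T.Text -> [T.Text]
quotedWordsText = map (T.dropAround (== '"') . T.strip) . T.split (== ',')
```

Note that T.split returns a single empty field for empty input, so an empty Text gives a one-element list rather than an empty one.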

Upvotes: 1

Adam Wagner

Reputation: 16087

Since Strings are simply lists of Chars in Haskell, Data.List would be a good place to start looking (in the interest of learning Haskell).
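As a small taste of what is in there, here is a sketch that trims whitespace and strips the surrounding quotes from a single word using dropWhileEnd and stripPrefix from Data.List. The helper names trim and unquote are invented for this example; splitting the input into words is left as the exercise.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripPrefix)

-- Remove leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Strip one pair of surrounding double quotes, or fail if they are missing.
unquote :: String -> Maybe String
unquote s = do
  body <- stripPrefix "\"" (trim s)   -- must start with a quote
  case reverse body of
    '"' : revInner -> Just (reverse revInner)  -- must end with a quote
    _              -> Nothing
```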

For more complex cases (where commas may be nested inside quotes and should be ignored, for example), parsec (as Daniel mentioned) would be a better solution.

Also, if you're looking to parse CSVs you may try Text.CSV, though I've not tried it, so I can't say how helpful it'll be.

Upvotes: 6
