Reputation: 83517
I am very new to Haskell and am currently trying to solve a problem that requires some string parsing. My input String contains a comma-delimited list of words in quotes. I want to parse this single string into a list of the words as Strings. Where should I start learning about parsing such a String? Is there a particular module and/or set of functions that will be helpful?
p.s. Please don't post a full solution. I am just asking for a pointer to a starting place so I can learn how to do it.
Upvotes: 3
Views: 23471
Reputation: 83517
I finally decided to roll my own parsing functions since this is such a simple situation. I have learned a lot about Haskell since I first posted this question and want to document my solution here:
import Data.List (sort)
import System.IO

-- Split a string on a delimiter character.
split :: Char -> String -> [String]
split _ "" = []
split c s = firstWord : split c rest
  where firstWord = takeWhile (/= c) s
        rest = drop (length firstWord + 1) s

-- Remove every occurrence of a character from a string.
removeChar :: Char -> String -> String
removeChar _ [] = []
removeChar ch (c:cs)
  | c == ch = removeChar ch cs
  | otherwise = c : removeChar ch cs

main :: IO ()
main = do
  handle <- openFile "input/names.txt" ReadMode
  contents <- hGetContents handle
  let names = sort (map (removeChar '"') (split ',' contents))
  print names
  hClose handle
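For anyone reading along, here is a slightly shorter variant as a sketch. It assumes the same input shape as above; `removeChar'` and `sortedNames` are names made up for this example. `removeChar` does the same job as `filter` from the Prelude, and keeping the parsing in a pure function makes it easy to try in GHCi without touching a file.

```haskell
import Data.List (sort)

-- split is copied from the answer above.
split :: Char -> String -> [String]
split _ "" = []
split c s = firstWord : split c rest
  where firstWord = takeWhile (/= c) s
        rest = drop (length firstWord + 1) s

-- Same result as removeChar: keep every character that is not ch.
removeChar' :: Char -> String -> String
removeChar' ch = filter (/= ch)

-- Pure parsing step: split on commas, strip the quotes, sort.
sortedNames :: String -> [String]
sortedNames = sort . map (removeChar' '"') . split ','
```

In GHCi, `sortedNames "\"MARY\",\"ANNA\""` evaluates to `["ANNA","MARY"]`. A `main` could then be as small as `readFile "input/names.txt" >>= print . sortedNames`.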
Upvotes: 9
Reputation: 35089
The most powerful solution is a parser combinator. Haskell has several of these, but the foremost that come to my mind are:
The big advantage of parser combinators is that it is very easy to define parsers using do notation (or Applicative style, if you prefer).
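To make the two styles concrete, here is a minimal sketch. It uses `Text.ParserCombinators.ReadP`, a small combinator library that ships with `base`, as a stand-in so the example runs without installing anything; the dedicated libraries have a similar shape but different names and better error messages. The parser names (`quotedWord`, `quotedWords`, `parseQuotedWords`) are made up for this example, and quoted words are assumed to contain no escaped quotes.

```haskell
import Text.ParserCombinators.ReadP

-- One word wrapped in double quotes, written with do notation.
quotedWord :: ReadP String
quotedWord = do
  _ <- char '"'
  w <- munch (/= '"')   -- everything up to the closing quote
  _ <- char '"'
  return w

-- The same parser in Applicative style.
quotedWord' :: ReadP String
quotedWord' = char '"' *> munch (/= '"') <* char '"'

-- Comma-separated quoted words that must cover the whole input
-- (trailing whitespace such as a final newline is allowed).
quotedWords :: ReadP [String]
quotedWords = sepBy quotedWord (char ',') <* skipSpaces <* eof

-- Run the parser; Nothing means the input did not match.
parseQuotedWords :: String -> Maybe [String]
parseQuotedWords s = case readP_to_S quotedWords s of
  [(ws, "")] -> Just ws
  _          -> Nothing
```

With these definitions, `parseQuotedWords "\"MARY\",\"PATRICIA\""` gives `Just ["MARY","PATRICIA"]`, and input without quotes gives `Nothing`.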
If you just want some quick and simple string manipulation capabilities, then consult the text library (for high-performance byte-encoded strings), or Data.List (for ordinary list-encoded strings), which provide the necessary functions to manipulate strings.
Upvotes: 7
Reputation: 6991
Here's a particularly cheeky way to proceed:
parseCommaSepQuotedWords :: String -> [String]
parseCommaSepQuotedWords s = read ("[" ++ s ++ "]")
This might work but it's very fragile and rather silly. Essentially you are using the fact that the Haskell way of writing lists of strings almost coincides with your way, and hence the built-in Read instance is almost the thing you want. You could use reads for better error-reporting but in reality you probably want to do something else entirely.
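As a rough sketch of the `reads` variant: instead of crashing on bad input the way `read` does, it can return a `Maybe`. The function name `parseCommaSepQuotedWordsSafe` is made up for this example, and the same fragility applies, since the input still has to look like Haskell string literals.

```haskell
import Data.Char (isSpace)

-- Like the read-based version, but returns Nothing on malformed
-- input instead of throwing an exception.
parseCommaSepQuotedWordsSafe :: String -> Maybe [String]
parseCommaSepQuotedWordsSafe s =
  case reads ("[" ++ s ++ "]") of
    -- Accept exactly one parse with only whitespace left over.
    [(ws, rest)] | all isSpace rest -> Just ws
    _                               -> Nothing
```

For example, `parseCommaSepQuotedWordsSafe "\"A\",\"B\""` is `Just ["A","B"]`, while `parseCommaSepQuotedWordsSafe "A,B"` is `Nothing`.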
In general, parsec is really worth taking a look at - it's a joy to use, and one of the things that originally really got me excited about Haskell. But if you want a homegrown solution, I often write simple things using case statements on the result of span and break. Suppose you are looking for the next semicolon in the input. Then break (== ';') inp will return (before, after), where:
- before is the content of inp up to (and not including) the first semicolon (or all of it if there is none)
- after is the rest of the string:
  - if after is not empty, the first element is a semicolon
  - before ++ after == inp
So to parse a list of statements separated by semicolons, I might do this:
parseStmts :: String -> Maybe [Stmt]
parseStmts inp = case break (== ';') inp of
    (before, _ : after) -> -- ...
    -- ^ before is the first statement
    --       ^ ignore the semicolon
    --           ^ after is the rest of the string
    (_, []) -> -- inp doesn't contain any semicolons
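One way the elided branches might be filled in is sketched below. This is only an illustration under assumptions that are not part of the sketch above: `Stmt` is a hypothetical wrapper around the raw statement text, and every statement must be non-empty and terminated by a semicolon.

```haskell
-- Hypothetical statement type: just the raw text of one statement.
newtype Stmt = Stmt String deriving (Eq, Show)

-- Assumed rule for this sketch: each statement is non-empty and
-- ends with ';'. Anything else makes the whole parse fail.
parseStmts :: String -> Maybe [Stmt]
parseStmts "" = Just []
parseStmts inp = case break (== ';') inp of
  (before, _ : after)
    | null before -> Nothing                               -- empty statement
    | otherwise   -> (Stmt before :) <$> parseStmts after  -- parse the rest
  (_, []) -> Nothing                                       -- missing ';'
```

Here `break (== ';') "x;y"` is `("x", ";y")`, so `parseStmts "a;b;"` gives `Just [Stmt "a",Stmt "b"]`, while `parseStmts "a;b"` gives `Nothing` because the last statement has no terminator.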
Upvotes: 3
Reputation: 6533
Use parsec for anything that is 'real work'.
For an introduction, read https://therning.org/magnus/archives/tag/parsec
Upvotes: 1
Reputation: 27217
In the interest of having a complete answer for those who happen upon this question, Data.Text has some good functions as well.
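As a small sketch of what that could look like, the function below uses `splitOn`, `strip` and `dropAround` from Data.Text. The name `quotedWordsText` is made up for this example, and it assumes no commas or quotes appear inside the words themselves.

```haskell
import qualified Data.Text as T

-- Split on commas, trim surrounding whitespace, then drop the
-- quote characters from both ends of each piece.
quotedWordsText :: T.Text -> [T.Text]
quotedWordsText =
  map (T.dropAround (== '"') . T.strip) . T.splitOn (T.pack ",")
```

For example, `quotedWordsText (T.pack "\"MARY\",\"ANNA\"\n")` yields the two texts `MARY` and `ANNA`. Note that empty input yields a list containing one empty text, because `splitOn` always returns at least one piece.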
Upvotes: 1
Reputation: 16087
Since Strings are simply lists of Chars in Haskell, Data.List would be a good place to start looking (in the interest of learning Haskell).
For more complex cases (where commas may be nested inside quotes and should be ignored, for example), parsec (as Daniel mentioned) would be a better solution.
Also, if you're looking to parse CSVs you may try Text.CSV, though I've not tried it, so I can't say how helpful it'll be.
Upvotes: 6