Jon Cox

Reputation: 10922

Haskell: Delimit a string by chosen sub-strings and whitespace

I am still new to Haskell, so I apologize if there is an obvious answer to this...


I would like to make a function that splits up all of the following lists of strings, i.e. [String]:

["int x = 1", "y := x + 123"]
["int   x=   1", "y:=   x+123"] 
["int x=1", "y:=x+123"] 


all into the same list of lists of strings, i.e. [[String]]:

[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]



You can use map words for the first [String] (or map words . lines if the input is a single newline-separated String).

But I do not know of any really neat way to also handle the others, where you would use the various sub-strings "=", ":=", "+" etc. to break up the main string.



Thank you for taking the time to enlighten me on Haskell :-)

Upvotes: 2

Views: 267

Answers (3)

luqui

Reputation: 60463

The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.

lex :: String -> [(String,String)]

What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:

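lexStr :: String -> [String]
lexStr "" = []
lexStr s =
    case lex s of
        [("", _)]    -> []                  -- only whitespace was left
        [(tok,rest)] -> tok : lexStr rest   -- one token lexed; carry on with the rest
        _            -> error "Failed lex"  -- no parse (or an ambiguous one)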

To appease the pedants: this code is in terrible form. It makes an explicit call to error instead of returning a reasonable error using Maybe, it assumes lex only returns 1 or 0 elements, etc. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.
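
For readers who want to see it anyway, a Maybe-returning variant might look roughly like the sketch below. The name lexAll is made up for illustration, and the sketch treats anything other than exactly one result from lex as a failure. That is an assumption about how you want to handle ambiguity, not something lex itself prescribes.

```haskell
-- Hypothetical helper (not from the answer above): lex a whole string,
-- returning Nothing instead of calling error when lexing fails.
lexAll :: String -> Maybe [String]
lexAll s =
    case lex s of
        [("", _)]     -> Just []                 -- end of input (possibly after whitespace)
        [(tok, rest)] -> (tok :) <$> lexAll rest -- one token lexed; recurse on the rest
        _             -> Nothing                 -- no parse, or more than one parse
```

To handle the [String] from the question, traverse lexAll ["int x=1", "y:=x+123"] gives Just [["int","x","=","1"],["y",":=","x","+","123"]], and the whole result becomes Nothing if any single line fails to lex.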

Upvotes: 7

Fiona Runge

Reputation: 2311

How about using words? :) words :: String -> [String], and words doesn't care how much whitespace separates the tokens:

words "Hello World"
= words "Hello     World"
= ["Hello", "World"]

Upvotes: 0

OJ.

Reputation: 29401

I would take a look at parsec and build a simple grammar for parsing your strings.
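
A minimal sketch of what such a grammar could look like, assuming the parsec package is installed. The names oneToken, allTokens and tokenize are invented for illustration, and the set of operator characters is a guess based on the examples in the question. Extend it to whatever your language needs.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A token is either a run of letters/digits or a run of operator characters.
-- The operator set below is an assumption, not a complete list.
oneToken :: Parser String
oneToken = (many1 alphaNum <|> many1 (oneOf ":=+-*/<>")) <* spaces

-- Skip leading whitespace, read tokens until nothing is left, then require end of input.
allTokens :: Parser [String]
allTokens = spaces *> many oneToken <* eof

-- Run the parser on one line; Left carries parsec's error message.
tokenize :: String -> Either ParseError [String]
tokenize = parse allTokens ""
```

With this, mapM tokenize applied to any of the three lists in the question should produce the same Right [["int","x","=","1"],["y",":=","x","+","123"]], and a line containing a character outside the grammar yields a Left with the position of the offending character.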

Upvotes: 3
