Reputation: 685
My current implementation with Text.ParserCombinators.ReadP
:
trim :: ReadP String
trim = do skipSpaces
content <- some get
skipSpaces
eof
return content
The problem is, ambiguous results frequently pop up when run:
ghci> readP_to_S trim " hello world "
[("hello world",""),("hello world ","")]
I believe I need to specify that the trimmed content will not end with space characters, but I have no idea how to express that apart from this extremely inefficient design:
trim :: ReadP String
trim = do skipSpaces
content <- many get
lastOne <- satisfy (not . isSpace)
skipSpaces
eof
return (content ++ [lastOne])
We can have better things, right?
Upvotes: 1
Views: 126
Reputation: 153172
If the inefficiency you are worried about stems from many get
frequently backtracking, you could require each sequence of whitespace to end with a non-whitespace. You can also use the munch
family to avoid the backtracking introduced by many (satisfy ...)
.
trim :: ReadP String
trim = do skipSpaces
content <- some $ liftA2 (++)
(munch isSpace)
(munch1 (not . isSpace))
skipSpaces
eof
return (concat content)
Upvotes: 0
Reputation: 51119
The following parser ought to work:
trim :: ReadP String
trim = skipSpaces *> content
where
content = do
spc <- spaces
((\w rest -> spc ++ w ++ rest) <$> word <*> content)
<++ pure ""
word = munch1 (not .isSpace)
spaces = munch isSpace
The idea is that the content
parser reads (possibly empty) leading whitespace and then attempts to read a non-empty, non-whitespace word
. If such a word is found, the leading whitespace and word are prefixed to additional content
. If no word is found, the "leading whitespace" must be trailing whitespace at the end of the input string and is discarded (i.e., pure ""
).
By itself, content
will right-trim spaces off the input. By invoking it after skipSpaces
in the definition of trim
, we ensure that the result will be left-trimmed as well.
This parser should always return a single alternative [("trimmed string","")]
, as per the following quick check:
import Test.QuickCheck
import Data.List
import Data.Char
import Text.ParserCombinators.ReadP
trim :: ReadP String
trim = skipSpaces *> content
where
content = do
spc <- spaces
((\w rest -> spc ++ w ++ rest) <$> word <*> content)
<++ pure ""
word = munch1 (not .isSpace)
spaces = munch isSpace
test_trim :: String -> Bool
test_trim str = readP_to_S trim str == [(dropWhile isSpace (dropWhileEnd isSpace str), "")]
main = quickCheck test_trim
Upvotes: 0
Reputation: 2818
You could parse a word at a time, then put them back together.
(This does compress any multiple spaces into single ones, if that matters.)
trim :: ReadP String
trim = skipSpaces *> content <* eof
where
content = unwords <$> many token
token = munch1 (not . isSpace) <* skipSpaces
But a more direct way is to say that the parse is terminated by a
component which is zero or more spaces then eof. Then we can use manyTill
, which will attempt to match internal spaces as the terminator, but will fail because those spaces aren't followed by eof.
trim :: ReadP String
trim = skipSpaces *> manyTill get (skipSpaces <* eof)
(This is potentially slow if you have long runs of internal spaces, since after each one the parser will go through the rest to see if an eof is at the end.)
Upvotes: 0
Reputation: 477533
Instead of trying to do this through parser combinatorics, why not just parse the content and then trim, like:
import Data.Char(isSpace)
import Data.List(dropWhileEnd)
trim :: ReadP String
trim = do skipSpaces
content <- many get
skipSpaces
eof
return (trimData content)
trimStr :: String -> String
trimStr = dropWhileEnd isSpace . dropWhile isSpace
The parser can be shortened then to:
trim :: ReadP String
trim = trimStr <$> (many get <* eof)
We thus first retrieve all characters, and then we return the trimmed string, although the eof
might not be necessary here.
Upvotes: 0