Reputation: 2684
I need to replace all consecutive tabs in a bytestring with single tabs, like so:
"___\t___\t\t___\t\t\t___"
becomes
"___\t___\t___\t___"
I have no idea how to do it.
After half an hour of figuring stuff out I managed to replace the first occurrence of double tabs, like so (and even this is actually wrong – it adds a tab to an empty string):
import qualified Pipes.ByteString as PB
import qualified Data.ByteString as B
removeConsecutiveTabs =
PB.break (== tab) . mapped %~ \p -> do
yield (B.singleton tab)
PB.dropWhile (== tab) p
However, I still don't know how to replace all occurrences of consecutive tabs.
Upvotes: 2
Views: 191
Reputation: 34378
The key is operating on a stream of bytes rather than one of bytestring chunks. That can be done through pack
from pipes-bytestring (and pipes-text). Here is a not particularly sophisticated demo:
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import qualified Pipes.ByteString as PB
import qualified Data.ByteString as B
import Control.Monad
import Control.Lens (over)
test byst = runEffect $
removingConsecutiveTabs (PB.fromLazy byst) >-> PB.stdout
removingConsecutiveTabs :: Monad m
=> Producer B.ByteString m r
-> Producer B.ByteString m r
removingConsecutiveTabs = over PB.pack tabGatekeeper
tabGatekeeper :: Monad m => Producer PB.Word8 m r -> Producer PB.Word8 m r
tabGatekeeper = go False
where
go wasTab stream = do
ex <- lift $ next stream
case ex of
Left r -> return r
Right (x, stream') -> do
let thisIsATab = w8IsTab x
unless (wasTab && thisIsATab) $ yield x
go thisIsATab stream'
w8IsTab x = toEnum (fromIntegral x) == '\t'
GHCi> :set -XOverloadedStrings
GHCi> test "___\t___\t\t___\t\t\t___\n"
___ ___ ___ ___
Upvotes: 1
Reputation: 52039
Try this:
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as PB
import Data.ByteString (ByteString)
import Control.Lens hiding (each)
cleanTabs p = do
p1 <- view (PB.span (/= 9)) p
x <- lift $ next p1
case x of
Left r -> return r
Right (a, p2) -> do
yield "\t"
let p3 = PB.dropWhile (== 9) (yield a >> p2)
cleanTabs p3
source :: Monad m => Producer ByteString m ()
source = each [ "this", "is\t an", "\t\texample\t", "\t.", "\t\tmiddle\t", "\there"]
example = do
putStrLn $ "input: " ++ (show $ P.toList source)
putStrLn $ "output:" ++ (show $ P.toList (cleanTabs source))
Upvotes: 2
Reputation: 52039
Here is a solution which doesn't use PB.break
but just uses basic pipe operations. One of the problem is that the data comes in chunks, and you have to keep track of whether or not the last chunk ended in a tab or not:
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE OverloadedStrings #-}
module Lib5
where
import Pipes
import Pipes.ByteString
import qualified Data.ByteString.Char8 as BS
import Control.Monad
printAll = forever $ do a <- await; lift $ putStrLn $ "got: " ++ show a
endsWith bs ch = BS.length bs > 0 && BS.last bs == ch
convertDoubleTabs = await >>= go0
-- no precediing tab
go0 b = do
let (pre,post) = BS.breakSubstring "\t\t" b
yield pre
if BS.length post == 0
then if endsWith pre '\t'
then await >>= go1
else await >>= go0
else do yield "\t"
go0 (BS.drop 2 post)
-- last chunk ended in a tab
go1 b = do
if BS.length b == 0
then await >>= go1
else if BS.index b 0 == '\t'
then go0 (BS.drop 1 b)
else go0 b
example1 = runEffect $ each [ "this", "is\t an", "\t\texample\t", "\t."]
>-> convertDoubleTabs
>-> printAll
I'll add to this answer if I figure out a solution using the Pipes.ByteString
and lenses.
To convert all consecutive tabs:
convertTabs = await >>= go0
where
go0 b = do
let (pre,post) = BS.break (== '\t') b
yield pre
if BS.length post == 0
then await >>= go0
else do yield "\t"
go1 post
go1 b = do
let b' = BS.dropWhile (== '\t') b
if BS.null b'
then await >>= go1
else go0 b'
example2 = runEffect $ each [ "___\t___\t\t___\t\t\t___" ]
>-> convertTabs
>-> printAll
Upvotes: 0