Emily
Emily

Reputation: 2684

How to replace double tabs with single tabs using pipes?

I need to replace all consecutive tabs in a bytestring with single tabs, like so:

"___\t___\t\t___\t\t\t___"

becomes

"___\t___\t___\t___"

I have no idea how to do it.

After half an hour of figuring stuff out I managed to replace the first occurrence of double tabs, like so (and even this is actually wrong – it adds a tab to an empty string):

import qualified Pipes.ByteString as PB
import qualified Data.ByteString as B

removeConsecutiveTabs =
  PB.break (== tab) . mapped %~ \p -> do
    yield (B.singleton tab)
    PB.dropWhile (== tab) p

However, I still don't know how to replace all occurrences of consecutive tabs.

Upvotes: 2

Views: 191

Answers (3)

duplode
duplode

Reputation: 34378

The key is operating on a stream of bytes rather than one of bytestring chunks. That can be done through pack from pipes-bytestring (and pipes-text). Here is a not particularly sophisticated demo:

{-# LANGUAGE OverloadedStrings #-}

import Pipes
import qualified Pipes.ByteString as PB
import qualified Data.ByteString as B
import Control.Monad
import Control.Lens (over)

test byst = runEffect $
    removingConsecutiveTabs (PB.fromLazy byst) >-> PB.stdout

removingConsecutiveTabs :: Monad m
                        => Producer B.ByteString m r
                        -> Producer B.ByteString m r
removingConsecutiveTabs = over PB.pack tabGatekeeper

tabGatekeeper :: Monad m => Producer PB.Word8 m r -> Producer PB.Word8 m r
tabGatekeeper = go False
    where
    go wasTab stream = do
        ex <- lift $ next stream
        case ex of
            Left r -> return r
            Right (x, stream') -> do
                let thisIsATab = w8IsTab x
                unless (wasTab && thisIsATab) $ yield x
                go thisIsATab stream'

    w8IsTab x = toEnum (fromIntegral x) == '\t'
GHCi> :set -XOverloadedStrings
GHCi> test "___\t___\t\t___\t\t\t___\n"
___ ___ ___ ___

Upvotes: 1

ErikR
ErikR

Reputation: 52039

Try this:

{-# LANGUAGE OverloadedStrings #-}

import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as PB
import           Data.ByteString (ByteString)
import Control.Lens hiding (each)

cleanTabs p = do
  p1 <- view (PB.span (/= 9)) p
  x <- lift $ next p1
  case x of
    Left r -> return r
    Right (a, p2) -> do
      yield "\t"
      let p3 = PB.dropWhile (== 9) (yield a >> p2)
      cleanTabs p3

source :: Monad m => Producer ByteString m ()
source = each [ "this", "is\t an", "\t\texample\t", "\t.", "\t\tmiddle\t", "\there"]

example = do
  putStrLn $ "input: " ++ (show $ P.toList source)
  putStrLn $ "output:" ++ (show $ P.toList (cleanTabs source))

Upvotes: 2

ErikR
ErikR

Reputation: 52039

Here is a solution which doesn't use PB.break but just uses basic pipe operations. One of the problem is that the data comes in chunks, and you have to keep track of whether or not the last chunk ended in a tab or not:

{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE OverloadedStrings #-}

module Lib5
where

import Pipes
import Pipes.ByteString
import qualified Data.ByteString.Char8 as BS
import Control.Monad

printAll = forever $ do a <- await; lift $ putStrLn $ "got: " ++ show a

endsWith bs ch = BS.length bs > 0 && BS.last bs == ch

convertDoubleTabs = await >>= go0

-- no precediing tab
go0 b = do
  let (pre,post) = BS.breakSubstring "\t\t" b
  yield pre
  if BS.length post == 0
    then if endsWith pre '\t'
            then await >>= go1
            else await >>= go0
    else do yield "\t"
            go0 (BS.drop 2 post)

-- last chunk ended in a tab
go1 b = do
  if BS.length b == 0
    then await >>= go1
    else if BS.index b 0 == '\t'
            then go0 (BS.drop 1 b)
            else go0 b

example1 = runEffect $ each [ "this", "is\t an", "\t\texample\t", "\t."] 
                       >-> convertDoubleTabs
                       >-> printAll

I'll add to this answer if I figure out a solution using the Pipes.ByteString and lenses.

To convert all consecutive tabs:

convertTabs = await >>= go0
  where
    go0 b = do
      let (pre,post) = BS.break (== '\t') b
      yield pre
      if BS.length post == 0
        then await >>= go0
        else do yield "\t"
                go1 post

    go1 b = do
      let b' = BS.dropWhile (== '\t') b
      if BS.null b'
        then await >>= go1
        else go0 b'

example2 = runEffect $ each [ "___\t___\t\t___\t\t\t___" ]
                       >-> convertTabs
                       >-> printAll

Upvotes: 0

Related Questions