levant pied
levant pied

Reputation: 4481

Get all string splits

Say I have a string:

"abc7de7f77ghij7"

I want to split it by a substring, 7 in this case, and get all the left-right splits:

[ ("abc", "de7f77ghij7")
, ("abc7de", "f77ghij7")
, ("abc7de7f", "7ghij7")
, ("abc7de7f7", "ghij7")
, ("abc7de7f77ghij", "")
]

Sample implementation:

{-# LANGUAGE OverloadedStrings #-}

module StrSplits where

import qualified Data.Text as T

splits :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits d s =
  let run a l r  =
        case T.breakOn d r of
          (x, "") -> reverse a
          (x, y)  ->
            let
                rn = T.drop (T.length d) y
                an = (T.append l x, rn) : a
                ln = l `T.append` x `T.append` d
            in run an ln rn
  in run [] "" s

main = do
  print $ splits "7" "abc7de7f77ghij7"
  print $ splits "8" "abc7de7f77ghij7"

with expected result:

[("abc","de7f77ghij7"),("abc7de","f77ghij7"),("abc7de7f","7ghij7"),("abc7de7f7","ghij7"),("abc7de7f77ghij","")]
[]

I'm not too happy about the manual recursion and let/case/let nesting. If my feeling that it doesn't look too good is right, is there a better way to write it?

Is there a generalized approach to solving these kinds of problems in Haskell similar to how recursion can be replaced with fmap and folds?

Upvotes: 2

Views: 93

Answers (2)

Here's an indexless one.

import Data.List (isPrefixOf, unfoldr)

type ListZipper a = ([a],[a])

moveRight :: ListZipper a -> Maybe (ListZipper a)
moveRight (_, []) = Nothing
moveRight (ls, r:rs) = Just (r:ls, rs)

-- As Data.List.iterate, but generates a finite list ended by Nothing.
unfoldr' :: (a -> Maybe a) -> a -> [a]
unfoldr' f = unfoldr (\x -> (,) x <$> f x)

-- Get all ways to split a list with nonempty suffix
-- Prefix is reversed for efficiency
-- [1,2,3] -> [([],[1,2,3]), ([1],[2,3]), ([2,1],[3])]
splits :: [a] -> [([a],[a])]
splits xs = unfoldr' moveRight ([], xs)

-- This is the function you want.
splitsOn :: (Eq a) => [a] -> [a] -> [([a],[a])]
splitsOn sub xs = [(reverse l, drop (length sub) r) | (l, r) <- splits xs, sub `isPrefixOf` r]

Try it online!

Basically, traverse a list zipper to come up with a list of candidates for the split. Keep only those that are indeed splits on the desired item, then (un)reverse the prefix portion of each passing candidate.

Upvotes: 1

cole
cole

Reputation: 725

How about this?

import Data.Bifunctor (bimap)

splits' :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits' delimiter string = mkSplit <$> [1..numSplits]
  where
    sections  = T.splitOn delimiter string
    numSplits = length sections - 1
    mkSplit n = bimap (T.intercalate delimiter) (T.intercalate delimiter) $ splitAt n sections

I like to believe there's a way that doesn't involve indices, but you get the general idea. First split the string by the delimiter. Then split that list of strings at in two everywhere possible, rejoining each side with the delimiter.

Not the most efficient, though. You can probably do something similar with indices from Data.Text.Internal.Search if you want it to be fast. In this case, you wouldn't need to do the additional rejoining. I didn't experiment with it since I didn't understand what the function was returning.

Upvotes: 1

Related Questions