Alexei
Alexei

Reputation: 347

How would you count occurences of a string within a string (Haskell)

Is there some Library in Haskell that has similar function like this?

> function "bal bla hu bla" ["bla","bal"]
[(2,"bla"),(1,"bal")]

Upvotes: 2

Views: 2021

Answers (5)

Using Data.List.Split:

occs ∷ Eq a ⇒ [a] → [[a]] → [(Int, [a])]
occs str = map (count str &&& id)
  where
    count s x = length (splitOn x s) - 1

And

 >occs "bal bla hu bla" ["bla","bal"]
 [(2,"bla"),(1,"bal")]

UPD:

Parsec can be useful here too.

{-# LANGUAGE NoMonomorphismRestriction,
             FlexibleContexts
  #-}
import Control.Arrow ((&&&))
import Data.Either (partitionEithers)
import Text.Parsec

occs :: String -> [String] -> [(Int, String)]
occs s = map (countP s &&& id)

countP str substr = either (const 0) occsNumber $ parse (parseMany substr) "" str
  where
    occsNumber = length . snd . partitionEithers

parseSingle :: Stream s m Char => String -> ParsecT s u m (Either Char String)
parseSingle s = fmap Right (try (string s)) <|> fmap Left anyChar

parseMany :: Stream s m Char => String -> ParsecT s u m [Either Char String]
parseMany = many . parseSingle

Result is still the same:

> occs "bal bla hu bla" ["bla","bal"]
[(2,"bla"),(1,"bal")]

Upvotes: 3

Rotsor
Rotsor

Reputation: 13783

OMG OMG! Every solution given so far has quadratic computational complexity! (even the Data.Text one has a worst-case quadratic running time)

Obviously you need to roll your own string search algorithm!

Here is my take. I think this is a variation of KMP.

data Searcher = Found | Initial Searcher | Searching Char Searcher Searcher

runSearcher :: Searcher -> Char -> Searcher
runSearcher (Searching c suc fail) s | c == s = suc
 | otherwise = runSearcher fail s
runSearcher (Initial s) _ = s

mkSearcher pattern = initial where
  initial = go (Initial initial) pattern

  go fallback [] = Found
  go fallback (c:t) = Searching c (go (runSearcher fallback c) t) fallback

search :: String -> String -> Integer
search pat = go searcher where
  searcher = mkSearcher pat
  go Found s = 1 + go searcher s
  go src (c:t) = go (runSearcher src c) t
  go src [] = 0

Still, there is much space for improvement! Searching for multiple patterns can be done more efficiently than doing that one by one if we preprocess the input string by building a prefix tree or something like that...

Upvotes: 1

Landei
Landei

Reputation: 54574

The "train wreck" solution:

import Data.List

f txt ws = map freq $ filter isElem $ group $ sort $ words txt where
    isElem (w:_) = w `elem` ws
    freq xs@(x:_) = (length xs, x)

From there, you can go "monadic"

import Data.List
import Control.Monad

f txt ws = do
    xs@(x:_) <- group $ sort $ words txt 
    guard $ x `elem` ws
    return (length xs, x)

Upvotes: 0

dave4420
dave4420

Reputation: 47052

import Control.Arrow ((&&&))
import Data.List (isPrefixOf, tails)

yourFunction :: Eq a => [a] -> [[a]] -> [(Int, [a])]
yourFunction haystack = map (count &&& id)
  where count needle = length . filter (needle `isPrefixOf`) . tails $ haystack

(Untested.)

Upvotes: 2

huon
huon

Reputation: 102066

Data.Text provides a count function, however it works on Text rather than String so we have to use pack.

import Data.Text (pack, count)

function haystack needles = map go needles
     where 
        packed = pack haystack
        go needle = (count (pack needle) packed, needle)

Upvotes: 4

Related Questions