
Reputation: 145

How to properly remove banned words?

I have a string from which I want to remove all words beginning with the symbol @, but I do not fully understand how to do it expressively. Clearly, you could write something like this:

Split the string into words
Use the list filter to weed out unnecessary words

But I don't understand how to split the string: besides the space, there are separator characters such as \t and \n. Moreover, splitting discards them, so I cannot restore the original text.
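To illustrate the concern, here is a minimal sketch of that naive approach. The names naiveCensor and banned are made up for this illustration; words and unwords come from the Prelude. Since words splits on any whitespace and unwords joins with single spaces, the \n is not preserved.

```haskell
-- Hypothetical helper: a word is banned if it starts with '@'.
banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

-- Naive approach: split into words, filter, join again.
-- All original whitespace is replaced by single spaces.
naiveCensor :: String -> String
naiveCensor = unwords . filter (not . banned) . words

-- naiveCensor "haha lala\n@delete_me all-ok" == "haha lala all-ok"
```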

An example of what I want to get:

original string:

haha lala\n@delete_me all-ok

expected result:

haha lala\nall-ok

Upvotes: 0

Answers (2)

Jon Purdy

Reputation: 55059

Another way to look at the problem is that we want to delete strings of non-spaces that begin with an at sign @, as well as any following spaces. We don’t want to treat line breaks or other characters specially at all. That can be expressed with a simple recursive function using span / break and dropWhile:

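import Data.Char (isSpace)

-- Copy the input, dropping every banned word and the whitespace after it.
censor :: String -> String
censor "" = ""
censor text0 = spaces ++ nonspaces ++ censor rest
  where
    -- Leading whitespace is always kept.
    (spaces, text1) = span isSpace text0
    -- The next word is everything up to the following whitespace.
    (word, text2) = break isSpace text1
    (nonspaces, rest)
      | banned word = ("", trim text2)
      | otherwise = (word, text2)

-- A word is banned if it starts with an at sign.
banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

-- Drop leading whitespace.
trim :: String -> String
trim = dropWhile isSpace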

Consider an example:

censor " send @beans money to [email protected]"
span returns " " and "send @beans…"
break returns "send" and " @beans…"
banned returns false for "send", so we will keep it
We recursively call censor " @beans money…"
span returns " " and "@beans money…"
break returns "@beans" and " money…"
Now banned returns true for "@beans", so we drop it and trim the rest
We recursively call censor "money…"
We keep all the remaining substrings, including [email protected], since it is not banned
Finally, we reach the end of the string and censor "" returns ""

The end result is this expression:

" " ++ "send" ++ " " ++ "" ++ "money" ++ " " ++ "to" ++ " " ++ "[email protected]" ++ ""

Notice that we use a series of updates to the input string resulting in a series of variables text0, text1, text2, rest for the intermediate states. Consider how you could express this pattern using State instead.
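One possible way to approach that exercise, offered as a rough sketch rather than a definitive answer: keep the remaining input in a State String, so each step consumes a prefix and the intermediate variables disappear. This assumes the mtl package for Control.Monad.State; the names takeWhileS and censorS are invented for this illustration.

```haskell
import Control.Monad.State (State, evalState, state)
import Data.Char (isSpace)

-- Consume the longest prefix whose characters satisfy the predicate,
-- return it, and leave the remainder as the new state.
takeWhileS :: (Char -> Bool) -> State String String
takeWhileS p = state (span p)

-- Same logic as the recursive version, with the input threaded implicitly.
censorS :: State String String
censorS = do
  spaces <- takeWhileS isSpace
  word <- takeWhileS (not . isSpace)
  if null spaces && null word
    then pure "" -- nothing was consumed, so the input is exhausted
    else do
      kept <-
        if banned word
          then "" <$ takeWhileS isSpace -- drop the word and trailing spaces
          else pure word
      rest <- censorS
      pure (spaces ++ kept ++ rest)

banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

censor :: String -> String
censor = evalState censorS
```

With the input from the question, censor "haha lala\n@delete_me all-ok" evaluates to "haha lala\nall-ok", so the line break survives.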

Upvotes: 1

snak

Reputation: 6703

You might want to use Data.List.Split.split with Data.List.Split.oneOf.

It returns the words together with the separators, so you can rebuild the original text from them.

split (oneOf "xyz") "aazbxyzcxd" == ["aa","z","b","x","","y","","z","c","x","d"]
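As a sketch of how this could be applied to the question (assuming the split package is installed; censorSplit, separators, and the helpers below are hypothetical names, not part of the library): split the text on space, tab, and newline, drop each chunk that starts with @ along with the separators right after it, then concatenate what is left.

```haskell
import Data.List.Split (oneOf, split)

-- Assumption: only these three characters count as separators.
separators :: String
separators = " \t\n"

banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

-- split keeps each delimiter as its own one-character chunk.
isSeparatorChunk :: String -> Bool
isSeparatorChunk [c] = c `elem` separators
isSeparatorChunk _ = False

censorSplit :: String -> String
censorSplit = concat . go . split (oneOf separators)
  where
    go [] = []
    go (chunk : rest)
      | banned chunk = go (dropWhile skippable rest)
      | otherwise = chunk : go rest
    -- After a banned word, skip separators and the empty chunks
    -- that split emits between adjacent separators.
    skippable chunk = null chunk || isSeparatorChunk chunk

-- censorSplit "haha lala\n@delete_me all-ok" == "haha lala\nall-ok"
```

Unlike a version based on isSpace, this only recognizes the characters listed in separators, so other whitespace such as \r would be treated as part of a word.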

Upvotes: 1
