KettuJKL

Reputation: 272

Replace multiple bad characters in Clojure

I am trying to remove bad characters from an input string. Characters should be valid UTF-8 characters (tabs, line breaks etc. are OK).

However, I was unable to figure out how to replace all of the bad characters that are found.

My solution only works for the first bad character.

Usually there are no bad characters. In about 1 in 50 cases there is one bad character. I just want to make my solution foolproof.

(defn filter-to-utf-8-string
  "Return only good utf-8 characters from the input."
  [input]
  (let [bad-characters (set (re-seq #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" input))
        filtered-string (clojure.string/replace input (apply str (first bad-characters)) "")]
    filtered-string))

How can I make replace work for all values in the sequence, not just for the first one?


A friend of mine helped me find a workaround for this problem: I build a pattern for replace using re-pattern.

Within the let, the code is currently:

filter (if (not (empty? bad-characters))
          (re-pattern (str "[" (clojure.string/join bad-characters) "]"))
          #"")
filtered-string (clojure.string/replace input filter "")
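
A minimal sketch of a variant that skips the intermediate character class, which could misbehave if a matched character such as ^ ends up first inside the generated [...]. It assumes that passing the original negated pattern straight to clojure.string/replace is acceptable here, since that call replaces every match rather than only the first. The name strip-bad-characters is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn strip-bad-characters
  "Remove every run of characters outside the allowed classes
  (letters, numbers, whitespace, punctuation, currency symbols and +).
  Hypothetical helper; the character classes are copied from the question."
  [input]
  ;; str/replace with a regex replaces all matches, so no loop is needed.
  (str/replace input #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" ""))

(strip-bad-characters "a^b~c")
;; => "abc"
```

Because the pattern is a literal, nothing from the input is ever interpreted as regex syntax, and an input with no bad characters is returned unchanged.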

Upvotes: 1

Views: 661

Answers (1)

Alan Thompson

Reputation: 29958

Here is a simple version:

(ns xxxxx
  (:require [clojure.string :as str]))

(def all-chars (str/join (map char (range 32 80))))
(println all-chars)

(def char-L (str/join (re-seq #"[\p{L}]" all-chars)))
(println char-L)

(def char-N (str/join (re-seq #"[\p{N}]" all-chars)))
(println char-N)

(def char-LN (str/join (re-seq #"[\p{L}\p{N}]" all-chars)))
(println char-LN)

all-chars  => " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO"
char-L     => "ABCDEFGHIJKLMNO"
char-N     => "0123456789"
char-LN    => "0123456789ABCDEFGHIJKLMNO"

So we start off with all ASCII chars in the range 32-79 (the upper bound of 80 passed to range is exclusive). We first keep only the letters, then only the numbers, then either letters or numbers. It seems this should work for your problem, although instead of rejecting non-members of the desired set, we keep the members of the desired set.
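
Applied to the question, the same keep-the-members idea might look like the sketch below. It assumes the allowed set is exactly the one in the question's pattern (without the negation); the name keep-good-characters is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn keep-good-characters
  "Keep only characters in the allowed classes and join them back
  into a string. Hypothetical helper, not part of any library."
  [input]
  ;; re-seq returns each single-character match in order;
  ;; str/join returns "" when there are no matches.
  (str/join (re-seq #"[\p{L}\p{N}\s\p{P}\p{Sc}\+]" input)))

(keep-good-characters "a b\tc\u0000")
;; => "a b\tc"
```

This never builds a pattern from the input, so it handles any number of bad characters in one pass.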

Upvotes: 2
