AL̲̳I

Reputation: 2461

Remove from string an element of a list or array

I have a list or an array of strings

String [] elements = {"cat", "dog", "fish"};

and a string

String str = "This is a caterpillar and that is a dogger.";

I want to remove every item of the array/list from the string wherever it occurs, so that the function returns the string

str = "This is a erpillar and that is a ger." (cat and dog removed from the string)

I can do something like this

private String removeElementsFromString (String str, String [] elements) {
        if(Arrays.stream(elements).anyMatch(str::contains)){
            for(String item : elements){
                str = str.replace(item, "");
            }
        }
        return str;
    }

but is there a more elegant way to replace the for loop with something else?

Upvotes: 3

Views: 469

Answers (6)

josejuan

Reputation: 9566

Arrays.stream(elements).reduce(str, (r, w) -> r.replace(w, ""))

with the expected output.
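
For reference, here is that one-liner wrapped in a small runnable sketch. The class name `ReduceRemoval` and the method name `removeAll` are only illustrative, not part of the question's code.

```java
import java.util.Arrays;

class ReduceRemoval {
    // Folds the array into the string: each step removes one word
    // from the running result (sequential stream assumed).
    static String removeAll(String str, String[] elements) {
        return Arrays.stream(elements).reduce(str, (r, w) -> r.replace(w, ""));
    }

    public static void main(String[] args) {
        String[] elements = {"cat", "dog", "fish"};
        String str = "This is a caterpillar and that is a dogger.";
        System.out.println(removeAll(str, elements));
        // prints: This is a erpillar and that is a ger.
    }
}
```

The accumulator here is not associative, so this sketch should only be used with a sequential stream.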

If you want to keep reducing the input string until no more removals are possible, iterate until the result stops changing:

String n = str, o = null;
do {
    n = Arrays.stream(elements).reduce(o = n, (r, w) -> r.replace(w, ""));
} while(!n.equals(o));

System.out.println(n);

then, with input string

This is a caterpillar and that is a docatg.

you'll get

This is a erpillar and that is a .
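
A self-contained sketch of the same fixed-point idea, with hypothetical names (`FixedPointRemoval`, `removeUntilStable`). The input `cadogt` is chosen because a single pass is not enough for it: removing `dog` is what creates the `cat`.

```java
import java.util.Arrays;

class FixedPointRemoval {
    // Repeats a full removal pass until a pass changes nothing.
    static String removeUntilStable(String str, String[] elements) {
        String current = str;
        String previous;
        do {
            previous = current;
            current = Arrays.stream(elements)
                    .reduce(previous, (r, w) -> r.replace(w, ""));
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String[] elements = {"cat", "dog"};
        // Pass 1: no "cat" yet, removing "dog" leaves "cat".
        // Pass 2: "cat" is removed, leaving the empty string.
        System.out.println("[" + removeUntilStable("cadogt", elements) + "]");
        // prints: []
    }
}
```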

If you really want a fast algorithm, use Aho-Corasick, which costs O(n):

    StringBuilder sb = new StringBuilder();
    int begining = -1;
    for (Emit e : Trie.builder().addKeywords(elements).build().parseText(str)) {
        sb.append(str, begining + 1, e.getStart());
        begining = e.getEnd();
    }
    sb.append(str, begining + 1, str.length());

    System.out.println(sb.toString());

As an aside, a performance comparison (against Oussama ZAGHDOUD's solution):

Equals = true       // all three outputs are equal
Time1 = 18,548822   // seconds; Oussama ZAGHDOUD's solution, O(n^2)
Time2 = 0,134459    // seconds; Aho-Corasick O(n), Trie built inside the timed call
Time3 = 0,065056    // seconds; Aho-Corasick O(n), precomputed Trie

Full benchmark code:

static String alg1(String[] elements, String str) {
    StringBuilder bf = new StringBuilder(str);
    str =null;
    Stream.of(elements).forEach(e -> {
        int index = bf.indexOf(e);
        while (index != -1) {
            index = bf.indexOf(e);
            if (index != -1) {
                bf.delete(index, index + e.length());
            }
        }
    });
    return bf.toString();
}

static String alg2(String[] elements, String str) {
    StringBuilder sb = new StringBuilder();
    int begining = -1;
    for (Emit e : Trie.builder().addKeywords(elements).build().parseText(str)) {
        sb.append(str, begining + 1, e.getStart());
        begining = e.getEnd();
    }
    sb.append(str, begining + 1, str.length());

    return sb.toString();
}

static String alg3(Trie trie, String str) {
    StringBuilder sb = new StringBuilder();
    int begining = -1;
    for (Emit e : trie.parseText(str)) {
        sb.append(str, begining + 1, e.getStart());
        begining = e.getEnd();
    }
    sb.append(str, begining + 1, str.length());

    return sb.toString();
}

public static void main(String... args) {

    final ThreadLocalRandom rnd = ThreadLocalRandom.current();

    // test, use random numbers as words
    String[] elements = range(0, 1_000).mapToObj(i -> "w" + rnd.nextInt()).toArray(String[]::new);

    // interleave random words from elements with other random words
    String str = range(0, 100_000)
            .mapToObj(i -> "z" + rnd.nextInt() + " " + elements[rnd.nextInt(elements.length)])
            .collect(joining(", "));

    Trie trie = Trie.builder().addKeywords(elements).build();

    long t0 = System.nanoTime();
    String s1 = alg1(elements, str);
    long t1 = System.nanoTime();
    String s2 = alg2(elements, str);
    long t2 = System.nanoTime();
    String s3 = alg3(trie, str);
    long t3 = System.nanoTime();

    System.out.printf("Equals = %s%nTime1 = %f%nTime2 = %f%nTime3 = %f%n",
            s1.equals(s2) && s2.equals(s3), (t1 - t0) * 1e-9, (t2 - t1) * 1e-9, (t3 - t2) * 1e-9);
}

Upvotes: 2

Arvind Kumar Avinash

Reputation: 79015

One-liner solution

The following one-liner does the job:

str = str.replaceAll(Arrays.stream(elements).map(s -> "(?:" + s + ")").collect(Collectors.joining("|")), "");

Demo:

import java.util.Arrays;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        String[] elements = { "cat", "dog", "fish" };
        String str = "This is a caterpillar and that is a dogger.";
        
        str = str.replaceAll(Arrays.stream(elements).map(s -> "(?:" + s + ")").collect(Collectors.joining("|")), "");

        System.out.println(str);
    }
}

Output:

This is a erpillar and that is a ger.


Explanation:

Arrays.stream(elements).map(s -> "(?:" + s + ")").collect(Collectors.joining("|")) results in the regex (?:cat)|(?:dog)|(?:fish), which means cat or dog or fish.

The next step is to replace every match of this regex with "".

Upvotes: 6

Oussama ZAGHDOUD

Reputation: 2159

Another solution, with StringBuilder:

I think using StringBuilder instead of String is more appropriate here, because it is much faster and consumes less memory:

import java.io.IOException;
import java.util.stream.Stream;

public class Bounder {

    public static void main(String[] args) throws IOException {
        String[] elements = { "cat", "dog", "fish" };
        String str = "This is a catcatcatcatcatcatcaterpillar ancatcatcatcatd thcatcatcatat is a dogdogdogdogdogdogger.";
        // Use StringBuilder here instead of String
        StringBuilder bf = new StringBuilder(str);
        str = null;

        System.out.println("Original String   =  " + bf.toString());
        Stream.of(elements).forEach(e -> {
            int index = bf.indexOf(e);
            while (index != -1) {
                index = bf.indexOf(e);
                if (index != -1) {
                    bf.delete(index, index + e.length());
                }
            }
        });

        System.out.println("Result            =  " + bf.toString());
    }
}

Output :

  Original String   =  This is a catcatcatcatcatcatcaterpillar ancatcatcatcatd thcatcatcatat is a dogdogdogdogdogdogger.

  Result            =  This is a erpillar and that is a ger.
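
The same delete loop can be sketched as a reusable helper. The names `BuilderRemoval` and `removeAll` are illustrative, and the empty-word guard is an added assumption (an empty word would otherwise match at index 0 forever).

```java
class BuilderRemoval {
    // Deletes every occurrence of each word in place, rescanning from
    // the start of the buffer after each deletion.
    static String removeAll(String str, String[] elements) {
        StringBuilder sb = new StringBuilder(str);
        for (String e : elements) {
            if (e.isEmpty()) {
                continue; // assumption: skip empty words to avoid an endless loop
            }
            int index;
            while ((index = sb.indexOf(e)) != -1) {
                sb.delete(index, index + e.length());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] elements = {"cat", "dog", "fish"};
        System.out.println(removeAll("This is a caterpillar and that is a dogger.", elements));
        // prints: This is a erpillar and that is a ger.
    }
}
```

Because the buffer is rescanned after every deletion, this variant also removes an occurrence that a previous deletion of the same word created: `ccatat` becomes empty here, whereas `"ccatat".replace("cat", "")` returns `cat`.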

Upvotes: 3

WJS

Reputation: 40034

You can do it like this. Just use a simple loop.

for (String word : elements) {
    str = str.replace(word, "");
}

Upvotes: 0

Youcef LAIDANI

Reputation: 59950

I would simply use:

private String removeElementsFromString(String str, String[] elements) {
    for (String item : elements) {
        str = str.replace(item, "");
    }
    return str;
}

I don't see any advantage in the first condition, since replace already returns the string unchanged when the item does not occur in it:

if(Arrays.stream(elements).anyMatch(str::contains)) {

Upvotes: 1

Andy Turner

Reputation: 140309

The most concise way would be to use replaceAll, which accepts a regular expression as the first parameter:

String newStr = str.replaceAll(String.join("|", elements), "");

This only works because the things in elements have no special regex characters. If any of them did (or there was a chance they did), you'd have to quote them:

String pattern = Arrays.stream(elements).map(Pattern::quote).collect(Collectors.joining("|"));
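
To illustrate why the quoting matters, here is a minimal sketch with words that do contain regex metacharacters. The class name `QuotedRegexRemoval`, the method name `removeAll`, and the sample words are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedRegexRemoval {
    // Builds one alternation in which every word is matched literally.
    static String removeAll(String str, String[] elements) {
        String pattern = Arrays.stream(elements)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return str.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        // "." and "(...)" would be regex syntax if left unquoted.
        String[] elements = {"a.b", "(x)"};
        String str = "a.b axb (x) x";
        // Only the literal "a.b" and "(x)" are removed; "axb" and the
        // trailing "x" stay.
        System.out.println("[" + removeAll(str, elements) + "]");
        // prints: [ axb  x]
    }
}
```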

Note, however, that this would operate in a single pass. So if you had a string like:

docatg

this approach would result in dog, whereas an approach which does input.replace("cat", "").replace("dog", "") would remove the dog as well.
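
The difference can be checked with a small sketch; the class and method names are hypothetical and the two words are hard-coded for brevity.

```java
class SinglePassVsSequential {
    // One regex pass: text exposed by a removal is not scanned again.
    static String singlePass(String input) {
        return input.replaceAll("cat|dog", "");
    }

    // Chained replace: the second call sees the result of the first.
    static String sequential(String input) {
        return input.replace("cat", "").replace("dog", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + singlePass("docatg") + "]"); // prints: [dog]
        System.out.println("[" + sequential("docatg") + "]"); // prints: []
    }
}
```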

Upvotes: 0
