God

Reputation: 1248

Deleting all substrings starting with '[' and ending with ']' from a String

I need to take a String and delete every substring in it that starts with the character '[' and ends with the character ']'.

I don't know how to tackle this problem. I tried converting the String to a character array, overwriting every character from each opening '[' through its closing ']' with a blank, and then converting the array back to a String using the toString() method.

My code:

char[] lyricsArray = lyricsParagraphElements.get(1).text().toCharArray();
for (int i = 0; i < lyricsArray.length; i++)
{
    if (lyricsArray[i] == '[')
    {
        lyricsArray[i] = ' ';
        for (int j = i + 1; j < lyricsArray.length; j++)
        {
            if (lyricsArray[j] == ']')
            {
                lyricsArray[j] = ' ';
                i = j + 1;
                break;
            }
            lyricsArray[j] = ' ';
        }
    }
}
String songLyrics = lyricsArray.toString();
System.out.println(songLyrics);

But when I print songLyrics I get weird output like:

[C@71bc1ae4
[C@6ed3ef1
[C@2437c6dc
[C@1f89ab83
[C@e73f9ac
[C@61064425
[C@7b1d7fff
[C@299a06ac
[C@383534aa
[C@6bc168e5

I guess there is a simple way to do this. Any help would be much appreciated.

For clarification: converting "abcd[dsadsadsa]efg[adf%@1]d" into "abcdefgd".

Upvotes: 1

Views: 237

Answers (5)

cuong hoang

Reputation: 104

This regex matches your case exactly:

\\[([\\w\\%\\@]+)\\]

It gets harder when your plain string contains special symbols. I couldn't find a shorter regex that avoids listing each special symbol explicitly as an exception. Reference: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#cg

================

I've read your new case. If the string contains the symbol "-" or anything else from !"#$%&'()*+,-./:;<=>?@\^_`{|}~, add those symbols (each prefixed with "\\") after \\@ in my regex string.
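As a rough sketch of that suggestion, the character class can be extended with "-" as below. The class and method names here are made up for illustration. The second method uses a negated character class, which is not part of this answer but a common alternative: it accepts anything except ']' between the brackets, so no symbol has to be listed.

```java
class BracketClassDemo {
    // Allow-list: word characters plus %, @ and - between the brackets.
    static String removeListed(String text) {
        return text.replaceAll("\\[[\\w%@\\-]+\\]", "");
    }

    // Negated class (an alternative, not from the answer above):
    // anything except ']' between the brackets.
    static String removeAny(String text) {
        return text.replaceAll("\\[[^\\]]*\\]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeListed("abcd[dsa-dsa]efg[adf%@1]d")); // abcdefgd
        System.out.println(removeListed("a[x y]b")); // a[x y]b (space is not listed)
        System.out.println(removeAny("a[x y]b"));    // ab
    }
}
```

The allow-list version leaves a bracketed section untouched as soon as it contains a character that was not listed, which may or may not be what you want.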

Upvotes: 1

Tim

Reputation: 5691

Or simply use a regular expression to replace all occurrences of \\[.*?\\] with nothing:

String songLyrics = text.replaceAll("\\[.*?\\]", "");

Where text is of course:

String text = lyricsParagraphElements.get(1).text();

What does \\[.*?\\] mean?

The first parameter of replaceAll is a string describing a regular expression. A regular expression defines a pattern to match in a string.

So let's split it up:

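\\[ matches exactly the character [. Since [ has a special meaning within a regular expression, it needs to be escaped with a backslash, and that backslash must itself be escaped inside a Java string literal (hence the double backslash).

. matches any character (except line terminators, by default). Combined with the lazy zero-or-more operator *?, it matches as few characters as possible until it finds: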

\\], which matches the character ]. Note the escaping again.
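To illustrate why the lazy operator matters, here is a small sketch that runs both the lazy and the greedy pattern on the sample string from the question. The class and method names are made up for this example.

```java
class BracketRegexDemo {
    // Lazy .*? stops at the first ']' it can, so each [...] pair is matched separately.
    static String removeLazy(String text) {
        return text.replaceAll("\\[.*?\\]", "");
    }

    // Greedy .* runs on to the last ']' in the input, swallowing the text in between.
    static String removeGreedy(String text) {
        return text.replaceAll("\\[.*\\]", "");
    }

    public static void main(String[] args) {
        String sample = "abcd[dsadsadsa]efg[adf%@1]d";
        System.out.println(removeLazy(sample));   // abcdefgd
        System.out.println(removeGreedy(sample)); // abcdd
    }
}
```

With the greedy pattern the "efg" between the two bracketed sections is lost, because a single match spans from the first [ to the last ].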

Upvotes: 3

Andy Turner

Reputation: 140484

You are getting "weird stuff" because you are printing the string representation of the array, not converting the array to a String.

Instead of lyricsArray.toString(), use

new String(lyricsArray);

But if you do this, you will find that you are not actually removing characters from the string, just replacing them with spaces.
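A minimal sketch of both points, using a small hand-written array in place of the real lyrics (the class and method names are only for illustration):

```java
class CharArrayToStringDemo {
    // Builds a String from the array's contents.
    static String contentsOf(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        // "a[b]c" after the bracketed part has been overwritten with spaces
        char[] blanked = {'a', ' ', ' ', ' ', 'c'};

        // Array toString(): type tag plus hash code, e.g. [C@71bc1ae4 (value varies)
        System.out.println(blanked.toString());

        // The contents, but still 5 characters long: the spaces remain
        String text = contentsOf(blanked);
        System.out.println("'" + text + "' length=" + text.length());
    }
}
```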

Instead, you can shift all of the characters left in the array, and construct the new String only up to the right number of characters:

int src = 0, dst = 0;
while (src < lyricsArray.length) {
  while (src < lyricsArray.length && lyricsArray[src] != '[') {
    lyricsArray[dst++] = lyricsArray[src++];
  }
  if (src < lyricsArray.length) {
    ++src;
    while (src - 1 < lyricsArray.length && lyricsArray[src - 1] != ']') {
      src++;
    }
  }
}
String lyricsString = new String(lyricsArray, 0, dst);

Upvotes: 1

FallAndLearn

Reputation: 4135

Calling toString() on the array in the code below does not return its contents; it returns the array's type and hash code, and that is what you are printing as songLyrics.

String songLyrics = lyricsArray.toString();
System.out.println(songLyrics);

Replace the two lines above with

String songLyrics = new String(lyricsArray);
System.out.println(songLyrics);

Ideone1

Another way, without converting to a char array and back to a String:

String lyricsParagraphElements = "asdasd[asd]";

// Lazy .*? so that each [...] pair is removed separately when there are several
String songLyrics = lyricsParagraphElements.replaceAll("\\[.*?\\]", "");

System.out.println(songLyrics);

Ideone2

Upvotes: 2

Elliott Frisch

Reputation: 201477

You're printing a char[] and Java char[] does not override toString(). And, a Java String is immutable, but Java does have StringBuilder which is mutable (and StringBuilder.delete(int, int) can remove arbitrary substrings). You could use it like,

String songLyrics = lyricsParagraphElements.get(1).text();
StringBuilder sb = new StringBuilder(songLyrics);
int p = 0;
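while ((p = sb.indexOf("[", p)) >= 0) {
    int e = sb.indexOf("]", p + 1);
    if (e < 0) {
        break; // no closing ']' left, so keep the rest as-is
    }
    // After deleting, stay at p: the next '[' may now sit at this index.
    sb.delete(p, e + 1);
}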
System.out.println(sb);

Upvotes: 1
