mysticfalls

Reputation: 455

Delete text within quotes

I want to remove any text enclosed in double quotes, single quotes, or backticks, along with the enclosing characters.

Input is:

Lorem ipsum "'dolor sit amet consectetur'" adipiscing "elite"  ellentesque 

scelerisque 'tortor' tortor in `vestibulum` dolor

Expected output:

Lorem ipsum adipiscing ellentesque scelerisque tortor in dolor

I have this code, but there is no change in the result. Could anyone tell me what is wrong with my code?

line.replaceAll("[\'\"\\`].*[\'\"\\`]$", "");

Upvotes: 4

Views: 2283

Answers (5)

jancha

Reputation: 4977

l=line;
l=l.replaceAll("\"[^\"]+\"","");
l=l.replaceAll("'[^']+'","");
l=l.replaceAll("`[^`]+`","");

Explanation:

  1. " - match the opening "
  2. [^"]+ - match at least one character that is not "
  3. " - match the closing "

The same applies to ' and `.

Upvotes: 1

The Guy with The Hat

Reputation: 11132

There are three problems with your regex.

  1. It matches text from any one of "'` to any one of "'`, not necessarily the same one that started the match.
  2. * is greedy, meaning it will match text from the first ", ', or ` to the very last one in the line.
  3. Because your regex ends with $, it will only match text if that text ends with the end of the entire string.
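The three problems can be reproduced with a short test harness. This is only a sketch: the class name `OriginalRegexDemo`, the helper `applyOriginal`, and the sample inputs are made up for illustration; the pattern itself is copied from the question.

```java
class OriginalRegexDemo {
    // The pattern from the question, copied verbatim.
    static final String ORIGINAL = "[\'\"\\`].*[\'\"\\`]$";

    // Hypothetical helper: applies the original pattern and returns the result.
    static String applyOriginal(String line) {
        return line.replaceAll(ORIGINAL, "");
    }

    public static void main(String[] args) {
        // Problem 3: the line does not end with a quote character,
        // so the trailing $ prevents any match and nothing is removed.
        System.out.println("[" + applyOriginal("a \"b\" c 'd' e") + "]");

        // Problem 2: the greedy .* swallows everything from the first
        // quote character to the last one, including the unquoted " c ".
        System.out.println("[" + applyOriginal("a \"b\" c 'd'") + "]");

        // Problem 1: an opening " is happily closed by a '.
        System.out.println("[" + applyOriginal("say \"hi'") + "]");
    }
}
```

The first call leaves its input unchanged, while the second and third remove everything from the first quote character to the end of the line.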

You can try it this way:

sb.append(line.replaceAll("(?:([\"'`])[^\\1]*?\\1)\\s+|\r?\n", ""));

Input:

Lorem ipsum "'dolor sit amet consectetur'" adipiscing "elite"  ellentesque 

scelerisque 'tortor' tortor in `vestibulum` dolor

Output:

Lorem ipsum adipiscing ellentesque scelerisque tortor in dolor

There is an explanation and demonstration of that regex here: http://regex101.com/r/iK3fQ8

Upvotes: 3

SebastianH

Reputation: 2182

For better readability of your code I would split this into several regexps:

line = line.replaceAll("\".*?\"", "");
line = line.replaceAll("'.*?'", "");
line = line.replaceAll("`.*?`", "");

(Untested; some additional escaping might be necessary.)

Upvotes: 1

aelor

Reputation: 11126

Maybe something like this:

\".*?\"|\'.*?\'|`.*?`

demo here : http://regex101.com/r/lB4xS2

Upvotes: 2

Amit Joki

Reputation: 59292

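Change your greedy matcher .* to the non-greedy .+?.

Also assign the replaced value back to the variable: Java strings are immutable, so replaceAll returns a new string and leaves line unchanged.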

Full code:

line = line.replaceAll("([\'\"\\`]).+?\\1", "");

Thanks tobias_k for pointing out that I could use backreference.

Also check Java's string escaping rules and escape the pattern accordingly.
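As a rough, self-contained sketch of the backreference approach: the class name `QuoteStripper` and the helper `stripQuoted` are made up for illustration, and the whitespace cleanup step is an assumption added so that the result matches the expected output in the question. Note that the backreference has to be written `\\1` in a Java string literal, because a bare `\1` is an octal character escape.

```java
import java.util.regex.Pattern;

class QuoteStripper {
    // Group 1 captures the opening delimiter; \\1 requires the same
    // character to close the match. The lazy .*? stops at the first closer.
    private static final Pattern QUOTED = Pattern.compile("([\"'`]).*?\\1");

    // Hypothetical helper: removes quoted spans, then tidies whitespace.
    static String stripQuoted(String line) {
        // replaceAll returns a new string, so the result must be kept.
        String withoutQuoted = QUOTED.matcher(line).replaceAll("");
        // Assumption: collapse leftover runs of spaces and newlines.
        return withoutQuoted.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String line = "Lorem ipsum \"'dolor sit amet consectetur'\" adipiscing \"elite\"  ellentesque \n\n"
                + "scelerisque 'tortor' tortor in `vestibulum` dolor";
        // Prints: Lorem ipsum adipiscing ellentesque scelerisque tortor in dolor
        System.out.println(stripQuoted(line));
    }
}
```

Because `.` does not match line terminators by default, this version only removes quoted text that opens and closes on the same line.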

Upvotes: 1
