Lina Vuppala

Regex for removing comma in a String when it is enclosed by quotes

I need to remove commas within a String only when enclosed by quotes.

example:

String a = "123, \"Anders, Jr.\", John, [email protected],A"

after replacement should be

String a = "123, Anders Jr., John, [email protected],A"

Could you please give me sample Java code to do this?

Thanks much,

Lina

Upvotes: 4

Views: 9040

Answers (10)

applecrusher

Reputation: 5648

My answer is not a regex, but I believe it is simpler and more efficient. Convert the line to a char array and walk through each char, keeping count of the quotes seen so far. If the count is odd (you are inside quotes) and the current char is a comma, don't add it. It should look something like this:

public String removeCommaBetweenQuotes(String line) {
    int quoteCount = 0; // number of '"' characters seen so far
    StringBuilder newLine = new StringBuilder();

    for (char c : line.toCharArray()) {
        if (c == '"') {
            // quotes are kept in the output; only the count changes
            quoteCount++;
            newLine.append(c);
        } else if (quoteCount % 2 == 1 && c == ',') {
            // an odd quote count means we are inside quotes: drop the comma
        } else {
            newLine.append(c);
        }
    }

    return newLine.toString();
}

Upvotes: 0

Tom Melly

The following Perl works for most cases:

open(DATA,'in/my.csv');
while(<DATA>){
  if(/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/){
    print "Before: $_";
    while(/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/){
      s/((?:^|,\s*)"[^"]*),([^"]*"(?:\s*,|$))/$1 $2/
    }
    print "After: $_";
  }
}

It's looking for:

  • (comma plus optional spaces) or start of line
  • an opening quote
  • 0 or more non-quotes
  • a comma
  • 0 or more non-quotes
  • a closing quote
  • (optional spaces plus comma) or end of line

If found, it will then keep replacing the comma with a space until it can find no more examples.

It works because of an assumption that the opening quote will be preceded by a comma plus optional spaces (or will be at the start of the line), and the closing quote will be followed by optional spaces plus a comma, or will be the end of the line.

I'm sure there are cases where it will fail - if anyone can post 'em, I'd be keen to see them...
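
Since the question asks for Java, the same repeat-until-no-match loop can be sketched as below. This is only a translation of the Perl substitution above under the same assumptions about where quotes may appear; the class and method names are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a Java rendering of the Perl s/// loop above.
final class AnchoredQuoteCommaReplacer {
    // Same pattern as the Perl substitution: group 1 ends just before the
    // comma inside the quoted field, group 2 starts just after it.
    private static final Pattern COMMA_IN_FIELD =
        Pattern.compile("((?:^|,\\s*)\"[^\"]*),([^\"]*\"(?:\\s*,|$))");

    static String replaceQuotedCommas(String line) {
        String previous;
        do {
            previous = line;
            // Replace one comma with a space, then rescan from the start.
            line = COMMA_IN_FIELD.matcher(line).replaceFirst("$1 $2");
        } while (!line.equals(previous)); // stop once nothing changed
        return line;
    }

    public static void main(String[] args) {
        System.out.println(replaceQuotedCommas("\"a,b,c\",x")); // prints "a b c",x
    }
}
```

Each pass removes exactly one comma, so the loop always terminates. Like the Perl version, it keeps the quotes and leaves a space where each comma was.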

Upvotes: 0

Gumbo

Reputation: 655269

A simpler approach would be replacing the matches of this regular expression:

("[^",]+),([^"]+")

with this:

$1$2
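
In Java this could be applied with String.replaceAll, as in the minimal sketch below. The class and method names are hypothetical, and the sample inputs are assumptions chosen to show the behaviour.

```java
// Hypothetical wrapper around the regex and replacement shown above.
final class QuotedCommaRegex {
    static String removeQuotedComma(String s) {
        // Group 1: opening quote plus text up to the comma.
        // Group 2: text after the comma plus the closing quote.
        return s.replaceAll("(\"[^\",]+),([^\"]+\")", "$1$2");
    }

    public static void main(String[] args) {
        // prints 123, "Anders Jr.", John, A
        System.out.println(removeQuotedComma("123, \"Anders, Jr.\", John, A"));
        // prints "ab,c" (only the first comma of the field is removed)
        System.out.println(removeQuotedComma("\"a,b,c\""));
    }
}
```

As the second call shows, a single pass removes only the first comma in each quoted field, so a field with several commas would need the replacement repeated. The quotes themselves are kept.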

Upvotes: 0

Alan Moore

Reputation: 75232

There are two major problems with the accepted answer. First, the regex "(.*)\"(.*),(.*)\"(.*)" will match the whole string if it matches anything, so it will remove at most one comma and two quotation marks.

Second, there's nothing to ensure that the comma and quotes will all be part of the same field; given the input ("foo", "bar") it will return ("foo "bar). It also doesn't account for newlines or escaped quotation marks, both of which are permitted in quoted fields.
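
Both problems are easy to reproduce. The sketch below applies the regex quoted above with the replacement "$1$2$3$4"; the class name, method name, and the second sample input are assumptions made for the demonstration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex and replacement come from the thread.
final class GreedyRegexDemo {
    private static final Pattern GREEDY =
        Pattern.compile("(.*)\"(.*),(.*)\"(.*)");

    static String apply(String input) {
        // The greedy (.*) groups make a successful match span the whole
        // input, so at most one replacement ever happens.
        return GREEDY.matcher(input).replaceAll("$1$2$3$4");
    }

    public static void main(String[] args) {
        // The comma BETWEEN the two fields is removed:
        System.out.println(apply("\"foo\", \"bar\""));  // prints "foo "bar
        // Two quoted fields, but only one comma and two quotes are removed:
        System.out.println(apply("\"a,b\",\"c,d\""));   // prints "a,b",cd
    }
}
```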

You can use regexes to parse CSV data, but it's much trickier than most people expect. But why bother fighting with it when, as bobince pointed out, there are several free CSV libraries out there for the downloading?

Upvotes: 2

aavaliani

This works fine once the loop condition uses '<' instead of '>':

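boolean deleteCommas = false; // true while we are between a pair of quotes
for (int i = 0; i < text.length(); i++) {
    if (text.charAt(i) == '"') {
        // remove the quote itself and toggle the in-quotes state
        text = text.substring(0, i) + text.substring(i + 1);
        deleteCommas = !deleteCommas;
    }
    // bounds check: removing a trailing quote leaves i == text.length()
    if (i < text.length() && text.charAt(i) == ',' && deleteCommas) {
        text = text.substring(0, i) + text.substring(i + 1);
    }
}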

Upvotes: 0

Yorch

Reputation: 21

I believe you asked for a regex to get an "elegant" solution, but a "normal" answer may fit your needs better. This one handles your example perfectly, although I didn't check edge cases such as two quotes together, so if you're going to use it, test it thoroughly.

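boolean deleteCommas = false; // true while we are between a pair of quotes
for (int i = 0; i < a.length(); i++) { // '<', not '>': with '>' the loop never runs
    if (a.charAt(i) == '"') {
        // remove the quote itself and toggle the in-quotes state
        a = a.substring(0, i) + a.substring(i + 1);
        deleteCommas = !deleteCommas;
    }
    // bounds check: removing a trailing quote leaves i == a.length()
    if (i < a.length() && a.charAt(i) == ',' && deleteCommas) {
        a = a.substring(0, i) + a.substring(i + 1);
    }
}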

Upvotes: 2

Lieven Keersmaekers

Reputation: 58441

Probably grossly inefficient, but it seems to work.

import java.util.regex.*;

StringBuffer ResultString = new StringBuffer();

try {
    Pattern regex = Pattern.compile("(.*)\"(.*),(.*)\"(.*)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(a);
    while (regexMatcher.find()) {
        try {
            // You can vary the replacement text for each match on-the-fly
            regexMatcher.appendReplacement(ResultString, "$1$2$3$4");
        } catch (IllegalStateException ex) {
            // appendReplacement() called without a prior successful call to find()
        } catch (IllegalArgumentException ex) {
            // Syntax error in the replacement text (unescaped $ signs?)
        } catch (IndexOutOfBoundsException ex) {
            // Non-existent backreference used in the replacement text
        } 
    }
    regexMatcher.appendTail(ResultString);
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Upvotes: 0

Lazarus

Reputation: 43084

This looks like a line from a CSV file. Parsing it with any reasonable CSV library would deal with this issue for you automatically, at least by reading the quoted value into a single 'field'.

Upvotes: 1

strager

Reputation: 90022

Should work:

s/(?<="[^"]*),(?=[^"]*")//g
s/"//g

Upvotes: 1

bobince

Reputation: 536389

It also seems you need to remove the quotes, judging by your example.

You can't do that in a single regexp. You would need to match over each instance of

"[^"]*"

then strip the surrounding quotes and remove the commas. Are there any other characters which are troublesome? Can quote characters be escaped inside quotes, e.g. as ‘""’?

It looks like you are trying to parse CSV. If so, regex is insufficient for the task and you should look at one of the many free Java CSV parsers.
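
For simple input, the match-then-clean approach described above might look like the sketch below. It assumes quotes are never escaped or nested and that fields contain no newlines; real CSV data breaks those assumptions, which is why a parser is the safer route. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles only simple, unescaped quoted fields.
final class QuotedFieldCleaner {
    // One quoted run; group 1 is the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static String stripQuotedCommas(String line) {
        Matcher m = QUOTED.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Drop the surrounding quotes and every comma inside them.
            String inner = m.group(1).replace(",", "");
            // quoteReplacement guards against '$' or '\' in the field text.
            m.appendReplacement(out, Matcher.quoteReplacement(inner));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 123, Anders Jr., John, A
        System.out.println(stripQuotedCommas("123, \"Anders, Jr.\", John, A"));
    }
}
```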

Upvotes: 2
