Reputation:
I need to remove commas within a String only when enclosed by quotes.
example:
String a = "123, \"Anders, Jr.\", John, [email protected],A"
after replacement should be
String a = "123, Anders Jr., John, [email protected],A"
Can you please give me sample Java code to do this?
Thanks much,
Lina
Upvotes: 4
Views: 9040
Reputation: 5648
My answer is not a regex, but I believe it is simpler and more efficient. Convert the line to a char array, then go through it one char at a time, counting the quotes seen so far. If the count is odd (you are inside a quoted section) and the current char is a comma, don't append it. It should look something like this:
public String removeCommaBetweenQuotes(String line) {
    int charCount = 0;                  // number of '"' characters seen so far
    char[] charArray = line.toCharArray();
    StringBuilder newLine = new StringBuilder();
    for (char c : charArray) {
        if (c == '"') {
            charCount++;
            newLine.append(c);          // the quote itself is kept
        } else if (charCount % 2 == 1 && c == ',') {
            // odd number of quotes so far: we are inside a quoted section,
            // so this comma is dropped
        } else {
            newLine.append(c);
        }
    }
    return newLine.toString();
}
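The method above keeps the quote characters, while the expected output in the question drops them as well. A minimal sketch of the same even/odd idea that also skips the quotes (the class and method names are made up for illustration):

```java
class QuotedCommaRemover {

    // Same quote counting as removeCommaBetweenQuotes, tracked as a boolean,
    // except that the quote characters are not copied to the output either.
    static String removeQuotesAndInnerCommas(String line) {
        boolean insideQuotes = false; // true after an odd number of quotes
        StringBuilder out = new StringBuilder(line.length());
        for (char c : line.toCharArray()) {
            if (c == '"') {
                insideQuotes = !insideQuotes; // toggle state; the quote is dropped
            } else if (c == ',' && insideQuotes) {
                // comma between an opening and a closing quote: dropped
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String a = "123, \"Anders, Jr.\", John, NY,A";
        System.out.println(removeQuotesAndInnerCommas(a)); // prints: 123, Anders Jr., John, NY,A
    }
}
```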
Upvotes: 0
Reputation:
The following Perl works for most cases:
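open(DATA, '<', 'in/my.csv') or die "Cannot open in/my.csv: $!";
while (<DATA>) {
    # Only touch lines containing a quoted field that has a comma in it
    if (/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/) {
        print "Before: $_";
        # Replace one such comma with a space per pass until none are left
        while (/(,\s*|^)"[^"]*,[^"]*"(\s*,|$)/) {
            s/((?:^|,\s*)"[^"]*),([^"]*"(?:\s*,|$))/$1 $2/;
        }
        print "After: $_";
    }
}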
It's looking for a double-quoted field that contains at least one comma.
If one is found, it then keeps replacing such a comma with a space until it finds no more matches.
It works because it assumes that the opening quote is preceded by a comma plus optional spaces (or is at the start of the line), and that the closing quote is followed by optional spaces plus a comma (or is at the end of the line).
I'm sure there are cases where it will fail - if anyone can post 'em, I'd be keen to see them...
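Since the question asks for Java, here is a rough translation of the same loop using the substitution pattern from the Perl code (the class and method names are hypothetical). Like the Perl version it replaces each matched comma with a space, so "Anders, Jr." becomes "Anders  Jr." with two spaces.

```java
import java.util.regex.Pattern;

class PerlStyleQuotedComma {

    // Same pattern as the Perl s/// line: group 1 runs from the opening quote
    // (preceded by start of line or a comma plus optional spaces) up to a comma;
    // group 2 runs from after that comma to the closing quote (followed by
    // optional spaces plus a comma, or end of line).
    private static final Pattern QUOTED_COMMA =
            Pattern.compile("((?:^|,\\s*)\"[^\"]*),([^\"]*\"(?:\\s*,|$))");

    static String replaceQuotedCommas(String line) {
        String current = line;
        while (true) {
            // One comma replaced by a space per pass, as in the Perl while loop
            String next = QUOTED_COMMA.matcher(current).replaceFirst("$1 $2");
            if (next.equals(current)) {
                return current; // no match left
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        String a = "123, \"Anders, Jr.\", John, NY,A";
        System.out.println(replaceQuotedCommas(a)); // prints: 123, "Anders  Jr.", John, NY,A
    }
}
```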
Upvotes: 0
Reputation: 655269
A simpler approach would be to replace the matches of this regular expression:
("[^",]+),([^"]+")
with this:
$1$2
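In Java this might look like the sketch below (names are illustrative). One replaceAll pass removes at most the first comma in each quoted field, because a match consumes the field up to its closing quote, so the sketch repeats until nothing changes. Note that the pattern can also remove a comma sitting between two quoted fields: "a" x, "b" becomes "a" x "b".

```java
class SimpleQuotedCommaRegex {

    // ("[^",]+),([^"]+")  written as a Java string literal
    private static final String QUOTED_COMMA = "(\"[^\",]+),([^\"]+\")";

    static String removeQuotedCommas(String s) {
        String previous;
        do {
            previous = s;
            // Each pass removes at most the first comma of each quoted field
            s = s.replaceAll(QUOTED_COMMA, "$1$2");
        } while (!s.equals(previous));
        return s;
    }

    public static void main(String[] args) {
        System.out.println(removeQuotedCommas("123, \"Anders, Jr.\", John, NY,A"));
        // prints: 123, "Anders Jr.", John, NY,A
    }
}
```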
Upvotes: 0
Reputation: 75232
There are two major problems with the accepted answer. First, the regex "(.*)\"(.*),(.*)\"(.*)" will match the whole string if it matches anything, so it will remove at most one comma and two quotation marks.
Second, there's nothing to ensure that the comma and the quotes are all part of the same field; given the input ("foo", "bar") it will return ("foo "bar). It also doesn't account for newlines or escaped quotation marks, both of which are permitted in quoted fields.
You can use regexes to parse CSV data, but it's much trickier than most people expect. But why bother fighting with it when, as bobince pointed out, there are several free CSV libraries out there for the downloading?
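A small demonstration of both problems, applying the accepted answer's pattern with String.replaceAll (which behaves like its find/appendReplacement loop for this pattern); the class name is made up:

```java
class GreedyRegexDemo {

    // The pattern and replacement from the accepted answer
    static String acceptedAnswerReplace(String s) {
        return s.replaceAll("(.*)\"(.*),(.*)\"(.*)", "$1$2$3$4");
    }

    public static void main(String[] args) {
        // The comma removed lies between two fields, not inside one
        System.out.println(acceptedAnswerReplace("\"foo\", \"bar\"")); // prints: "foo "bar
        // Two quoted fields with commas: only one comma and two quotes are removed
        System.out.println(acceptedAnswerReplace("\"a,b\", \"c,d\"")); // prints: "a,b", cd
    }
}
```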
Upvotes: 2
Reputation:
This works fine, with '<' instead of '>' in the loop condition:
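boolean deleteCommas = false; // true while we are inside a quoted section
int i = 0;
while (i < text.length()) {
    char c = text.charAt(i);
    if (c == '"') {
        // drop the quote itself and toggle the inside/outside state
        text = text.substring(0, i) + text.substring(i + 1);
        deleteCommas = !deleteCommas;
    } else if (c == ',' && deleteCommas) {
        // drop a comma that sits inside quotes
        text = text.substring(0, i) + text.substring(i + 1);
    } else {
        i++; // keep this character; only advance when nothing was removed
    }
}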
Upvotes: 0
Reputation: 21
I believe you asked for a regex hoping for an "elegant" solution; nevertheless, a "normal" answer may fit your needs better. This one handles your example correctly, although I didn't check border cases such as two quotes together, so if you're going to use it, test it thoroughly.
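boolean deleteCommas = false; // true while inside a quoted section
for (int i = 0; i < a.length(); i++) {
    if (a.charAt(i) == '"') {
        // remove the quote and flip the inside/outside state
        a = a.substring(0, i) + a.substring(i + 1);
        deleteCommas = !deleteCommas;
        i--; // re-examine the character that shifted into position i
    } else if (a.charAt(i) == ',' && deleteCommas) {
        // remove a comma that sits inside quotes
        a = a.substring(0, i) + a.substring(i + 1);
        i--; // re-examine the character that shifted into position i
    }
}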
Upvotes: 2
Reputation: 58441
Probably grossly inefficient, but it seems to work.
import java.util.regex.*;
StringBuffer ResultString = new StringBuffer();
try {
    Pattern regex = Pattern.compile("(.*)\"(.*),(.*)\"(.*)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(a);
    while (regexMatcher.find()) {
        try {
            // You can vary the replacement text for each match on-the-fly
            regexMatcher.appendReplacement(ResultString, "$1$2$3$4");
        } catch (IllegalStateException ex) {
            // appendReplacement() called without a prior successful call to find()
        } catch (IllegalArgumentException ex) {
            // Syntax error in the replacement text (unescaped $ signs?)
        } catch (IndexOutOfBoundsException ex) {
            // Non-existent backreference used in the replacement text
        }
    }
    regexMatcher.appendTail(ResultString); // ResultString now holds the modified copy of a
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
Upvotes: 0
Reputation: 43084
This looks like a line from a CSV file; parsing it with any reasonable CSV library would deal with this issue for you, at least by reading the quoted value into a single 'field'.
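To illustrate what reading the quoted value into a single field means, here is a hand-rolled sketch of the splitting step a CSV parser performs. It is not any particular library's API; unlike a real parser it handles only a single line (no fields spanning newlines) and keeps whitespace around fields as-is. It does treat "" inside quotes as an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSplitter {

    // Commas inside double quotes stay part of the field; "" inside quotes
    // becomes a literal quote; the enclosing quotes themselves are dropped.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote ""
                        i++;               // skip the second quote
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas are kept inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote, not copied
            } else if (c == ',') {
                fields.add(field.toString()); // field separator
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        for (String f : splitLine("123, \"Anders, Jr.\", John, NY,A")) {
            System.out.println("[" + f + "]");
        }
    }
}
```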
Upvotes: 1
Reputation: 536389
It also seems you need to remove the quotes, judging by your example.
You can't do that with a single regexp. You would need to match each instance of
"[^"]*"
then strip the surrounding quotes and remove the commas. Are there any other characters that are troublesome? Can quote characters be escaped inside quotes, e.g. as ‘""’?
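A sketch of that two-step approach in Java, assuming quotes are never escaped inside a field (the class and method names are illustrative): find each quoted section, strip its quotes, drop its commas, and splice the result back with Matcher.appendReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripQuotedSections {

    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String stripQuotesAndCommas(String s) {
        Matcher m = QUOTED.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String quoted = m.group();
            // drop the surrounding quotes, then the commas inside them
            String inner = quoted.substring(1, quoted.length() - 1).replace(",", "");
            // quoteReplacement so any $ or \ in the field is copied literally
            m.appendReplacement(out, Matcher.quoteReplacement(inner));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripQuotesAndCommas("123, \"Anders, Jr.\", John, NY,A"));
        // prints: 123, Anders Jr., John, NY,A
    }
}
```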
It looks like you are trying to parse CSV. If so, regex is insufficient for the task and you should look at one of the many free Java CSV parsers.
Upvotes: 2