Marek Javůrek

Reputation: 965

java regex string split by " not \"

I need to write a simple Java program that converts MySQL INSERT lines into CSV files (one CSV file per MySQL table).

Is regex the best solution for this in Java?

My main problem is how to correctly match a value like this: 'this is \'cool\'...' (that is, how to ignore an escaped ').

Example:

INSERT INTO `table1` VALUES ('this is \'cool\'...' ,'some2');
INSERT INTO `table1` (`field1`,`field2`) VALUES ('this is \'cool\'...' ,'some2');

Thanks

Upvotes: 0

Views: 384

Answers (5)

Tim Pietzcker

Reputation: 336208

Assuming that your SQL statements are syntactically valid, you could use

Pattern regex = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");

to get a regex that matches all single-quoted strings, ignoring escaped characters inside them.

Explanation without all those extra backslashes:

'         # Match '
(?:       # Either match...
 \\.      # an escaped character
|         # or
 [^'\\]   # any character except ' or \
)*        # any number of times.
'         # Match '

Given the string

'this', 'is a \' valid', 'string\\', 'even \\\' with', 'escaped quotes.\\\''

this matches

'this'
'is a \' valid'
'string\\'
'even \\\' with'
'escaped quotes.\\\''
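As a minimal sketch of how this pattern could be applied to one of the INSERT lines from the question (the class name `QuotedStringFinder` and the method `findQuoted` are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStringFinder {
    // Same pattern as above: a quote, then any run of escaped characters
    // or characters that are neither ' nor \, then the closing quote.
    private static final Pattern QUOTED = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");

    /** Returns every single-quoted string found in the line, quotes included. */
    static List<String> findQuoted(String sql) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Java source needs \\ to put a single backslash into the string.
        String line = "INSERT INTO `table1` VALUES ('this is \\'cool\\'...' ,'some2');";
        for (String s : findQuoted(line)) {
            System.out.println(s);
        }
    }
}
```

The matches still carry their surrounding quotes and backslash escapes, so a CSV converter would need to strip and unescape them in a separate step.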

Upvotes: 3

Tilman Schweitzer

Reputation: 1487

You have to use \\\\. In a Java string literal, \\ is a single \, because the backslash introduces escape sequences for whitespace and control characters (\n, \t, ...). In a regex, a literal backslash must also be written as \\, so the Java source needs four of them.
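A small sketch that makes the two escaping levels visible (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class BackslashLevels {
    // Four backslashes in Java source become the two-character string \\,
    // which the regex engine reads as one literal backslash.
    static final String ONE_BACKSLASH_REGEX = "\\\\";

    /** True only if the whole input is exactly one backslash character. */
    static boolean matchesSingleBackslash(String input) {
        return Pattern.matches(ONE_BACKSLASH_REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(ONE_BACKSLASH_REGEX.length());   // 2
        System.out.println(matchesSingleBackslash("\\"));    // true
    }
}
```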

Upvotes: 0

John B

Reputation: 32959

Although regexes give you a very powerful mechanism for parsing text, I think you might be better off with a non-regex parser. I think your code will be easier to write, easier to understand, and have fewer bugs.

Something like:

  • find "INSERT INTO"
  • find table name
  • find column names
  • find "VALUES"
  • find value set (loop this part)

Writing a regex to do all of the above, with an optional column list and a variable number of value sets, is non-trivial and error-prone.
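The steps above could be sketched as a small hand-written scanner. This is only an illustration under several assumptions: each statement sits on one line, the table name is wrapped in backticks, the keyword VALUES does not occur in a table or column name, and the only escape mechanism is a backslash. The names `InsertLineParser`, `tableName`, and `valueSets` are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class InsertLineParser {

    /** Returns the name between the first pair of backticks after INSERT INTO. */
    static String tableName(String line) {
        int start = line.indexOf('`', line.indexOf("INSERT INTO")) + 1;
        return line.substring(start, line.indexOf('`', start));
    }

    /** Returns every value set after VALUES as a list of unescaped fields. */
    static List<List<String>> valueSets(String line) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = null;               // null while outside parentheses
        StringBuilder field = new StringBuilder();
        boolean inQuote = false;
        int i = line.indexOf("VALUES") + "VALUES".length();
        for (; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuote) {
                if (c == '\\' && i + 1 < line.length()) {
                    // Simplification: keep the escaped character as-is,
                    // so \' becomes ' (and \n would become a plain n).
                    field.append(line.charAt(++i));
                } else if (c == '\'') {
                    inQuote = false;
                } else {
                    field.append(c);
                }
            } else if (c == '\'') {
                inQuote = true;
            } else if (c == '(') {
                row = new ArrayList<>();
                field.setLength(0);
            } else if (c == ',' && row != null) {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == ')' && row != null) {
                row.add(field.toString());
                rows.add(row);
                row = null;
            } else if (!Character.isWhitespace(c) && row != null) {
                field.append(c);               // unquoted token such as a number or NULL
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String line = "INSERT INTO `table1` (`field1`,`field2`) VALUES ('this is \\'cool\\'...' ,'some2');";
        System.out.println(tableName(line) + " -> " + valueSets(line));
    }
}
```

Because the scanner tracks whether it is inside a quoted string, commas and parentheses inside values do not split fields, which is the part that is awkward to express in a single regex.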

Upvotes: 0

Bohemian

Reputation: 425073

You can match the characters inside non-escaped quotes with this regex:

(?<!\\)'(.*?)(?<!\\)'

This uses a negative look-behind to assert that the character before each quote is not a backslash. Note that a quote preceded by an escaped backslash (as in 'string\\') is also treated as escaped.

In Java, you have to double-escape (once for the String, once for the regex), so it looks like:

String regex = "(?<!\\\\)'(.*?)(?<!\\\\)'";

If you are working on Linux, I would use sed to do all the work.
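A sketch of the look-behind idea in Java, written here with a reluctant `.*?` between the quotes and a closing single quote (an assumption about the intended pattern; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindQuotes {
    // A quote not preceded by a backslash, the shortest possible content,
    // then another quote not preceded by a backslash.
    private static final Pattern QUOTED = Pattern.compile("(?<!\\\\)'(.*?)(?<!\\\\)'");

    /** Returns the content between each pair of non-escaped quotes. */
    static List<String> contents(String sql) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "INSERT INTO `table1` VALUES ('this is \\'cool\\'...' ,'some2');";
        for (String s : contents(line)) {
            System.out.println(s);
        }
    }
}
```

This handles the question's example, but a value ending in an escaped backslash, such as 'a\\', is misread because its closing quote looks escaped to the look-behind.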

Upvotes: 1

Joop Eggen

Reputation: 109567

Use four backslashes (which give the regex the two it needs to represent one literal backslash) plus a dot: "'(\\\\.|.)*'"

Upvotes: 0
