Reputation: 965
I need to write a simple program in Java that converts MySQL INSERT statements into CSV files (one CSV file per MySQL table).
Is regex the best way to do this in Java?
My main problem is how to correctly match a value like this: 'this is \'cool\'...' (that is, how to ignore an escaped ').
Example:
INSERT INTO `table1` VALUES ('this is \'cool\'...' ,'some2');
INSERT INTO `table1` (`field1`,`field2`) VALUES ('this is \'cool\'...' ,'some2');
Thanks
Upvotes: 0
Views: 384
Reputation: 336208
Assuming that your SQL statements are syntactically valid, you could use
Pattern regex = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");
to get a regex that matches all single-quoted strings, ignoring escaped characters inside them.
Explanation without all those extra backslashes:
' # Match '
(?: # Either match...
\\. # an escaped character
| # or
[^'\\] # any character except ' or \
)* # any number of times.
' # Match '
Given the string
'this', 'is a \' valid', 'string\\', 'even \\\' with', 'escaped quotes.\\\''
this matches
'this'
'is a \' valid'
'string\\'
'even \\\' with'
'escaped quotes.\\\''
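As a rough sketch of how that pattern could be applied to one of the INSERT lines from the question (the class name QuotedValueExtractor and the method extract are made up for illustration, and the statement is assumed to be syntactically valid):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValueExtractor {
    // Same pattern as above: a quote, then escaped pairs or ordinary
    // characters, then the closing quote.
    private static final Pattern QUOTED = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");

    /** Returns every single-quoted literal in the statement, quotes included. */
    static List<String> extract(String sql) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        // In Java source, \\' stands for the two characters \' of the SQL text.
        String sql = "INSERT INTO `table1` VALUES ('this is \\'cool\\'...' ,'some2');";
        for (String value : extract(sql)) {
            System.out.println(value);
        }
    }
}
```

The matches still carry their surrounding quotes and backslash escapes, so they would need to be unescaped before being written to a CSV file.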
Upvotes: 3
Reputation: 1487
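You have to use \\\\. In a Java string literal, \\ is one \, because the backslash introduces escape sequences for whitespace and control characters (\n, \t, ...). In a regex, a literal backslash must itself be escaped as \\, so the Java source needs four backslashes to match a single one.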
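A small sketch of the two escaping levels (the class BackslashLevels and its method are hypothetical names used only for this illustration):

```java
import java.util.regex.Pattern;

class BackslashLevels {
    /** True if the input consists of exactly one backslash character. */
    static boolean isSingleBackslash(String input) {
        // Java literal "\\\\" is the two-character string \\,
        // which the regex engine reads as one literal backslash.
        return Pattern.matches("\\\\", input);
    }

    public static void main(String[] args) {
        System.out.println("\\".length());            // 1: one backslash
        System.out.println("\\\\".length());          // 2: two backslashes
        System.out.println(isSingleBackslash("\\"));  // true
    }
}
```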
Upvotes: 0
Reputation: 32959
Although regexes give you a very powerful mechanism to parse text, I think you might be better off with a non-regex parser. I think your code will be easier to write, easier to understand, and have fewer bugs.
Something like:
Writing the regex to do all of the above, with optional column values and an optional number of value sets is non-trivial and error-prone.
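To illustrate the non-regex route, here is a minimal character-scanning sketch. It is not the approach outlined in this answer, just one possible shape. It assumes that every value is single-quoted, that backslash is the only escape mechanism, and that the keyword VALUES does not occur in a table or column name. The names InsertValueScanner, parseValues and toCsvLine are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class InsertValueScanner {
    /**
     * Reads the quoted values of the first VALUES (...) group.
     * Simplification: an escaped character is kept as-is, so \' becomes '
     * but \n is not translated into a newline.
     */
    static List<String> parseValues(String sql) {
        List<String> values = new ArrayList<>();
        int start = sql.indexOf("VALUES");
        if (start < 0) {
            return values;
        }
        StringBuilder current = null; // non-null while inside a quoted value
        for (int i = start + "VALUES".length(); i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (current == null) {
                if (c == '\'') {
                    current = new StringBuilder(); // opening quote
                } else if (c == ')') {
                    break;                         // end of the value group
                }
            } else if (c == '\\' && i + 1 < sql.length()) {
                current.append(sql.charAt(++i));   // keep the escaped character
            } else if (c == '\'') {
                values.add(current.toString());    // closing quote
                current = null;
            } else {
                current.append(c);
            }
        }
        return values;
    }

    /** Joins the values into one CSV line, doubling embedded double quotes. */
    static String toCsvLine(List<String> values) {
        StringBuilder line = new StringBuilder();
        for (String v : values) {
            if (line.length() > 0) {
                line.append(',');
            }
            line.append('"').append(v.replace("\"", "\"\"")).append('"');
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO `table1` (`field1`,`field2`) "
                + "VALUES ('this is \\'cool\\'...' ,'some2');";
        System.out.println(toCsvLine(parseValues(sql)));
    }
}
```

Unquoted values such as numbers or NULL, and multiple value groups in one statement, are deliberately left out and would need extra states in the loop.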
Upvotes: 0
Reputation: 425073
You can match the characters between non-escaped quotes by using this regex:
(?<!\\)'(.*?)(?<!\\)'
This uses a negative look-behind to assert that the character before each quote is not a backslash.
In Java, you have to double-escape (once for the String, once for the regex), so it looks like:
String regex = "(?<!\\\\)'(.*?)(?<!\\\\)'";
If you are working on Linux, I would use sed to do all the work.
Upvotes: 1
Reputation: 109567
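Use four backslashes in the Java string (each pair yields one backslash, and the regex needs two to match a literal backslash) plus a dot, with a reluctant quantifier so the match stops at the first unescaped quote: "'(\\\\.|.)*?'"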
Upvotes: 0