Reputation: 645
I have the following input string:
key1 = "test string1" ; key2 = "test string 2"
I need to convert it to the following, without tokenizing:
key1="test string1";key2="test string 2"
Upvotes: 4
Views: 601
Reputation: 3879
Using ERE (extended regular expressions, which are clearer than basic REs in cases like this), assuming quotes are never escaped, and with the global flag set so that all occurrences are replaced, you can do it this way:
s/ *([^ "]*) *("[^"]*")?/\1\2/g
sed:
$ echo 'key1 = "test string1" ; key2 = "test string 2"' | sed -r 's/ *([^ "]*) *("[^"]*")?/\1\2/g'
C# code:
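using System;
using System.Text.RegularExpressions;
// Group 1: an unquoted token (key, '=' or ';'); group 2: an optional quoted string.
// The spaces around both groups are matched but left out of the replacement.
Regex regex = new Regex(" *([^ \"]*) *(\"[^\"]*\")?");
String input = "key1 = \"test string1\" ; key2 = \"test string 2\"";
// Regex.Replace substitutes every match, so no explicit global flag is needed.
String output = regex.Replace(input, "$1$2");
Console.WriteLine(output);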
Output:
key1="test string1";key2="test string 2"
Escape-aware version
On second thought, leaving out an escape-aware version of the regexp could be misleading, so here it is:
s/ *([^ "]*) *("([^\\"]|\\.)*")?/\1\2/g
which in C# looks like:
Regex regex = new Regex(" *([^ \"]*) *(\"(?:[^\\\\\"]|\\\\.)*\")?");
String output = regex.Replace(input, "$1$2");
Please do not go blind from those backslashes!
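One way to cut down on the backslashes, sketched here rather than taken from the code above, is a C# verbatim string literal (`@"..."`), where backslashes are literal and `""` stands for a single double quote. The class name `VerbatimPatternSketch` and the method `Normalize` are made up for illustration; the pattern is meant to be the same escape-aware expression.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the names are illustrative only.
public static class VerbatimPatternSketch
{
    // Verbatim literal: "" is one double quote, backslashes are not doubled for C#.
    // The regex engine sees:  *([^ "]*) *("(?:[^\\"]|\\.)*")?
    private static readonly Regex Pattern =
        new Regex(@" *([^ ""]*) *(""(?:[^\\""]|\\.)*"")?");

    // Keeps the unquoted token ($1) and the optional quoted string ($2),
    // dropping the spaces matched around them.
    public static string Normalize(string input)
    {
        return Pattern.Replace(input, "$1$2");
    }
}
```

Calling `VerbatimPatternSketch.Normalize(input)` should behave like the `regex.Replace(input, "$1$2")` call above.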
Example
Input: key1 = "test \\ " " string1" ; key2 = "test \" string 2"
Output: key1="test \\ "" string1";key2="test \" string 2"
Upvotes: 2
Reputation: 14113
You'd be far better off NOT using a regular expression.
What you should be doing is parsing the string. The problem you've described is a mini-language, since each point in that string has a state (e.g. "in a quoted string", "in the key part", "assignment").
For example, what happens when you decide you want to escape characters?
key1="this is a \"quoted\" string"
Move along the string character by character, maintaining and changing state as you go. Depending on the state, you can either emit or omit the character you've just read.
As a bonus, you'll get the ability to detect syntax errors.
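A minimal sketch of that character-by-character approach in C# might look like the following. The names `KeyValueNormalizer`, `Normalize` and `State` are hypothetical, and the sketch makes two assumptions of its own: all whitespace outside quoted strings is dropped (not only spaces), and a backslash inside a quoted string escapes the next character. The only syntax error it detects is an unterminated quoted string.

```csharp
using System;
using System.Text;

// Hypothetical helper: a sketch under the assumptions stated above.
public static class KeyValueNormalizer
{
    private enum State { Outside, InString, InEscape }

    public static string Normalize(string input)
    {
        var output = new StringBuilder(input.Length);
        var state = State.Outside;

        foreach (char c in input)
        {
            switch (state)
            {
                case State.Outside:
                    // Omit whitespace between tokens; emit everything else.
                    if (!char.IsWhiteSpace(c))
                    {
                        if (c == '"') state = State.InString;
                        output.Append(c);
                    }
                    break;

                case State.InString:
                    // Emit everything; watch for an escape or the closing quote.
                    if (c == '\\') state = State.InEscape;
                    else if (c == '"') state = State.Outside;
                    output.Append(c);
                    break;

                case State.InEscape:
                    // The escaped character is emitted as-is, even if it is a quote.
                    state = State.InString;
                    output.Append(c);
                    break;
            }
        }

        if (state != State.Outside)
            throw new FormatException("Unterminated quoted string.");

        return output.ToString();
    }
}
```

For the input from the question, `KeyValueNormalizer.Normalize(input)` should return `key1="test string1";key2="test string 2"`, and an input that ends inside a quoted string throws a `FormatException` rather than being silently rewritten.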
Upvotes: 5