Reputation: 77
I have a string something like this:
"2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112"
I would like to split on the pipe character, except where the pipe is inside double quotes (similar to how CSV is handled), so that I end up with something like:
[0] => 2014-01-23 09:13:45
[1] => 10002112|TR0859657|25-DEC-2013>0000000000000001
[2] => 10002112
Is there a regular expression that can do this?
Upvotes: 0
Views: 1556
Reputation: 8782
I think you may need to write your own parser.
You will need:
a collection to hold the results
a boolean flag that tracks whether the current character is inside or outside quotation marks
a string (or StringBuilder) to hold the current word
The idea is to read the string character by character. A pipe outside quotation marks ends the current word, so you add the word to the result collection and start a new one. A quote is not appended; it flips the flag, so that any pipe that follows is appended to the word instead of being treated as a divider. The next quote flips the flag back, so the pipe after it adds the whole word (including the pipes that were inside the quotation marks) to the collection. Every other character is appended to the word. I tested the code below on your example and it worked.
private static List<string> ParseLine(string yourString)
{
    bool ignorePipe = false;
    string word = string.Empty;
    List<string> divided = new List<string>();
    foreach (char c in yourString)
    {
        if (c == '|' && !ignorePipe)
        {
            divided.Add(word);
            word = string.Empty;
        }
        else if (c == '"')
        {
            ignorePipe = !ignorePipe;
        }
        else
        {
            word += c;
        }
    }
    divided.Add(word);
    return divided;
}
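The list above mentions StringBuilder as an alternative to a plain string for the current word. Here is a minimal sketch of that variant, assuming the same rules as the method above (quotes toggle the flag and are dropped, escaped quotes are not supported). The class name QuoteAwareSplitter is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class QuoteAwareSplitter
{
    public static List<string> ParseLine(string input)
    {
        bool insideQuotes = false;          // true while between a pair of quotes
        var word = new StringBuilder();     // characters of the current field
        var parts = new List<string>();

        foreach (char c in input)
        {
            if (c == '|' && !insideQuotes)
            {
                // A pipe outside quotes ends the current field.
                parts.Add(word.ToString());
                word.Clear();
            }
            else if (c == '"')
            {
                // A quote only toggles the flag; it is not kept in the output.
                insideQuotes = !insideQuotes;
            }
            else
            {
                word.Append(c);
            }
        }

        // The last field has no trailing pipe, so add it explicitly.
        parts.Add(word.ToString());
        return parts;
    }
}
```

Calling QuoteAwareSplitter.ParseLine on the string from the question should give the three fields listed there. StringBuilder mainly helps on long lines, where repeated string concatenation copies the word on every character.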
Upvotes: 2
Reputation: 17510
I'm going to blatantly ignore the fact that you want a regex, because I think writing your own method that returns an IEnumerable<string> will be easier. Plus, you get instant access to LINQ.
var line = "2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112";
var data = GetPartsFromLine(line).ToList();
private static IEnumerable<string> GetPartsFromLine(string line)
{
    int position = 0;
    // <= so that a trailing pipe still produces a final empty part
    while (position <= line.Length)
    {
        if (position < line.Length && line[position] == '"')
        {
            //go find the next "
            int endQuote = line.IndexOf('"', position + 1);
            if (endQuote == -1)
            {
                //no closing quote: take the rest of the line
                endQuote = line.Length;
            }
            yield return line.Substring(position + 1, endQuote - position - 1);
            position = endQuote + 1;
            if (position >= line.Length)
            {
                //the quoted part was the last one
                yield break;
            }
            if (line[position] == '|')
            {
                //skip the pipe that follows the closing quote
                position++;
            }
        }
        else
        {
            //go find the next |, starting at the current character
            int pipe = line.IndexOf('|', position);
            if (pipe == -1)
            {
                //hit the end of the line
                yield return line.Substring(position);
                yield break;
            }
            yield return line.Substring(position, pipe - position);
            position = pipe + 1;
        }
    }
}
This hasn't been fully tested, but it works with your example.
Upvotes: 0
Reputation: 20014
How about this Regular Expression:
/((["|]).*\2)/g
It looks like it could be used as a valid split expression, but I have not tested it.
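For comparison, here is a sketch of a different, commonly used pattern (not the expression above), written in C# to match the other answers. It splits on a pipe only when an even number of double quotes follows it, which means the pipe is outside any quoted section. The class and method names are made up for illustration. It assumes quotes are balanced and never escaped inside a field.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are assumptions for this sketch.
public static class PipeSplitDemo
{
    // A pipe followed by an even number of double quotes up to the end of
    // the line. The groups are non-capturing, so Regex.Split does not add
    // captured text to the result.
    private static readonly Regex PipeOutsideQuotes =
        new Regex("\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] SplitOutsideQuotes(string line)
    {
        return PipeOutsideQuotes
            .Split(line)
            // Remove the quotes that wrap a quoted field.
            .Select(part => part.Trim('"'))
            .ToArray();
    }
}
```

On the string from the question, PipeSplitDemo.SplitOutsideQuotes should return the three expected fields. The lookahead rescans the rest of the line for every pipe, so on very long lines a character-by-character parser is likely to be faster.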
Upvotes: 0