Reputation: 31
I have a CSV file whose columns contain the values '\\\n' and '\\\t', which are an escaped newline and tab. However, I want to split each row into a string array.
How can I split specifically on '\n' but not on '\\\n'?
I am looking at Regex.Split; is that the right direction? I tried Regex.Split(input, @"[^\\]\n");
The result seems mostly correct, but the character in front of each split is always missing, presumably because of [^\\].
Upvotes: 2
Views: 305
Reputation: 612
Regex.Split(input, @"[^\\]\n");
The problem with the regex above is that a character class in square brackets matches exactly one character, and that character is consumed as part of the match. The character directly preceding \n therefore becomes part of the delimiter and is dropped from the split results.
I think what you are looking for is a negative look-behind, which is used as follows:
(?<!DO NOT MATCH THIS)match
Look-behinds and look-aheads ensure that a match exists without including the matched text as part of your match.
I assume what you are looking for is something like this:
Regex.Split(input, @"(?<!\\)\n");
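To make the difference concrete, here is a minimal sketch that runs both patterns on the same input. The class and method names are made up for illustration, and the sample input assumes the file contains real newline characters, some of them preceded by a backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Original attempt: [^\\] consumes the character before the newline,
    // so that character is lost from the preceding piece.
    public static string[] SplitConsumingPrevious(string input)
    {
        return Regex.Split(input, @"[^\\]\n");
    }

    // Look-behind version: only asserts that no backslash precedes the newline,
    // without consuming anything.
    public static string[] SplitOnUnescapedNewline(string input)
    {
        return Regex.Split(input, @"(?<!\\)\n");
    }
}

// Hypothetical usage; the input is: a,b <newline> c,d <backslash><newline> e,f
//   SplitDemo.SplitConsumingPrevious("a,b\nc,d\\\ne,f")
//     gives "a," and "c,d\<newline>e,f"   (the 'b' is gone)
//   SplitDemo.SplitOnUnescapedNewline("a,b\nc,d\\\ne,f")
//     gives "a,b" and "c,d\<newline>e,f"
```

In both cases the backslash-escaped newline is left alone; only the look-behind version keeps the character in front of the split.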
Hope that helps!
Upvotes: 1
Reputation: 101758
How about this:
(?<=^|^[^\\]|[^\\]{2})\\(n|t)
This will account for \n and \t sequences that are at the beginning or in the second position of the input string.
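A quick way to check what this pattern accepts is to test it against a few inputs. The sketch below assumes the data contains the literal two-character sequences backslash-n and backslash-t rather than real control characters; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EscapeSequenceDemo
{
    // Pattern from the answer above: a literal backslash followed by 'n' or 't',
    // accepted only at the start of the input, after a single leading
    // non-backslash character, or after two non-backslash characters.
    private static readonly Regex Pattern =
        new Regex(@"(?<=^|^[^\\]|[^\\]{2})\\(n|t)");

    public static bool ContainsUnescapedSequence(string input)
    {
        return Pattern.IsMatch(input);
    }
}

// Hypothetical usage (verbatim strings, so each backslash is literal):
//   EscapeSequenceDemo.ContainsUnescapedSequence(@"\nabc")   // true
//   EscapeSequenceDemo.ContainsUnescapedSequence(@"a\tb")    // true
//   EscapeSequenceDemo.ContainsUnescapedSequence(@"ab\nc")   // true
//   EscapeSequenceDemo.ContainsUnescapedSequence(@"ab\\nc")  // false
```

One thing to keep in mind if this pattern is passed to Regex.Split: the capturing group (n|t) causes the captured letter to be returned as an extra element in the result array, so a non-capturing group (?:n|t) is probably what you want there.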
Upvotes: 0
Reputation: 5766
If you're parsing a CSV file, please try to use the TextFieldParser that's already in the framework. It will save you the headache of dealing with all the specific problems that come up when parsing a delimited file.
As mentioned below, it's part of Microsoft.VisualBasic.dll, but this comes with the framework by default; you just need to add a reference. And even though it's called VisualBasic, it's in no way VB-specific.
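For reference, a minimal sketch of reading comma-delimited rows with TextFieldParser might look like the following. The class and method names are hypothetical, and the comma delimiter and quoted fields are assumptions about the file. Note that this sketch does not do anything special for backslash-escaped newlines, so whether it fits depends on how the file was written.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

public static class CsvDemo
{
    // Reads every row from the reader and returns each row as a string array.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");               // assumed delimiter
            parser.HasFieldsEnclosedInQuotes = true; // so "d,e" stays one field

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}

// Hypothetical usage:
//   var rows = CsvDemo.ReadRows(new StringReader("a,b\nc,\"d,e\"\n"));
//   rows[1][1] is "d,e"
```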
Upvotes: 2
Reputation: 515
If you want to use Regex.Split, then @"(?<!\\)\\n" matches \n but not \\n (nor \\\n, for that matter) and would not cut anything off. The negative look-behind (?<!\\) does not form part of the match, so it will not remove the extra character.
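As a small sketch of this pattern in use: it assumes the input contains the literal two characters backslash and n as the row separator, not a real newline character. The class and method names are made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class LiteralSplitDemo
{
    // Splits on a literal backslash-n unless another backslash comes right before it.
    public static string[] SplitOnLiteralBackslashN(string input)
    {
        return Regex.Split(input, @"(?<!\\)\\n");
    }
}

// Hypothetical usage (verbatim string, so each backslash is literal):
//   LiteralSplitDemo.SplitOnLiteralBackslashN(@"a,b\nc,d\\ne,f")
//     gives @"a,b" and @"c,d\\ne,f"
```

The single \n is treated as a separator, while the \\n inside the second piece is left untouched and no neighbouring character is lost.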
Upvotes: 5