keyeng

Reputation: 31

Split on '\n' instead of '\\n' into a string array

I have a CSV file whose columns contain the values '\\\n' and '\\\t', which are an escaped newline and an escaped tab. I want to split each row into a string array.

How do I split specifically on '\n' but not on '\\\n'?

I am looking at Regex.Split; is that the right direction? I tried Regex.Split(input, @"[^\\]\n"); and the result looks correct, except that the character in front of each split point is always missing, presumably because of the [^\\].

Upvotes: 2

Views: 305

Answers (4)

Daniel

Reputation: 612

Regex.Split(input, @"[^\\]\n");

The problem with the regex above is that a character class in square brackets matches exactly one character, and that character is part of the match itself. The character directly preceding \n is therefore treated as part of the delimiter, and Regex.Split removes it along with the newline.

I think what you are looking for is a negative look-behind, which is used as follows:

(?<!DO NOT MATCH THIS)match

Look-behinds and look-aheads assert that the surrounding text is present (or, in their negative forms, absent) without including that text in your match.

I assume what you are looking for is something like this:

Regex.Split(input, @"(?<!\\)\n");
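To make the difference concrete, here is a minimal sketch that runs both patterns on the same input. The class name, method names, and sample string are made up for illustration; only Regex.Split and the two patterns come from the discussion above.

```csharp
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // The character class consumes the character before the newline,
    // so that character disappears from the result.
    public static string[] SplitWithCharClass(string input) =>
        Regex.Split(input, @"[^\\]\n");

    // The negative look-behind only asserts that no backslash precedes
    // the newline; nothing besides the newline itself is consumed.
    public static string[] SplitWithLookBehind(string input) =>
        Regex.Split(input, @"(?<!\\)\n");
}
```

With the hypothetical input "a,b\\\nc\nd,e" (a backslash followed by a real newline inside the first row, then a plain newline between rows), SplitWithCharClass returns "a,b\\\n" and "d,e", losing the c, while SplitWithLookBehind returns "a,b\\\nc" and "d,e".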

Hope that helps!

Upvotes: 1

JLRishe

Reputation: 101758

How about this:

(?<=^|^[^\\]|[^\\]{2})\\(n|t)

This also accounts for a \n or \t at the first or second position of the input string.

Upvotes: 0

w5l

Reputation: 5766

If you're parsing a CSV file, try the TextFieldParser that's already in the framework. It will save you the headache of dealing with all the specific problems that come up when parsing a delimited file.

As mentioned below, it's part of Microsoft.VisualBasic.dll, which ships with the framework by default; you just need to add a reference. Even though the assembly is called VisualBasic, the class is in no way VB-specific.
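A rough sketch of how that might look from C#, assuming a comma-delimited file with optionally quoted fields. The CsvDemo class and ParseCsv method are hypothetical names; TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace. As far as I know it does not treat a backslash as an escape character, so it helps with embedded newlines only when they sit inside quoted fields.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvDemo
{
    // Parses comma-delimited text into one string[] per row.
    public static List<string[]> ParseCsv(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Assumption: fields may be wrapped in double quotes.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

To read from disk instead, the constructor also accepts a file path in place of the StringReader.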

Upvotes: 2

CodeBeard

Reputation: 515

If you want to use Regex.Split, then @"(?<!\\)\\n" matches \n but not \\n (nor \\\n, for that matter) and does not cut anything off. The negative look-behind (?<!\\) does not form part of the match, so it will not remove the extra character.
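Note that this pattern targets the literal two-character text backslash + n rather than an actual newline character. A small sketch under that assumption, with a made-up class name and sample string:

```csharp
using System.Text.RegularExpressions;

public static class LiteralSplitDemo
{
    // Splits on the literal text \n (backslash followed by 'n'),
    // skipping any occurrence that is itself preceded by a backslash.
    public static string[] SplitOnLiteralBackslashN(string input) =>
        Regex.Split(input, @"(?<!\\)\\n");
}
```

For the verbatim input @"a,b\\nc\nd,e" this yields two elements, a,b\\nc and d,e: the \\n inside the first row is left alone and only the lone \n acts as a delimiter.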

Upvotes: 5
