Reputation: 107
I am trying to read a text file whose fields are separated by spaces, where some fields are also wrapped in double quotes, and I cannot find an easy way to handle this. I wanted to check whether it can be done with a regular expression; otherwise I will need to write a custom split.
Here is the string
"myfile-one two" "1" 3 1453454.00 -134557.63 585.0 24444.8 -999 "NULL" "" 45.60 "" 67°32'5.23455"N 54°56'65.3454"W "NULL" 6.00
The output should be
myfile-one two
1
3
1453454.00
-134557.63
585.0
24444.8
-999
NULL
45.60
67°32'5.23455"N
54°56'65.3454"W
NULL
6.00
The code below splits on the space delimiter first, but this also splits inside the double quotes, so a quoted value ends up as separate entries:
char[] space = new Char[] { ' ' };
string[] data = comp.Split(space, StringSplitOptions.RemoveEmptyEntries);
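To make the problem concrete, here is a minimal sketch that runs the same split on a shortened version of the input. The class name `NaiveSplitDemo` and the shortened string are only for illustration.

```csharp
using System;

static class NaiveSplitDemo
{
    // Splits on every space, exactly like the snippet above.
    public static string[] Split(string comp) =>
        comp.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    static void Main()
    {
        // Shortened sample (illustrative): one quoted value with an inner space.
        string comp = "\"myfile-one two\" \"1\" 3";

        foreach (string token in Split(comp))
            Console.WriteLine(token);

        // The quoted value is broken at its inner space and the quotes remain:
        //   "myfile-one
        //   two"
        //   "1"
        //   3
    }
}
```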
Upvotes: 3
Views: 513
Reputation: 28968
Since regex can hurt performance and the described scenario is quite simple, I would like to offer a short, fast, regex-free solution that uses string members only. The regex-free approach can also be more readable.
// The escaped input string
var input = @"""myfile-one two"" ""1"" 3 1453454.00 -134557.63 585.0 24444.8 -999 ""NULL"" """" 45.60 """" 67°32'5.23455""N 54°56'65.3454""W ""NULL"" 6.00 ";
List<string> cleanedInputTokens = input
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(token => token.Trim('"'))
    .ToList();
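The algorithm first splits the input into tokens on spaces and then trims leading and trailing double quotes from each token. Because Split(Char[], StringSplitOptions) and Trim(Char[]) both accept an array of characters, the pattern is easy to extend to other delimiters and quote characters. Note that it splits on every space, so a quoted value that itself contains a space, such as "myfile-one two", ends up as two tokens.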
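One way to keep a quoted value such as "myfile-one two" in one piece without a regex is to scan the string by hand. The following is a rough sketch under assumptions, not part of the code above: the class `QuoteAwareSplitter` and its `Split` method are hypothetical names, a quote is treated as an opening quote only at the start of a token, and as a closing quote only when it is followed by whitespace or the end of the string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a hand-written, quote-aware tokenizer.
static class QuoteAwareSplitter
{
    public static List<string> Split(string input)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            // Skip the whitespace between tokens.
            if (char.IsWhiteSpace(input[i])) { i++; continue; }

            // If the token starts with a quote, look for a closing quote
            // that is followed by whitespace or the end of the string.
            int close = -1;
            if (input[i] == '"')
            {
                for (int j = i + 1; j < input.Length; j++)
                {
                    bool atBoundary = j + 1 == input.Length || char.IsWhiteSpace(input[j + 1]);
                    if (input[j] == '"' && atBoundary) { close = j; break; }
                }
            }

            if (close >= 0)
            {
                // Quoted token: keep the content between the quotes (may be empty).
                tokens.Add(input.Substring(i + 1, close - i - 1));
                i = close + 1;
            }
            else
            {
                // Bare token: read up to the next whitespace, keeping inner quotes.
                int start = i;
                while (i < input.Length && !char.IsWhiteSpace(input[i])) i++;
                tokens.Add(input.Substring(start, i - start));
            }
        }
        return tokens;
    }

    static void Main()
    {
        string input = @"""myfile-one two"" ""1"" 3 """" 67°32'5.23455""N 6.00";
        foreach (string token in Split(input))
            Console.WriteLine("[" + token + "]");
    }
}
```

Empty quoted fields come back as empty strings. Since the expected output in the question lists no blank entries, the caller may want to filter them out, for example with `tokens.RemoveAll(t => t.Length == 0)`.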
Upvotes: 0
Reputation: 626804
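You may match any substring between double quotes, where the opening quote is preceded by whitespace or the start of the string and the closing quote is followed by whitespace or the end of the string, and capture what is inside the quotes into a named group. Otherwise, match any 1+ non-whitespace chars and capture them into the identically named group: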
var results = Regex.Matches(str, @"(?<!\S)""(?<o>.*?)""(?!\S)|(?<o>\S+)")
    .Cast<Match>()
    .Select(m => m.Groups["o"].Value)
    .ToList();
See the regex demo.
Pattern details
(?<!\S) - a whitespace or start of string is required immediately to the left of the current location
" - a double quotation mark
(?<o>.*?) - Group "o": any 0+ chars other than newline, as few as possible
" - a double quotation mark
(?!\S) - a whitespace or end of string is required immediately to the right of the current location
| - or
(?<o>\S+) - Group "o": any 1+ non-whitespace chars.
.NET allows the use of identically named groups inside one regex pattern, accumulating the values found into the corresponding memory buffer that you may "collect" via .Select(m => m.Groups["o"].Value).
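For completeness, here is a self-contained sketch that wraps the pattern in a small program. The class name `RegexSplitter` and the shortened sample string are illustrative, and the extra `Where` filter is an assumption based on the expected output in the question, which lists no blank entries for the empty `""` fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern explained above.
static class RegexSplitter
{
    // Same pattern as above; both alternatives capture into group "o".
    static readonly Regex TokenPattern =
        new Regex(@"(?<!\S)""(?<o>.*?)""(?!\S)|(?<o>\S+)");

    public static List<string> Split(string str) =>
        TokenPattern.Matches(str)
            .Cast<Match>()
            .Select(m => m.Groups["o"].Value)
            // Assumption: drop the empty "" fields, as in the expected output.
            .Where(value => value.Length > 0)
            .ToList();

    static void Main()
    {
        string str = @"""myfile-one two"" ""1"" 3 -999 ""NULL"" """" 45.60 67°32'5.23455""N";
        foreach (string token in Split(str))
            Console.WriteLine(token);
    }
}
```

Remove the `Where` call if empty fields should be kept as empty strings.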
Upvotes: 4