Reputation: 14568
I have output like this:
Col.A Col.B Col.C Col.D
--------------------------------------------------------------
* 1 S60-01-GE-44T-AC SGFM115001195 7520051202 A
1 S60-PWR-AC APFM115101302 7520047802 A
1 S60-PWR-AC APFM115101245 7520047802 A
or
Col.A Col.B Col.C Col.D
--------------------------------------------------------------
* 0 S50-01-GE-48T-AC DL252040175 7590005605 B
0 S50-PWR-AC N/A N/A N/A
0 S50-FAN N/A N/A N/A
For these outputs, the regular expression
(?:\*)?\s+(?<unitno>\d+)\s+\S+-\d+-(?:GE|TE)?-?(?:\d+(?:F|T))-?(?:(?:AC)|V)?\s+(?<serial>\S+)\s+\S+\s+\S+\s+\n
works fine for capturing Column A and Column B. Recently, however, I got a new kind of output:
Col.A Col.B Col.C Col.D
---------------------------------------------------------
* 0 S4810-01-64F HADL120620060 7590009602 A
0 S4810-PWR-AC H6DL120620060 7590008502 A
0 S4810-FAN N/A N/A N/A
0 S4810-FAN N/A N/A N/A
As you can see, the "GE|TE" and "AC|V" parts are missing from these outputs. How do I change my regular expression accordingly while maintaining backward compatibility?
EDIT:
The output you see arrives as one complete string, and due to some operational limits I cannot use anything other than a regex here to get the values I want. I know using Split would be ideal, but I cannot.
Upvotes: 0
Views: 497
Reputation: 112324
A regular expression does not seem to be the right approach here. Use a positional approach instead:
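// The offsets below assume the report's original fixed-width column layout
string s = "* 0 S4810-01-64F HADL120620060 7590009602 A";
bool withStar = s[0] == '*';                 // marker in the first column
string nr = s.Substring(2, 2).Trim();        // unit number
string colA = s.Substring(5, 18).TrimEnd();  // Substring(start, length)
string colB = s.Substring(24, 14).TrimEnd();
...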
UPDATE
If you want (or have) to stick with Regex, match the spaces instead of the values. Of course, this only works if the values never contain spaces.
string[] result = Regex.Split(s, @"\s+");  // verbatim string so "\s" reaches the regex engine
Of course, you can also search for non-spaces (\S) instead of spaces (\s):
MatchCollection matches = Regex.Matches(s, @"\S+");
Or, to exclude the star:
(?:\*)?[^*\s]+
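A minimal sketch of the token approach, assuming (as the sample output suggests) that no column value contains a space and that, once the star is skipped, the unit number and the serial are the first and third tokens. The class and method names are hypothetical:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits one report line into its non-space tokens.
static class TokenSketch
{
    // Uses the pattern above: runs of characters that are neither whitespace
    // nor "*", so a "*" standing on its own is never returned.
    // Assumes no column value contains a space.
    public static string[] Tokens(string line) =>
        Regex.Matches(line, @"(?:\*)?[^*\s]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "* 0 S4810-01-64F HADL120620060 7590009602 A" this yields 0, S4810-01-64F, HADL120620060, 7590009602 and A, so the unit number is tokens[0] and the serial is tokens[2].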
Upvotes: 2
Reputation: 31394
You are probably better off using String.Split() to break the column values out into separate strings and then processing them, rather than using a huge, unreadable regular expression.
foreach (string line in lines) {
    // A null separator array splits on any whitespace; empty entries are dropped.
    string[] columnValues = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    ...
}
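Filling in the loop above, here is a sketch that collects the unit number and serial from a whole report string. The field positions and the rule for skipping the header and dash lines (a data row has at least five fields and starts with a number) are assumptions drawn from the sample output; SplitSketch and Parse are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class SplitSketch
{
    // Returns (unit number, serial) for each data row of the report.
    public static List<(string Unit, string Serial)> Parse(string report)
    {
        var rows = new List<(string Unit, string Serial)>();
        foreach (string line in report.Split('\n'))
        {
            // A null separator array splits on any whitespace (including a stray '\r').
            string[] columnValues = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Skip the optional "*" marker in front of the unit number.
            int start = columnValues.Length > 0 && columnValues[0] == "*" ? 1 : 0;

            // Assumed rule: data rows have five fields and a numeric unit number.
            if (columnValues.Length - start < 5 || !int.TryParse(columnValues[start], out _))
                continue;

            rows.Add((columnValues[start], columnValues[start + 2]));
        }
        return rows;
    }
}
```

Because it relies only on whitespace between fields, it does not care whether the model column contains GE, TE, AC or V.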
Upvotes: 2
Reputation: 2215
Why not try something like this?
(?:\*)?\s+(?<unitno>\d+)\s+\S+\s+(?<serial>\S+)\s+\S+\s+\S+(?:\s+)?\n
This is built off the regular expression you provided. Because of the trailing \n, every row, including the last one, must end with a newline (line feed) to be matched.
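A sketch of applying this pattern to the whole report with Regex.Matches and reading the named groups. It assumes, as in fixed-width output, that rows without a * are indented, since the \s+ in front of unitno needs whitespace before the unit number; RegexSketch and Parse are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexSketch
{
    // The pattern from this answer, as a verbatim string so the backslashes
    // reach the regex engine unchanged.
    const string Pattern =
        @"(?:\*)?\s+(?<unitno>\d+)\s+\S+\s+(?<serial>\S+)\s+\S+\s+\S+(?:\s+)?\n";

    // Returns (unit number, serial) for every row the pattern matches.
    public static List<(string Unit, string Serial)> Parse(string report)
    {
        var rows = new List<(string Unit, string Serial)>();
        foreach (Match m in Regex.Matches(report, Pattern))
            rows.Add((m.Groups["unitno"].Value, m.Groups["serial"].Value));
        return rows;
    }
}
```

With the new S4810 output (header included, non-star rows indented by two spaces, every row ending in \n) this returns three rows whose serials are HADL120620060, H6DL120620060 and N/A.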
Upvotes: 1
Reputation: 1708
I would not use regular expressions to parse these reports.
Instead, treat them as fixed column width reports after the headers are stripped off.
I would do something like this (typed cold as an example, not even tested for syntax):
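using System.Collections.Generic;

// Leaving off all error detection stuff
class ColumnDef
{
    public string Name { get; set; }
    public int FirstCol { get; set; }  // zero-based index of the column's first character
    public int LastCol { get; set; }   // zero-based index of its last character (inclusive)
}

class ReportParser  // hypothetical enclosing class
{
    ColumnDef[] report = new ColumnDef[]
    {
        new ColumnDef { Name = "ColA", FirstCol = 0, LastCol = 2 },
        // ... and so on for each column
    };

    public IDictionary<string, string> ParseDataLine(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (var c in report)
        {
            // Substring takes a start index and a length, not an end index
            fields[c.Name] = line.Substring(c.FirstCol, c.LastCol - c.FirstCol + 1).Trim();
        }
        return fields;
    }
}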
This is an example of a generic ETL (Extract, Transform, and Load) problem, specifically the Extract stage.
You will have to strip out header and footer lines before using ParseDataLine, and I am not sure there is enough information shown to do that. Based on what your post says, any line that is blank or doesn't start with a space or a * is a header/footer line to be ignored.
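The header/footer rule just described can be sketched as a simple filter. This assumes the data rows without a * really are indented in the raw output (the question's pasted sample lost its leading spaces); ReportFilter and DataLines are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

static class ReportFilter
{
    // Keeps only data lines: a line that is blank, or that does not start
    // with a space or '*', is treated as a header/footer line and dropped.
    public static IEnumerable<string> DataLines(IEnumerable<string> lines) =>
        lines.Where(line => !string.IsNullOrWhiteSpace(line)
                            && (line[0] == ' ' || line[0] == '*'));
}
```

Each line that survives the filter can then be handed to ParseDataLine.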
Upvotes: 1
Reputation: 31184
Your regular expression doesn't even need GE or TE. See the ? after (?:GE|TE)? It means that the preceding group or symbol is optional. The same is true of the AC and V section.
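To check this, here is a sketch that tests only the model-number part of the question's pattern. The ^ and $ anchors and the OptionalGroupDemo name are additions for the demo, not part of the original pattern:

```csharp
using System.Text.RegularExpressions;

static class OptionalGroupDemo
{
    // Model-number fragment of the question's pattern. (?:GE|TE)? and
    // (?:(?:AC)|V)? are optional, so they may match the empty string.
    const string ModelPart = @"^\S+-\d+-(?:GE|TE)?-?(?:\d+(?:F|T))-?(?:(?:AC)|V)?$";

    public static bool IsModel(string s) => Regex.IsMatch(s, ModelPart);
}
```

S4810-01-64F and S60-01-GE-44T-AC both match, because the optional groups can match nothing. S4810-PWR-AC does not, since \S+-\d+- requires digits between two hyphens.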
Upvotes: 1