Miamian

Reputation: 217

Separating text from numbers using regular expressions in VB.NET... I'm stumped

I'm having trouble parsing some text. Here's an example of the text:

   201 BBQ                 0.000    9.000    0.099    0.891    9.000    0.000    0.000    0.000

   705 W 1 PC              0.000  135.000    0.295   39.825    0.000    0.000  135.000    0.000

  2106 ONL 9.99           41.141    3.000    4.110   12.330    3.000    0.000    0.000   29.970

Here's the latest incarnation of the code I've been trying:

  objInfo = System.Text.RegularExpressions.Regex.Split(
    newLine,"(\d{3,5})|([0-9]+[.]+[0-9]+)|(\w*)")

I'm having trouble because I keep getting many blank entries in the array after splitting. I'm trying to avoid using the | (alternation) character, but I get no results when I set the pattern up without it!
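Some background on where the blanks likely come from: Regex.Split cuts the input at every match, and when the pattern contains capturing groups, the captured text is also inserted into the result array. The (\w*) alternative can match an empty string (for example at each space), so the pattern in the question produces many zero-length matches and therefore many empty pieces. The sketch below (SplitDemo, Tokens and SplitKeepingCaptures are hypothetical names for illustration) shows both behaviours and the usual workaround of splitting on runs of whitespace and discarding empty strings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits a line on runs of whitespace and drops the empty string that
    // Regex.Split returns when the line starts (or ends) with whitespace.
    public static string[] Tokens(string line)
    {
        return Regex.Split(line, @"\s+")
                    .Where(token => token.Length > 0)
                    .ToArray();
    }

    // Shows that text captured by a group in the split pattern is kept
    // in the result, between the pieces found on either side of each match.
    public static string[] SplitKeepingCaptures(string input)
    {
        return Regex.Split(input, @"(\d)");
    }
}
```

Note that splitting alone still cannot tell which tokens belong to the description, since "9.99" in the third line looks exactly like a data column.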

I've spent much of the evening reviewing regular expressions and I've downloaded the following programs:

RegEx Designer.NET, Antix RegEx Tester, and Expresso.

The trouble is that the description SOMETIMES contains a decimal point and sometimes doesn't. Likewise, it sometimes contains a whole number and sometimes doesn't.

My friend recommended I use awk to divide it into columns. The thing is, I teach a Community Education class in Visual Basic .NET, and I need to improve my regex skills. Perhaps someone can give me some guidance so I can better help my students.

Upvotes: 1

Views: 1139

Answers (1)

Kobi

Reputation: 138007

Since you know the number of columns, you can start by reading 8 decimals from the end of the line, and take the rest as the title. You can avoid a regex, but here's a simple solution with one:

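// Needs: System, System.Collections.Generic, System.Globalization, System.Linq,
// System.Text.RegularExpressions. The \s before group 2 stops the greedy (.*)
// from splitting a number (e.g. moving the "4" of "41.141" into the title).
Match match = Regex.Match(line, @"^(.*)\s((?:\d+\.\d+\s*){8})$");
string title = match.Groups[1].Value.Trim();
IEnumerable<decimal> numbers = match.Groups[2].Value
    .Split(" \t".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    // InvariantCulture so "0.295" parses the same regardless of the machine's locale
    .Select(s => decimal.Parse(s, CultureInfo.InvariantCulture));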

The regex captures two groups: ((?:\d+\.\d+\s*){8})$ is the eight decimals at the end, and (.*) is the start of the string up to them. If you have extra decimals, as in your third example, they will be added to the title.
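To check the regex approach against the sample lines, it can be wrapped in a self-contained method. This is a sketch: RegexLineParser and TryParse are hypothetical names, the numbers are assumed to use "." as the decimal separator (hence CultureInfo.InvariantCulture), and the pattern requires whitespace before the first of the eight numbers so that the greedy (.*) cannot take a leading digit of a number such as 41.141 into the title.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexLineParser
{
    // Group 1: everything before the last eight decimals (the title).
    // Group 2: the eight decimals, which must start after whitespace.
    private static readonly Regex LinePattern =
        new Regex(@"^(.*)\s((?:\d+\.\d+\s*){8})$");

    // Returns false when the line does not end in eight decimal numbers.
    public static bool TryParse(string line, out string title, out decimal[] numbers)
    {
        Match match = LinePattern.Match(line);
        if (!match.Success)
        {
            title = null;
            numbers = null;
            return false;
        }

        title = match.Groups[1].Value.Trim();
        numbers = match.Groups[2].Value
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => decimal.Parse(s, CultureInfo.InvariantCulture))
            .ToArray();
        return true;
    }
}
```

With the third sample line, TryParse should yield the title "2106 ONL 9.99" and eight numbers starting with 41.141, so the 9.99 from the description stays in the title.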

Similarly, you may choose a non-regex solution (actually, this one is better if you don't mind losing a few spaces in the title):

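// Split on spaces and tabs, dropping the empty entries that runs of spaces produce
string[] words = line.Split(new[] { ' ', '\t' },
                            StringSplitOptions.RemoveEmptyEntries);
int position = words.Length - 8;   // index of the first of the eight numbers
IEnumerable<decimal> numbers = words.Skip(position)
    .Select(s => decimal.Parse(s, CultureInfo.InvariantCulture));
string title = String.Join(" ", words.Take(position));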
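The split-based snippet above can also be wrapped in a self-contained method. This sketch (SplitLineParser and TryParse are hypothetical names) is slightly more defensive than the snippet: it uses decimal.TryParse with the invariant culture and returns false, instead of throwing, when the line has fewer than eight words or the last eight are not all numbers.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SplitLineParser
{
    private const int NumberCount = 8; // eight numeric columns per line, as in the question

    // Returns false when the line is too short or its last eight words are not all numbers.
    public static bool TryParse(string line, out string title, out decimal[] numbers)
    {
        string[] words = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        int position = words.Length - NumberCount;
        title = null;
        numbers = null;
        if (position < 0)
            return false;

        var parsed = new decimal[NumberCount];
        for (int i = 0; i < NumberCount; i++)
        {
            if (!decimal.TryParse(words[position + i], NumberStyles.Number,
                                  CultureInfo.InvariantCulture, out parsed[i]))
                return false;
        }

        // Rejoining with single spaces is why runs of spaces in the title are lost.
        title = string.Join(" ", words.Take(position));
        numbers = parsed;
        return true;
    }
}
```

Because the words are rejoined with single spaces, a title such as "A  B" comes back as "A B"; that is the loss of spacing mentioned above.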

Upvotes: 2
