JimmyPop13

Reputation: 317

Find numbers following a certain string that appears many times

I've been looking into this for a while now and seem only to have succeeded in confusing myself, so any help anyone can give would be amazing.

I have a text file, and it's fairly big: 100k+ lines.

The text file looks something like this:

The apple is set at Price: £1.00
Sale:  £3.50
Price: £2.00
Plum reduced to Sale:  £2.00
Bananas are usually Price: £4.00
Price: £3.00
Price: £2.00

And so on.

I want to extract all the numbers, just the numbers (no £), that follow the string "Price: £", and for the moment just print them out in the console.

Expected output should be:

1.00
2.00
4.00
3.00
2.00
There were 100,000 lines.

I have the following, although I'm sure it's a million miles off.

int counter = 0;
string line;
string input1 = " Price: £";
string price;

// Read the file and display it line by line.  
System.IO.StreamReader file =
    new System.IO.StreamReader(@"C:Pricelist.txt"); 
while ((line = file.ReadLine()) != null)
{
    price = Regex.Match(input1, @"\d+").Value;
    System.Console.WriteLine(price);
    //System.Console.WriteLine(line);
    counter++;
}

file.Close();
System.Console.WriteLine("There were {0} lines.", counter);
// Suspend the screen.  
System.Console.ReadLine();

My thinking is that the regex looks for the input1 string and then finds the number after it, but it doesn't seem to be working. Do I need it to read the string stored in the line variable, or is that a bad idea?

Again, I'm a little lost, so any pointers would be great. If any further information is required, please ask :)

Upvotes: 0

Views: 53

Answers (4)

Chris R. Timmons

Reputation: 2197

Your original code never uses the line variable. That's what the regex has to be matched against, not input1.

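Additionally, the regular expression can be defined once outside the loop and reused inside it. The static Regex methods take the pattern as a string, so every call has to resolve it again: .NET keeps a small cache of recently used patterns for the static methods (Regex.CacheSize, 15 by default), but each call still pays for the lookup and, in .NET Framework, for a new Regex object. Calling the static Regex.Match() method 100,000 times inside the loop repeats that work 100,000 times, whereas a single instance created before the loop is simply reused.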

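using System.Text.RegularExpressions;

int counter = 0;
string line;
string price;
// Built once, outside the loop; the "amount" group captures everything after "Price: £".
var regex = new Regex("Price: £(?<amount>.*)");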

// Read the file and display it line by line.  
using (System.IO.StreamReader file = new System.IO.StreamReader(@"c:Pricelist.txt"))
{
  while ((line = file.ReadLine()) != null)
  {
    var match = regex.Match(line);
    if (match.Success)
    {
      price = match.Groups["amount"].Value;
      System.Console.WriteLine(price);
    }
    //System.Console.WriteLine(line);
    counter++;
  }
}

System.Console.WriteLine("There were {0} lines.", counter);
// Suspend the screen.  
System.Console.ReadLine();

Upvotes: 0

Dennis Vanhout

Reputation: 103

Considering you're saying

for the moment just print them out in the console.

I'd store each price in a list (var valueList = new List<string>();). That way you can print them with valueList.ForEach(value => Console.WriteLine(value)); and still use the values at a later stage if you want to.

As for extracting the prices themselves:

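// Declare the list once, before the read loop.
var valueList = new List<string>();

// Then, inside the loop, for each line:
var words = line.Split(' ');
for (int i = 1; i < words.Length; i++)
{
    // Keep only amounts directly after "Price:", so "Sale:" amounts are skipped.
    if (words[i - 1] == "Price:" && words[i].StartsWith("£"))
        valueList.Add(words[i].Substring(1)); // drop the leading £
}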

The regex options suggested in the other answers are shorter, but some people prefer not to use regular expressions, so here's a regex-free solution.
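A runnable sketch of the split-based approach, for checking it against the question's sample lines. ExtractPrices and the in-memory sample are hypothetical names used for illustration; the sketch assumes "Price:" and the amount are separated by a single space, as in the sample, and keeps only amounts that directly follow "Price:" so the "Sale:" amounts are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical in-memory stand-in for Pricelist.txt, using the question's sample lines.
const string sample =
    "The apple is set at Price: £1.00\n" +
    "Sale:  £3.50\n" +
    "Price: £2.00\n" +
    "Plum reduced to Sale:  £2.00\n" +
    "Bananas are usually Price: £4.00\n" +
    "Price: £3.00\n" +
    "Price: £2.00\n";

// Regex-free extraction: split each line on spaces and keep the token that
// directly follows "Price:", minus its leading '£'.
List<string> ExtractPrices(TextReader reader)
{
    var valueList = new List<string>();
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        var words = line.Split(' ');
        for (int i = 1; i < words.Length; i++)
        {
            if (words[i - 1] == "Price:" && words[i].StartsWith('£'))
                valueList.Add(words[i].Substring(1));
        }
    }
    return valueList;
}

var prices = ExtractPrices(new StringReader(sample));
prices.ForEach(value => Console.WriteLine(value));
```

For the real file, any other TextReader can be passed in place of the StringReader, for example a StreamReader opened on the price list.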

Upvotes: 0

Poul Bak

Reputation: 10929

The following regex should do what you want:

@"(?<=Price: £).*"

It uses a positive lookbehind for 'Price: £', then matches any character (except a newline) any number of times.

That produces the desired output.

How to use:

price = Regex.Match(line, @"(?<=Price: £).*").Value; // match the current line; empty string if it has no "Price: £"
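A self-contained sketch of the lookbehind pattern applied line by line. The in-memory sample is a hypothetical stand-in for the file; the check on match.Success is an addition here so that lines without "Price: £" (the "Sale:" lines) print nothing instead of an empty line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical in-memory stand-in for Pricelist.txt, using the question's sample lines.
const string sample =
    "The apple is set at Price: £1.00\n" +
    "Sale:  £3.50\n" +
    "Price: £2.00\n" +
    "Plum reduced to Sale:  £2.00\n" +
    "Bananas are usually Price: £4.00\n" +
    "Price: £3.00\n" +
    "Price: £2.00\n";

// (?<=Price: £) requires the prefix without including it in the match;
// .* then takes the rest of the line.
var priceRegex = new Regex(@"(?<=Price: £).*");

var prices = new List<string>();
using (var reader = new StringReader(sample))
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // Match against the current line, not against a fixed string such as input1.
        Match match = priceRegex.Match(line);
        if (match.Success) // lines without "Price: £" are skipped
        {
            prices.Add(match.Value);
            Console.WriteLine(match.Value);
        }
    }
}
```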

Upvotes: 1

mrzasa

Reputation: 23317

Try the following regex: Price: £(\d+\.\d+). The price will be in the first capturing group.

Explanation:

  • Price: £ - literal with required prefix
  • (\d+\.\d+) - capturing group matching price with decimal part
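A minimal sketch of reading the first capturing group in C#, assuming the text is already in memory (the sample below is a hypothetical stand-in for the file's contents). Regex.Matches finds every "Price: £" amount in the text, and Groups[1] holds just the number.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample text; with the real file this could come from File.ReadAllText.
const string text =
    "The apple is set at Price: £1.00\n" +
    "Sale:  £3.50\n" +
    "Price: £2.00\n" +
    "Plum reduced to Sale:  £2.00\n" +
    "Bananas are usually Price: £4.00\n" +
    "Price: £3.00\n" +
    "Price: £2.00\n";

// "Price: £" must appear literally; (\d+\.\d+) captures digits, a dot, then digits.
var regex = new Regex(@"Price: £(\d+\.\d+)");

var prices = new List<string>();
foreach (Match m in regex.Matches(text))
{
    // Groups[0] is the whole match ("Price: £1.00"); Groups[1] is just the number.
    string price = m.Groups[1].Value;
    prices.Add(price);
    Console.WriteLine(price);
}
```

Loading the whole file keeps the code short; for a very large file, calling regex.Match on each line inside a read loop gives the same numbers without holding all of the text in memory.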


Upvotes: 1
