Reputation: 317
I've been looking into this for a while now and have only succeeded in confusing myself, so any help anyone can give would be amazing.
I have a fairly big text file, 100k lines plus.
The text file looks something like the following:
The apple is set at Price: £1.00
Sale: £3.50
Price: £2.00
Plum reduced to Sale: £2.00
Bananas are usually Price: £4.00
Price: £3.00
Price: £2.00
And so on.
I want to extract all the numbers, just the numbers (no £), that follow the string "Price: £" and, for the moment, just print them to the console.
Expected output should be:
1.00
2.00
4.00
3.00
2.00
There were 100,000 lines.
I have the following although I am sure it's a million miles off.
int counter = 0;
string line;
string input1 = " Price: £";
string price;
// Read the file and display it line by line.
System.IO.StreamReader file =
    new System.IO.StreamReader(@"C:Pricelist.txt");
while ((line = file.ReadLine()) != null)
{
    price = Regex.Match(input1, @"\d+").Value;
    System.Console.WriteLine(price);
    //System.Console.WriteLine(line);
    counter++;
}
file.Close();
System.Console.WriteLine("There were {0} lines.", counter);
// Suspend the screen.
System.Console.ReadLine();
My thinking is that the regex looks for the input1 string and then finds the next number, but it doesn't seem to be working. Should I be matching against the string held in the line variable instead, or is that a bad idea?
Again, I'm a little lost so any pointers will be great. If any further information is required please ask :)
Upvotes: 0
Views: 53
Reputation: 2197
Your original code never uses the line variable. That's what has to be matched against, not input1.
Additionally, the regular expression can be defined once outside the loop and called repeatedly inside the loop. The static Regex methods have to look the pattern up (or construct a Regex for it) on every call. The runtime caches recently used patterns, so this is cheaper than a full re-parse, but calling the static Regex.Match() method inside the loop 100,000 times still repeats that work 100,000 times. A single Regex instance avoids it.
int counter = 0;
string line;
string price;
// Build the regex once, outside the loop. The named group captures only the number.
var regex = new Regex(@"Price: £(?<amount>\d+(?:\.\d+)?)");
// Read the file line by line and print each price found.
using (System.IO.StreamReader file = new System.IO.StreamReader(@"c:Pricelist.txt"))
{
    while ((line = file.ReadLine()) != null)
    {
        var match = regex.Match(line);
        if (match.Success)
        {
            price = match.Groups["amount"].Value;
            System.Console.WriteLine(price);
        }
        //System.Console.WriteLine(line);
        counter++;
    }
}
System.Console.WriteLine("There were {0} lines.", counter);
// Suspend the screen.
System.Console.ReadLine();
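To try the single-instance approach without the file, here is a minimal sketch that runs it over the sample lines from the question. The class and method names (PriceExtractor, Extract) are made up for illustration, and the digits-only capture group is an assumption about what the prices look like.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here for illustration only.
public static class PriceExtractor
{
    // One shared Regex instance, built once and reused for every line.
    // Assumption: a price is digits with an optional decimal part.
    private static readonly Regex PriceRegex =
        new Regex(@"Price: £(?<amount>\d+(?:\.\d+)?)");

    // Returns the amount after "Price: £" for each line that contains one.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var amounts = new List<string>();
        foreach (var line in lines)
        {
            Match match = PriceRegex.Match(line);
            if (match.Success)
            {
                amounts.Add(match.Groups["amount"].Value);
            }
        }
        return amounts;
    }

    public static void Main()
    {
        // Sample lines copied from the question. A real run would read
        // the lines from the file instead of an in-memory array.
        var sample = new[]
        {
            "The apple is set at Price: £1.00",
            "Sale: £3.50",
            "Price: £2.00",
            "Plum reduced to Sale: £2.00",
            "Bananas are usually Price: £4.00",
            "Price: £3.00",
            "Price: £2.00",
        };

        foreach (string amount in Extract(sample))
        {
            Console.WriteLine(amount);
        }
    }
}
```

The "Sale:" lines are skipped because the literal "Price: £" has to appear before the digits.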
Upvotes: 0
Reputation: 103
Considering you say you want to "for the moment just print them out in the console", I'd store each price in a list, var valueList = new List<string>(). That way you can print them with valueList.ForEach(value => Console.WriteLine(value)); and still use the values at any later stage if wanted.
As for extracting the prices themselves:
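var valueList = new List<string>(); // declare once, outside the read loop
var tokens = line.Split(' ');
for (int i = 1; i < tokens.Length; i++)
{
    // Keep a £ amount only when it directly follows "Price:", so "Sale: £..." is skipped.
    if (tokens[i - 1] == "Price:" && tokens[i].StartsWith("£"))
        valueList.Add(tokens[i].Substring(1));
}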
The Regex options suggested before are shorter, but some people prefer to not use Regex, so here's a regex-less solution.
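If the stored values are meant to be used later, one option is to convert them to numbers once they are in the list. This is only a sketch under assumptions: the PriceTotals class and ToDecimals method are hypothetical names, and it assumes the prices always use "." as the decimal separator, which is why it parses with the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, named here for illustration only.
public static class PriceTotals
{
    // Converts extracted price strings to decimals, skipping anything
    // that does not parse as a number.
    public static List<decimal> ToDecimals(IEnumerable<string> values)
    {
        var result = new List<decimal>();
        foreach (string value in values)
        {
            if (decimal.TryParse(value, NumberStyles.Number,
                                 CultureInfo.InvariantCulture, out decimal parsed))
            {
                result.Add(parsed);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Stand-in for the list filled while reading the file.
        var valueList = new List<string> { "1.00", "2.00", "4.00" };

        // Print now...
        valueList.ForEach(value => Console.WriteLine(value));

        // ...and reuse later, here to add the prices up.
        decimal total = 0m;
        foreach (decimal price in ToDecimals(valueList))
        {
            total += price;
        }
        Console.WriteLine(total.ToString(CultureInfo.InvariantCulture));
    }
}
```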
Upvotes: 0
Reputation: 10929
The following regex should do what you want:
@"(?<=Price: £).*"
It uses a positive lookbehind for 'Price: £', then matches any character any number of times, so the match is everything on the line after 'Price: £'.
For the sample data, that produces the desired output.
How to use (match against line, the text just read from the file, rather than input1):
price = Regex.Match(line, @"(?<=Price: £).*").Value;
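One thing to watch with the lookbehind: Regex.Match(...).Value is an empty string when nothing matches, so printing it for every line would emit blank lines for the "Sale:" entries. Below is a minimal sketch that checks Success first. The LookbehindDemo class and TryGetPrice method are hypothetical names, and the pattern swaps .* for a digits-only match as an assumption about the price format.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here for illustration only.
public static class LookbehindDemo
{
    // The lookbehind requires "Price: £" immediately before the match
    // without including it in the matched text.
    // Assumption: a price is digits with an optional decimal part.
    private static readonly Regex AfterPrice =
        new Regex(@"(?<=Price: £)\d+(?:\.\d+)?");

    // Returns true and sets price when the line contains a "Price: £" amount.
    public static bool TryGetPrice(string line, out string price)
    {
        Match match = AfterPrice.Match(line);
        price = match.Success ? match.Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string[] sample =
        {
            "The apple is set at Price: £1.00",
            "Sale: £3.50",
        };

        foreach (string line in sample)
        {
            // Prints "1.00" for the first line and "(no price)" for the second.
            Console.WriteLine(TryGetPrice(line, out string price) ? price : "(no price)");
        }
    }
}
```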
Upvotes: 1