I have a text file full of unstructured data.
Within that data I have telephone numbers that I want to extract and put into a new text file.
The numbers within the file are all I care about.
Is there a method in C# or VB to do this?
I am aware that IBM has a software package called Omnifind for data analytics, but I want to write an application that does only this one task.
P.S. Here is an example of the data:
John Smith London 123456
Hayley Smith Manchester 234567
Mike Smith Birmingham 345678
I want to create a new file that contains just:
123456
234567
345678
Try this:
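using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Strips every non-digit character from each line and keeps the lines that had any digits.
public List<string> NaiveExtractor(string path)
{
    return
        File.ReadAllLines(path)
            .Select(l => Regex.Replace(l, @"[^\d]", ""))  // remove everything except digits
            .Where(s => s.Length > 0)                     // drop lines with no digits at all
            .ToList();
}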
As the name suggests, it's naive: it will also pull out any digits that appear in names, and if a line has two phone numbers they'll be run together into one long number.
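A minimal sketch that avoids both problems, assuming a phone number is any run of digits that stands alone as a word. `Regex.Matches` with the pattern `\b\d+\b` returns each such run as a separate match, so two numbers on one line stay apart, and digits glued to letters (e.g. `Agent007`) are skipped. The class and method names are hypothetical. Numbers written with separators, such as `0161-234567`, would come out as two matches under this assumption.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhoneNumberExtractor
{
    // A run of digits with a word boundary on both sides: "Agent007" does not match,
    // while "111 222" yields two separate matches.
    private static readonly Regex StandaloneDigits = new Regex(@"\b\d+\b");

    // Returns every standalone digit run, in the order it appears in the input.
    public static List<string> ExtractNumbers(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => StandaloneDigits.Matches(line).Cast<Match>())
            .Select(m => m.Value)
            .ToList();
    }

    // Hypothetical convenience wrapper: one extracted number per output line.
    public static void ExtractToFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, ExtractNumbers(File.ReadLines(inputPath)));
    }
}
```

Usage: `PhoneNumberExtractor.ExtractToFile("infile.txt", "outfile.txt");`. `File.ReadLines` streams the input rather than loading the whole file at once, which may matter for large files.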
Well, you could use regular expressions, but in this case you could probably get by with some basic string manipulation:
using System.IO;

using (StreamReader reader = new StreamReader("infile.txt"))
{
    using (StreamWriter writer = new StreamWriter("outfile.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // The number is assumed to be everything after the last space.
            int index = line.LastIndexOf(' ');
            // Skip lines without such a space, or with nothing after it.
            if (index > 0 && index + 1 < line.Length)
            {
                writer.WriteLine(line.Substring(index + 1));
            }
        }
    }
}
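If some lines might not end in a number, the last token could be checked before it is written. A sketch of that variant, assuming the number is always the final space-separated token and consists only of ASCII digits (`LastTokenExtractor` and `LastNumericTokens` are hypothetical names). Unlike the loop above, it also accepts a line that contains nothing but a number, and it ignores trailing spaces.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LastTokenExtractor
{
    // Yields the last space-separated token of each line, but only when that
    // token consists entirely of ASCII digits; all other lines are skipped.
    public static IEnumerable<string> LastNumericTokens(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            string trimmed = line.TrimEnd();
            int index = trimmed.LastIndexOf(' ');
            // index is -1 when there is no space, so Substring(0) takes the whole line.
            string token = trimmed.Substring(index + 1);
            if (token.Length > 0 && token.All(c => c >= '0' && c <= '9'))
            {
                yield return token;
            }
        }
    }
}
```

Usage: `File.WriteAllLines("outfile.txt", LastTokenExtractor.LastNumericTokens(File.ReadLines("infile.txt")));` (this call needs `using System.IO;`).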
No luck: there's no built-in method for that. I'd suggest something like this:
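using System;
using System.Collections.Generic;
using System.IO;

List<string> result = new List<string>();
using (StreamReader content = File.OpenText("text"))
{
    while (!content.EndOfStream)
    {
        string line = content.ReadLine();
        // Split on spaces; RemoveEmptyEntries ignores repeated or trailing spaces.
        var substrings = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Blank lines yield no fields; otherwise the number is the last field.
        if (substrings.Length > 0)
        {
            result.Add(substrings[substrings.Length - 1]);
        }
    }
}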
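Real files often separate fields with tabs or runs of spaces rather than single spaces. A sketch under that assumption: calling `Split` with a `null` separator makes it treat any whitespace character as a delimiter, and `StringSplitOptions.RemoveEmptyEntries` drops the empty fields in between. `WhitespaceSplitExtractor`, `LastFields`, and `Run` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WhitespaceSplitExtractor
{
    // Collects the last whitespace-delimited field of every non-blank line.
    public static List<string> LastFields(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            // A null separator splits on any whitespace (spaces, tabs, etc.).
            string[] fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length > 0)
            {
                result.Add(fields[fields.Length - 1]);
            }
        }
        return result;
    }

    // Hypothetical wrapper: reads inputPath and writes one field per line to outputPath.
    public static void Run(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, LastFields(File.ReadLines(inputPath)));
    }
}
```

The cast `(char[])null` is there because `Split` has several overloads, and a bare `null` would be ambiguous between them.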