Ebikeneser

Reputation: 2364

Extracting text out of text file by parsing it using C#

I have a text file full of unstructured data.

Within that data I have telephone numbers that I want to extract and put into a new text file.

The numbers within the file are all I care about.

I was wondering if there was a method in C# or VB to do this?

I am aware that IBM has a software package called OmniFind for data analytics, but I want to write an application that does only the extraction described above.

P.S. An example of the data:

John Smith London 123456 
Hayley Smith Manchester 234567 
Mike Smith Birmingham 345678

So I want to create a new file that contains just:

123456 
234567 
345678

Upvotes: 0

Views: 1914

Answers (3)

Benjol

Reputation: 66551

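Try this:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Strips every non-digit character from each line and keeps the non-empty results.
public List<string> NaiveExtractor(string path)
{
    return File.ReadAllLines(path)
        .Select(l => Regex.Replace(l, @"[^\d]", ""))
        .Where(s => s.Length > 0)
        .ToList();
}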

As the name suggests, it's naive, and will pull out numbers in names too, and if a line has two phone numbers they'll get wodged together.
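One way around both limitations is to match each run of digits separately instead of deleting everything else. This is only a sketch: the class and method names are made up for illustration, and the pattern assumes a phone number is a standalone run of at least six digits, as in the question's sample data. Real phone formats (spaces, dashes, country codes) would need a different pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneNumberExtractor
{
    // Assumption: a phone number is a standalone run of six or more digits.
    // \b keeps digits embedded in a word (e.g. "abc123456") from matching.
    private static readonly Regex DigitRun = new Regex(@"\b\d{6,}\b");

    // Returns each matching digit run on the line as its own entry,
    // so two numbers on one line stay separate.
    public static List<string> ExtractFromLine(string line)
    {
        var numbers = new List<string>();
        foreach (Match match in DigitRun.Matches(line))
        {
            numbers.Add(match.Value);
        }
        return numbers;
    }
}
```

Calling PhoneNumberExtractor.ExtractFromLine on each line from File.ReadLines and passing the combined results to File.WriteAllLines would produce the output file.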

Upvotes: 1

Justin

Reputation: 86749

Well, you could use something like regular expressions, but in this case you could probably make do with some basic string manipulation:

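using (StreamReader reader = new StreamReader("infile.txt"))
{
    using (StreamWriter writer = new StreamWriter("outfile.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Trim trailing whitespace so a trailing space does not hide the last token.
            line = line.TrimEnd();
            int index = line.LastIndexOf(' ');
            if (index > 0 && index + 1 < line.Length)
            {
                // Everything after the last space is taken to be the number.
                writer.WriteLine(line.Substring(index + 1));
            }
        }
    }
}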

Upvotes: 1

Unknown

Reputation: 411

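No luck, there's no built-in method for this. I'd suggest something like this:

List<string> result = new List<string>();
using (StreamReader content = File.OpenText("text"))
{
    string line;
    while ((line = content.ReadLine()) != null)
    {
        // Skip blank lines, which would otherwise add empty entries.
        if (line.Trim().Length == 0) continue;
        var substrings = line.Trim().Split(' ');
        // The last space-separated token is taken to be the number.
        result.Add(substrings[substrings.Length - 1]);
    }
}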
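The loop above only collects the values in memory, while the question asks for a new file. A minimal sketch of the same last-token idea with the output step included follows. The class name, method name and parameters are hypothetical, and it assumes the number is always the last space- or tab-separated token on a line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LastTokenWriter
{
    // Hypothetical helper: writes the last token of every non-blank line
    // in inputPath to outputPath, one token per line.
    public static void WriteLastTokens(string inputPath, string outputPath)
    {
        var result = new List<string>();
        foreach (string line in File.ReadLines(inputPath))
        {
            // RemoveEmptyEntries drops the empty strings produced by
            // trailing or repeated separators.
            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0)
            {
                result.Add(parts[parts.Length - 1]);
            }
        }
        File.WriteAllLines(outputPath, result);
    }
}
```

For example, LastTokenWriter.WriteLastTokens("infile.txt", "outfile.txt") would read one file and write the other; both file names here are placeholders.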

Upvotes: 3
