Marcus Aurelio

Reputation: 153

How do I extract everything inside the double-quotes from a list?

I have the code:

using System;
using System.Collections.Generic;
using System.IO;

class Program {
    static void Main(string[] args) {
        const string f = "../../../input.txt";
        List<string> lines = new List<string>();
        using (StreamReader r = new StreamReader(f)) {
            string line;
            while ((line = r.ReadLine()) != null) {
                // Keep only the job_number lines
                if (line.StartsWith("    <job_number") && line.EndsWith(">")) {
                    lines.Add(line);
                }
            }
        }
        foreach (string s in lines) {
            Console.WriteLine(s);
        }
        Console.Read();
    }
}

Inside the while loop, I check for lines that start with one string and end with another. This is what the lines look like:

<job_number "1234" />
<job_number "1829" />

How do I extract the numbers from inside the quotes? At the moment the console prints out the whole line:

<job_number "1234" />
<job_number "1829" />

I want:

1234
1829

I've looked into Regex but it confuses me greatly.

Edit: I need to add that the file I am parsing is a systems configuration file which contains a lot of other data. I have managed to create a list called lines that gets the exact values that I need. I now need to throw some formatting in this list to get the values from the list (everything inside the quotes).

Upvotes: 2

Views: 213

Answers (5)

Orel Eraki

Reputation: 12196

Using IndexOf and Substring will get the job done. It is fast, light on memory, and fairly simple.

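if (line.StartsWith("    <job_number") && line.EndsWith(">")) {
    // Position just after the opening quote (0 if no quote was found)
    int start = line.IndexOf("\"") + 1;
    // Position of the closing quote (-1 if there is none)
    int end = line.IndexOf("\"", start);

    if (start > 0 && end > 0)
    {
        string numberAsString = line.Substring(start, end - start);
        int number;
        // Keep the value only if it really is a number
        if (int.TryParse(numberAsString, out number))
        {
            // lines is a List<string>, so add the text form
            lines.Add(numberAsString);
            //Console.WriteLine(number);
        }
    }
}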

Upvotes: 0

oopbase

Reputation: 11395

In your case, a simple regex match with \d+ will do the job.

//...
while ((line = r.ReadLine()) != null)
{
    var re = Regex.Match(line, @"(\d+)");
    if (re.Success) 
    {
        var val = re.Groups[1].Value; 
        lines.Add(val);
    }
}
//...

EDIT:

You can of course change the regex for your exact needs, for example:

var re = Regex.Match(line, "job_number\\s\"(\\d+)\"");

might be more appropriate if your file contains other numbers as well.
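
As a rough, self-contained sketch of that stricter pattern applied to a few sample lines (the JobNumberDemo class, the ExtractJobNumbers helper, and the sample data are made up for illustration, and \s+ is used here to tolerate more than one space):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class JobNumberDemo
{
    // Hypothetical helper: returns the digits found between the quotes
    // on every line that contains a job_number entry.
    public static List<string> ExtractJobNumbers(IEnumerable<string> lines)
    {
        // job_number, whitespace, then digits captured between double quotes
        var regex = new Regex("job_number\\s+\"(\\d+)\"");
        var result = new List<string>();
        foreach (var line in lines)
        {
            Match m = regex.Match(line);
            if (m.Success)
            {
                // Groups[1] is the first parenthesized capture: the digits
                result.Add(m.Groups[1].Value);
            }
        }
        return result;
    }

    static void Main()
    {
        // Sample input, assumed to resemble the configuration file
        var sample = new[]
        {
            "    <job_number \"1234\" />",
            "    <other_tag \"99\" />",
            "    <job_number \"1829\" />"
        };
        foreach (var number in ExtractJobNumbers(sample))
        {
            Console.WriteLine(number);
        }
    }
}
```

With this pattern the other_tag line is skipped, so only 1234 and 1829 are printed.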

Upvotes: 3

Fabjan

Reputation: 13676

If the format of your string never varies, you can do it in one line with a simple Split call:

   string value = input.Split('"')[1];

For example :

   string[] s =
   {
       @"<job_number ""1234"" />",
       @"<job_number ""1829"" />"
   };
   // Take the text between the first pair of quotes on each line
   Console.WriteLine(string.Join(", ", Array.ConvertAll(s, x => x.Split('"')[1])));

Output: 1234, 1829

Upvotes: 0

Christopher Painter

Reputation: 55581

The file you are parsing is practically XML. Why not go all the way and standardize the format to be XML-compliant?

<Jobs>
 <Job Number="1234" />
 <Job Number="1235" />
</Jobs>

Then you could simply use LINQ to XML to grab all the Job elements and enumerate their Number attributes.

    XDocument doc = XDocument.Load("XMLFile1.xml");
    var numbers = from t in doc.Descendants("Job")
                  select t.Attribute("Number").Value;
    foreach (var number in numbers)
    {
        Console.WriteLine(number);
    }

Upvotes: 0

w.b

Reputation: 11228

If you are keen on LINQ:

var str = @"<job_number ""1234"" />";
var num = new string(str.Where(c => Char.IsDigit(c)).ToArray());

Console.WriteLine(num); // 1234

Upvotes: 3
