Craig

Reputation: 1225

Convert linq result to name value pair

I have a query that returns multiple rows. The rows alternate: the first row is a name and the second row is its value. The end goal is to take a specific name and add its value to a DataTable row.

Code:

    var query = from table in doc.DocumentNode.SelectNodes("//table[@border='0']")
                from row in table.SelectNodes("tr")
                from cell in row.SelectNodes("th|td")
                where (!string.IsNullOrEmpty(cell.InnerText.ToString()))
                && cell.InnerText.ToString() != "File Summary"
                && cell.InnerText.ToString() != "Payment Instructions"
                && cell.InnerText.ToString() != "Number"
                select cell.InnerText;

    foreach (var cell in query)
    {
        logger.Info("{0}", cell);
    }

Result

2020-03-18 15:29:04.5074 INFO Client Name:
2020-03-18 15:29:04.5764 INFO Siemens
2020-03-18 15:29:04.5764 INFO Client ID:
2020-03-18 15:29:04.5764 INFO 7000002
2020-03-18 15:29:04.5764 INFO Batch File Name:
2020-03-18 15:29:04.5764 INFO 6030001030-20200303-00005470
2020-03-18 15:29:04.5764 INFO File Status:
2020-03-18 15:29:04.5764 INFO Successful
2020-03-18 15:29:04.5764 INFO Sent
2020-03-18 15:29:04.5764 INFO 7
2020-03-18 15:29:04.5764 INFO Batch File ID:
2020-03-18 15:29:04.5764 INFO 0008615020
2020-03-18 15:29:04.5764 INFO Date Uploaded:
2020-03-18 15:29:04.5764 INFO 03-Mar-2020
2020-03-18 15:29:04.5764 INFO Successful
2020-03-18 15:29:04.5764 INFO 7
2020-03-18 15:29:04.5764 INFO Creator:
2020-03-18 15:29:04.5884 INFO STP-SIEMENSCORPOR
2020-03-18 15:29:04.5884 INFO Failed
2020-03-18 15:29:04.5884 INFO 0

Eventually I would like to end up with:

string clientname = value[x]; or something similar

Tried:

    var data = query.ToList();
    var obj = data.Select((item, index) =>
    {
        if (index < data.Count - 1 && index % 2 == 0)
            return new KeyValuePair<string, string>(item, data[index + 1]);
        return new KeyValuePair<string, string>(null, null);
    }).Where(x => x.Key != null);

But obj is null for the KeyValuePair

Upvotes: 1

Views: 585

Answers (4)

Lance U. Matthews

Reputation: 16606

Since all you ever need is consecutive strings - not the entire collection - to produce a pair, you can manually enumerate your query two at a time using an iterator method to yield results as soon as they are available...

static IEnumerable<KeyValuePair<string, string>> ExtractPairsByEnumeration(IEnumerable<string> items)
{
    using (var itemEnumerator = items.GetEnumerator())
    {
        while (itemEnumerator.MoveNext())
        {
            string name = itemEnumerator.Current;

            if (!itemEnumerator.MoveNext())
            {
                // We received a name item with no following value item
                // Use whatever default value you want here
                yield return new KeyValuePair<string, string>(name, "<none>");
                yield break;
            }
            else
                yield return new KeyValuePair<string, string>(name, itemEnumerator.Current);
        }
    }
}

You would call it like this...

using System;
using System.Collections.Generic;
using System.Linq;

namespace SO60748447
{
    class Program
    {
        private const string InputText = @"
Client Name:
Siemens
Client ID:
7000002
Batch File Name:
6030001030-20200303-00005470
File Status:
Successful
Sent
7
Batch File ID:
0008615020
Date Uploaded:
03-Mar-2020
Successful
7
Creator:
STP-SIEMENSCORPOR
Failed
0";

        static void Main()
        {
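            // Split on either line-ending style so the result does not
            // depend on how this source file was saved
            string[] inputLines = InputText.Trim().Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);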
            IEnumerable<string>[] testQueries = new IEnumerable<string>[] {
                inputLines,
                inputLines.Take(inputLines.Length - 1)
            };

            foreach (IEnumerable<string> query in testQueries)
            {
                Console.WriteLine($"Extracting {query.Count()} input lines:");
                foreach (KeyValuePair<string, string> pair in ExtractPairsByEnumeration(query))
                    Console.WriteLine($"\t{pair}");
                Console.WriteLine();
            }
        }
    }
}

...which produces output like this...

Extracting 20 input lines:
    [Client Name:, Siemens]
    [Client ID:, 7000002]
    [Batch File Name:, 6030001030-20200303-00005470]
    [File Status:, Successful]
    [Sent, 7]
    [Batch File ID:, 0008615020]
    [Date Uploaded:, 03-Mar-2020]
    [Successful, 7]
    [Creator:, STP-SIEMENSCORPOR]
    [Failed, 0]

Extracting 19 input lines:
    [Client Name:, Siemens]
    [Client ID:, 7000002]
    [Batch File Name:, 6030001030-20200303-00005470]
    [File Status:, Successful]
    [Sent, 7]
    [Batch File ID:, 0008615020]
    [Date Uploaded:, 03-Mar-2020]
    [Successful, 7]
    [Creator:, STP-SIEMENSCORPOR]
    [Failed, <none>]

Notice that it still produces reasonable output in the second case when given an odd number of lines.
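To get from these pairs to the `value[x]`-style lookup the question asks for, one option is to load them into a dictionary. The following is only a minimal sketch: `PairLookupSketch` and `ToLookupTable` are hypothetical names, the iterator is a condensed copy of the method above, and it assumes that keeping the last value is acceptable if a name repeats.

```csharp
using System;
using System.Collections.Generic;

public static class PairLookupSketch
{
    // Condensed copy of ExtractPairsByEnumeration() from above
    public static IEnumerable<KeyValuePair<string, string>> ExtractPairsByEnumeration(IEnumerable<string> items)
    {
        using (var itemEnumerator = items.GetEnumerator())
        {
            while (itemEnumerator.MoveNext())
            {
                string name = itemEnumerator.Current;
                // A trailing name with no value gets the same "<none>" placeholder
                string value = itemEnumerator.MoveNext() ? itemEnumerator.Current : "<none>";
                yield return new KeyValuePair<string, string>(name, value);
            }
        }
    }

    // Hypothetical helper: index the pairs by name.
    // The indexer assignment keeps the last value if a name repeats.
    public static Dictionary<string, string> ToLookupTable(IEnumerable<string> items)
    {
        var lookup = new Dictionary<string, string>();
        foreach (KeyValuePair<string, string> pair in ExtractPairsByEnumeration(items))
            lookup[pair.Key] = pair.Value;
        return lookup;
    }

    public static void Main()
    {
        // Stand-in for the query results (odd count on purpose)
        string[] lines = { "Client Name:", "Siemens", "Client ID:", "7000002", "Creator:" };
        Dictionary<string, string> lookup = ToLookupTable(lines);

        string clientName = lookup["Client Name:"];
        Console.WriteLine(clientName);          // Siemens
        Console.WriteLine(lookup["Creator:"]);  // <none>
    }
}
```

Note that the keys keep the trailing colon exactly as it appears in the cell text; trim it when building the dictionary if you would rather look up "Client Name".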

A LINQ alternative that also requires only an IEnumerable<> is the Aggregate() method. It takes a seed accumulator (here, a list that holds the results so far) and a delegate that is called with the accumulator and each item of the input sequence in turn...

static List<KeyValuePair<string, string>> ExtractPairsByAggregation(IEnumerable<string> items)
{
    return items
        // Aggregate() doesn't have an overload that provides the index,
        // so package it together with each item in a ValueTuple using Select()
        .Select((item, index) => (item, index))
        .Aggregate(
            // Storage for both intermediate and final results
            new List<KeyValuePair<string, string>>(),
            (list, current) => {
                if (current.index % 2 == 0) // Even items are a name
                {
                    KeyValuePair<string, string> newPair = new KeyValuePair<string, string>(
                        current.item, "<none>"
                    );

                    // Add a partial pair as soon as it's encountered
                    // so it's still present in the results even if
                    // this is the last item in the sequence
                    list.Add(newPair);
                }
                else                        //  Odd items are a value
                {
                    // The last pair in the list is the corresponding partial pair
                    int pairIndex = list.Count - 1;
                    KeyValuePair<string, string> oldPair = list[pairIndex];
                    // KeyValuePair<> is immutable, so recreate it now that we have the value
                    KeyValuePair<string, string> newPair = new KeyValuePair<string, string>(
                        oldPair.Key, current.item
                    );

                    list[pairIndex] = newPair;
                }

                // Return the same list so it is available for the
                // next item in the sequence and as the final result
                return list;
            }
        );
}

Unlike ExtractPairsByEnumeration(), this returns the full results when they're all available, instead of one at a time. If you call this in the Main() method above the output is the same.

By the way, assuming these are classes from the HTML Agility Pack, calling cell.InnerText.ToString() is unnecessary because InnerText is, as the name implies, already a string. If you insist on calling it, you should use a let clause so you call it once and reuse the result...

[snip]
from cell in row.SelectNodes("th|td")
let cellText = cell.InnerText.ToString()
where !string.IsNullOrEmpty(cellText)
    && cellText != "File Summary"
    && cellText != "Payment Instructions"
    && cellText != "Number"
select cell.InnerText; // This should probably be cellText as well

Upvotes: 0

jdweng

Reputation: 34429

I've been parsing text files for over 45 years. Unlike the other solutions, the code below also works when an item spans 2 or more rows. Pairing rows by index (mod 2 or divide by 2) will not work in that case.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";
        static void Main(string[] args)
        {
            StreamReader reader = new StreamReader(FILENAME);
            string line = "";
            Dictionary<string, string> dict = new Dictionary<string, string>();
            string buffer = "";
            string key = "";
            Boolean first = true;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0)
                {
                    string[] splitLine = line.Split(new string[] { "INFO" }, StringSplitOptions.None).ToArray();
                    if (splitLine[1].Contains(":"))
                    {
                        if (!first)
                        {
                            dict.Add(key, buffer);

                        }
                        key = splitLine[1].Trim(new char[] { ' ', ':' });
                        buffer = "";
                        first = false;
                    }
                    else
                    {
                        if (buffer == string.Empty)
                        {
                            buffer = splitLine[1].Trim();
                        }
                        else
                        {
                            buffer += "," + splitLine[1].Trim();
                        }
                    }
                }
            }
            dict.Add(key, buffer);
            foreach (KeyValuePair<string, string> pair in dict)
            {
                Console.WriteLine("Key '{0}' : Value '{1}'", pair.Key, pair.Value);
            }
            Console.ReadLine();
        }
    }
}


Upvotes: 0

Prateek Shrivastava

Reputation: 1937

You can try something like this:

List<string> data = new List<string>() { "Client Name", "Siemens", "Client ID", "7000002", "File Status", "Successful" };

var obj = data.Select((item, index) =>
 {
     if (index < data.Count - 1 && index % 2 == 0)
         return new KeyValuePair<string, string>(item, data[index + 1]);
     return new KeyValuePair<string, string>(null, null);
 }).Where(x => x.Key != null);

In place of data in the code above, you can use your own results, materialized with query.ToList() so that they can be accessed by index.

This is overly confusing, so a simpler way is:

Dictionary<string, string> map = new Dictionary<string, string>();
for (int idx = 0; idx < data.Count - 1; idx += 2)
{
    map[data[idx]] = data[idx + 1];
}
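Since the stated end goal is a DataTable row, here is a rough sketch of how the map could feed one. The class and method names, the table name, and the column names `ClientName` and `ClientId` are all hypothetical, chosen only for illustration; a name missing from the map is stored as DBNull.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableSketch
{
    // Same loop as above: even indexes are names, odd indexes are values
    public static Dictionary<string, string> BuildMap(IList<string> data)
    {
        var map = new Dictionary<string, string>();
        for (int idx = 0; idx < data.Count - 1; idx += 2)
        {
            map[data[idx]] = data[idx + 1];
        }
        return map;
    }

    // Hypothetical table layout: one string column per name of interest
    public static DataTable BuildTable(Dictionary<string, string> map)
    {
        var table = new DataTable("Batches");
        table.Columns.Add("ClientName", typeof(string));
        table.Columns.Add("ClientId", typeof(string));

        DataRow row = table.NewRow();
        // TryGetValue avoids a KeyNotFoundException when a name is absent
        row["ClientName"] = map.TryGetValue("Client Name", out string name) ? (object)name : DBNull.Value;
        row["ClientId"] = map.TryGetValue("Client ID", out string id) ? (object)id : DBNull.Value;
        table.Rows.Add(row);
        return table;
    }

    public static void Main()
    {
        var data = new List<string> { "Client Name", "Siemens", "Client ID", "7000002", "File Status", "Successful" };
        DataTable table = BuildTable(BuildMap(data));
        Console.WriteLine(table.Rows[0]["ClientName"]); // Siemens
        Console.WriteLine(table.Rows[0]["ClientId"]);   // 7000002
    }
}
```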

Upvotes: 3

John Wu

Reputation: 52260

You need random access, so your first step should be to materialize the results and store them in memory.

var list = query.ToList();

Once you have that you can access via index and assemble the rows you need.

var dictionary = Enumerable.Range(0, list.Count / 2)
    .ToDictionary
    (
        i => list[i * 2],
        i => list[i * 2 + 1]
    );
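As a small illustration of how this behaves, the sketch below runs the same expression on a hard-coded list that stands in for query.ToList() (the class name, method name, and sample data are assumptions). Because list.Count / 2 is integer division, a trailing name with no value is dropped; also keep in mind that ToDictionary() throws an ArgumentException if the same name appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeDictionarySketch
{
    // Same expression as above, wrapped in a hypothetical helper
    public static Dictionary<string, string> BuildDictionary(IList<string> list)
    {
        return Enumerable.Range(0, list.Count / 2)
            .ToDictionary
            (
                i => list[i * 2],
                i => list[i * 2 + 1]
            );
    }

    public static void Main()
    {
        // Five items: two complete pairs plus one name without a value
        var list = new List<string> { "Client Name:", "Siemens", "Client ID:", "7000002", "Creator:" };
        Dictionary<string, string> dictionary = BuildDictionary(list);

        Console.WriteLine(dictionary.Count);                   // 2
        Console.WriteLine(dictionary["Client Name:"]);         // Siemens
        Console.WriteLine(dictionary.ContainsKey("Creator:")); // False
    }
}
```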

Upvotes: 1
