Reputation: 5
I got a big datatable (over 300K rows with 40 columns in it) with data, the fragments like these (all value is string) :
colA colB colC ColD ColdE ColF ColG ColH
--------------------------------------------------------
A01 B01 C01 DA1 EA1 FA1 GA1 HA1
A01 B01 C01 DA2 EA2 FA2 GA2 HA2
A02 B02 C02 DA3 EA3 FA3 GA3 HA3
A02 B02 C02 DA4 EA4 FA4 GA4 HA4
A03 B03 C03 DA5 EA5 FA5 GA5 HA5
A04 B04 C04 DA6 EA6 FA6 GA6 HA6
There is partial data duplicated and I would like merge duplicated data by using colA + ColB+ ColC for key and keep ColD ColE ColF, use first row for other columns. The expected result like these:
colA colB colC ColD1 colE1 colF1 colG1 ColD2 colE2 colF2 colG2 ColH
------------------------------------------------
A01 B01 C01 DA1 EA1 FA1 GA1 DA2 EA2 FA2 GA2 HA1
A02 B02 C02 DA3 EA3 FA3 GA3 DA4 EA4 FA4 GA4 HA3
A03 B03 C03 DA5 EA5 FA5 GA5 null null null null HA5
A04 B04 C04 DA6 EA6 FA6 GA6 null null null null HA6
It is something like pivot but have some difference, I tried to use T-SQL or LINQ with C# but have no idea to do this, please someone help, many thanks.
Upvotes: 0
Views: 146
Reputation: 3089
sounds like a job for the ExpandoObject
relying on the input records you've provided
var input = new DataTable();
input.Columns.Add("ColA");
input.Columns.Add("ColB");
input.Columns.Add("ColC");
input.Columns.Add("ColD");
input.Rows.Add("A01", "B01", "CA1", "DA1");
input.Rows.Add("A01", "B01", "CA2", "DA2");
input.Rows.Add("A02", "B02", "CA3", "DA3");
input.Rows.Add("A02", "B02", "CA4", "DA4");
input.Rows.Add("A03", "B03", "CA5", "DA5");
input.Rows.Add("A04", "B04", "CA6", "DA6");
you could convert a record into a dynamic expandable object
public IDictionary<string, Object> Map(DataRow row)
{
var columns = row.Table.Columns;
var result = new ExpandoObject() as IDictionary<string, Object>;
for (var index = 0; index < row.ItemArray.Count(); index++)
{
result.Add($"{columns[index]}", row[index]);
}
return result;
}
and then a bit of logic meant to group input by marker element and expand where needed
var seed = new List<IDictionary<string, Object>>();
var output = input
.AsEnumerable()
.Select(Map)
.Aggregate(seed, (results, current)=>
{
// Check if the current values match any of the first element in the results
var query = from result in results
let marker = result
.Select(p => p.Value)
.FirstOrDefault()
where current.Values.Contains(marker)
select result;
var found = query.SingleOrDefault();
if (found == null)
{
// None were found then simply append the current values
results.Add(current);
}
else
{
// Some were found then isolate the new ones
var others = from value in current.Values
where !found.Values.Contains(value)
select value;
// Append the new ones to the found result
foreach (var value in others)
{
var index = found.Values.Count;
found.Add($"Col{index}".ToString(), value);
}
}
return results;
});
and the end result will look like this
check gist for the whole picture
Upvotes: 1
Reputation: 1732
Note that this is not a generic solution but will work in the given example.
List<string[]> input = new List<string[]>()
{
new string[] {"A01","B01","CA1","DA1"},
new string[] {"A01","B01","CA2","DA2"},
new string[] {"A02","B02","CA3","DA3"},
new string[] {"A02","B02","CA4","DA4"},
new string[] {"A03","B03","CA5","DA5"},
new string[] {"A04","B04","CA6","DA6"},
};
var grouped = input.GroupBy(x => new { key1 = x[0], key2 = x[1] }, (keys, group) => new
{
Key1 = keys.key1,
Key2 = keys.key2,
// skip(2) to prevent the keys to be added in the list
Result = group.SelectMany(x => x.Skip(2)).ToList()
});
Output:
{ Key1 = "A01", Key2 = "B01", Result = ["CA1", "DA1, "CA2", "DA2"] }
{ Key1 => "A02", Key2 = "B02", Result = ["CA3", "DA3, "CA4", "DA4"] }
{ Key1 = "A03", Key2 = "B03", Result = ["CA5", "DA5"] }
{ Key1 = "A04", Key2 = "B04", Result = ["CA6", "DA6"] }
Upvotes: 0