Reputation: 8818
All the following must be done in C#. Parsing the SQL table (SQL Server) will be done using methods in System.Data.Odbc
.
Let's assume I have two .csv files, fi1
and fi2
. The first csv file has two columns id
and val1
, and the second csv has two columns as well, id
and val2
.
I would like to read the two files, and parse the output to one SQL table with the following columns: id, val1, val2
.
The problem is that the two files may have different entries in the id
columns: in other words, some id's may have a val1
value but no val2
value, and vice versa, or they might have both values.
The table should contain the union of the id
columns in the two files.
Example:
File 1
File2
The way I would want the final SQL table to look like is this:
Note that each file might contain duplicates, and we would want to exclude the duplicates when parsing the SQL table.
The thought I had is to create two dictionaries, dict1
and dict2
, where the key would be the id
, and the value would be val1
and val2
. Dictionaries will be used to make sure that duplicates are not included:
Dictionary<string, string> dict1 = new Dictionary<string, string>();
string[] header1 = new string[]{};
using (StreamReader rdr = new StreamReader(fi1))
{
header1 = rdr.ReadLine().Split(',');
while (!rdr.EndOfStream)
{
string ln = rdr.ReadLine();
string[] split_ln = ln.Split(',');
dict1.Add(split_ln[0], split_ln[1]);
}
}
Dictionary<string, string> dict2 = new Dictionary<string, string>();
string[] header2 = new string[]{};
using (StreamReader rdr = new StreamReader(fi2))
{
header2 = rdr.ReadLine().Split(',');
while (!rdr.EndOfStream)
{
string ln = rdr.ReadLine();
string[] split_ln = ln.Split(',');
dict2.Add(split_ln[0], split_ln[1]);
}
}
However, after adding each file to a dictionary, I am not sure how to match the id's of both dictionaries.
Would anyone have a good hint as to how to deal with this problem?
Upvotes: 1
Views: 251
Reputation: 737
I would do atually do a list of tuples to hold the values here instead of a dictionary so that all the information is in one place rather than matching keys, each tuple corresponds to a table record
var dict = new List<Tuple<string, string, string>>();
using (StreamReader rdr = new StreamReader(fi1))
{
while (!rdr.EndOfStream)
{
string ln = rdr.ReadLine();
string[] split_ln = ln.Split(',');
dict.Add(new Tuple<string, string, string>(split_ln[0], split_ln[1],null));
}
}
using (StreamReader rdr = new StreamReader(fi2))
{
while (!rdr.EndOfStream)
{
string ln = rdr.ReadLine();
string[] split_ln = ln.Split(',');
if (dict.Any(item => item.Item1 == split_ln[0]))
{
var item = dict.Find(i => i.Item1 == split_ln[0]);
var newtuple = new Tuple<string, string, string>(item.Item1, item.Item2, split_ln[1]);
dict.Remove(item);
dict.Add(newtuple);
}
else
{
dict.Add(new Tuple<string, string, string>(split_ln[0],null,split_ln[1]));
}
}
}
Upvotes: 1