Reputation: 807
Quick overview: The main goal is to read in the data from a set date, from the row and get the ref number form the set date e.g. Start date.
For example if i just wanted the data from the date set to the 1st of last month and upwards.
I currently have to extract some data from the excel spreadsheet example below:
Start date Ref number
29/07/2015 2342326
01/07/2016 5697455
02/08/2016 3453787
02/08/2016 5345355
02/08/2015 8364456
03/08/2016 1479789
04/07/2015 9334578
Output using EPPlus:
29/07/2015
2342326
29/07/2016
5697455
02/08/2016
3453787
02/08/2016
5345355
02/08/2015
8364456
03/08/2016
1479789
04/07/2015
9334578
This part is fine, but when i try to strip the output via a date range i'm getting errors, e.g. using LINQ i get the following error output.
An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll
Additional information: Specified cast is not valid.
LINQ code:
var rowsOfInterest = tbl.AsEnumerable()
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
.ToList();
I've also tried to modify the from date range using the datatable:
DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");
But get the following error:
An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll
Additional information: Cannot perform '>=' operation on System.String and System.Double.
The last attempt was to try and see if i can strip the date from within the loop.
Code used:
DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();
if (DateTime.TryParse(dt, out dDate))
{
DateTime dts = Convert.ToDateTime(dt);
}
DateTime date1 = new DateTime(2016, 7, 1);
if (dDate >= date1)
{
Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}
This sort of works but just lists the set dates and not there values, which is understandable, If i take this route how would i get the dates with there values?
Output:
29/07/2016
02/08/2016
02/08/2016
03/08/2016
The full code example used:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.OleDb;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop.Excel;
using System.Data;
using System.IO;
namespace Number_Cleaner
{
public class NumbersReport
{
//ToDo: Look in to fixing the code so it filters the date correctly with the right output data.
public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
{
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(path))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
System.Data.DataTable tbl = new System.Data.DataTable();
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
DataRow row = tbl.Rows.Add();
foreach (var cell in wsRow)
{
DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();
//Console.WriteLine(dt);
if (DateTime.TryParse(dt, out dDate))
{
DateTime dts = Convert.ToDateTime(dt);
}
DateTime date1 = new DateTime(2016, 7, 1);
if (dDate >= date1)
{
Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}
//Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}
}
//var rowsOfInterest = tbl.AsEnumerable()
// .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
//.ToList();
//Console.WriteLine(tbl);
//DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");
return tbl;
}
}
Modified from: How to match date to row then get the final column value using EPPlus?
Upvotes: 0
Views: 1386
Reputation: 14250
Based on your code, you are storing everything in your DataTable
as strings
by calling cell.Text
. But using that you are loosing valuable information - the cell data type. You are much better off using cell.Value
which will either be a string
or a double
. With Excel, dates, integers, and decimal values are all stored as doubles
.
The error you are seeing has to do with the fact that you store the values as string but query them like DateTime
here:
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
and here:
"'Start date' >= #1/7/2016#"
If you look at my post here: How to parse excel rows back to types using EPPlus you will see the helper function ConvertSheetToObjects
which deals pretty much with what you are trying to do. With a little modification we can turn it into something that takes a WorkSheet
and converts it to a DataTable
. Like the Object converstion method you should still provide it the expected structure in the form as a DataTable
passed it rather then having it try to guess it by casting cell values:
public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable)
{
//DateTime Conversion
var convertDateTime = new Func<double, DateTime>(excelDate =>
{
if (excelDate < 1)
throw new ArgumentException("Excel dates cannot be smaller than 0.");
var dateOfReference = new DateTime(1900, 1, 1);
if (excelDate > 60d)
excelDate = excelDate - 2;
else
excelDate = excelDate - 1;
return dateOfReference.AddDays(excelDate);
});
//Get the names in the destination TABLE
var tblcolnames = dataTable
.Columns
.Cast<DataColumn>()
.Select(dcol => new {Name = dcol.ColumnName, Type = dcol.DataType})
.ToList();
//Cells only contains references to cells with actual data
var cellGroups = worksheet.Cells
.GroupBy(cell => cell.Start.Row)
.ToList();
//Assume first row has the column names and get the names of the columns in the sheet that have a match in the table
var colnames = cellGroups
.First()
.Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
.Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name))
.ToList();
//Add the rows - skip the first cell row
for (var i = 1; i < cellGroups.Count(); i++)
{
var cellrow = cellGroups[i].ToList();
var tblrow = dataTable.NewRow();
dataTable.Rows.Add(tblrow);
colnames.ForEach(colname =>
{
//Excel stores either strings or doubles
var cell = cellrow[colname.index];
var val = cell.Value;
var celltype = val.GetType();
var coltype = tblcolnames.First(tcol => tcol.Name == colname.Name).Type;
//If it is numeric it is a double since that is how excel stores all numbers
if (celltype == typeof(double))
{
//Unbox it
var unboxedVal = (double)val;
//FAR FROM A COMPLETE LIST!!!
if (coltype == typeof (int))
tblrow[colname.Name] = (int) unboxedVal;
else if (coltype == typeof (double))
tblrow[colname.Name] = unboxedVal;
else
throw new NotImplementedException($"Type '{coltype}' not implemented yet!");
}
else if (coltype == typeof (DateTime))
{
//Its a date time
tblrow[colname.Name] = val;
}
else if (coltype == typeof (string))
{
//Its a string
tblrow[colname.Name] = val;
}
else
{
throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!");
}
});
}
}
To use it on something like this:
You would run this:
[TestMethod]
public void Sheet_To_Table_Test()
{
//https://stackoverflow.com/questions/38915006/stripping-data-from-a-epplus-output-from-a-date-range
//Create a test file
var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx");
using (var package = new ExcelPackage(fi))
{
var workbook = package.Workbook;
var worksheet = workbook.Worksheets.First();
var datatable = new DataTable();
datatable.Columns.Add("Col1", typeof(int));
datatable.Columns.Add("Col2", typeof(string));
datatable.Columns.Add("Col3", typeof(double));
datatable.Columns.Add("Col4", typeof(DateTime));
worksheet.ConvertSheetToDataTable(ref datatable);
foreach (DataRow row in datatable.Rows)
Console.WriteLine(
$"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" +
$", Col2({row["Col2"].GetType()}): {row["Col2"]}" +
$", Col3({row["Col3"].GetType()}): {row["Col3"]}" +
$", Col4({row["Col4"].GetType()}):{row["Col4"]}}}");
//To Answer OP's questions
datatable
.Select("Col4 >= #01/03/2016#")
.Select(row => row["Col1"])
.ToList()
.ForEach(num => Console.WriteLine($"{{{num}}}"));
}
}
Which gives this in the output:
row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM}
row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM}
row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM}
row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM}
{8}
{978}
Upvotes: 1