Rocky
Rocky

Reputation: 4524

How to import Excel which is in HTML format

I have exported the data from database using HttpContext with formatting of table, tr and td. I want to read the same file and convert into datatable.

<add name="Excel03ConString" connectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='HTML Import;HDR={1};IMEX=1'" />

<add name="Excel03ConString" connectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR={1};IMEX=1'" />

    private DataTable GetTableFromExcel()
    {
        DataTable dt = new DataTable();

        try
        {
            if (exclFileUpload.HasFile)
            {
                string FileName = Path.GetFileName(exclFileUpload.PostedFile.FileName);
                string Extension = Path.GetExtension(exclFileUpload.PostedFile.FileName);
                string FolderPath = Server.MapPath(ConfigurationManager.AppSettings["FolderPath"]);
                //string NewFileName = string.Format("{0}_{1}", DateTime.Now.ToString().Replace("/", "").Replace(" ", "").Replace(":", ""), FileName);
                string FilePath = Path.Combine(string.Format("{0}/{1}", FolderPath, FileName));
                exclFileUpload.SaveAs(FilePath);
                string conStr = "";
                switch (Extension)
                {
                    case ".xls": //Excel 97-03
                        conStr = ConfigurationManager.ConnectionStrings["Excel03ConString"].ConnectionString;
                        break;
                    case ".xlsx": //Excel 07
                        conStr = ConfigurationManager.ConnectionStrings["Excel07ConString"].ConnectionString;
                        break;
                }
                conStr = String.Format(conStr, FilePath, true);
                OleDbConnection connExcel = new OleDbConnection(conStr);
                OleDbCommand cmdExcel = new OleDbCommand();
                OleDbDataAdapter oda = new OleDbDataAdapter();

                cmdExcel.Connection = connExcel;

                connExcel.Open();
                DataTable dtExcelSchema;
                dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
                string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
                connExcel.Close();

                connExcel.Open();
                cmdExcel.CommandText = "SELECT * From [" + SheetName + "]";
                oda.SelectCommand = cmdExcel;
                oda.Fill(dt);
                connExcel.Close();
                File.Delete(FilePath);

            }
        }
        catch (Exception ex)
        {

        }
        return dt;
    }

When using the second connection string I am getting error "External table is not in the expected format on connection.Open()." But when using the first, I am getting error on reading the sheet name.

Please tell me how to read the sheet or, directly, data from Excel.

Upvotes: 11

Views: 4363

Answers (4)

sudhansu63
sudhansu63

Reputation: 6180

I think This Third party dll-(ExcellDataReader) may help solve your problem.

FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);

//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();

//5. Data Reader methods
while (excelReader.Read())
{
    //excelReader.GetInt32(0);
}

//6. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();

Upvotes: 3

Ullas
Ullas

Reputation: 11556

After a deep research I found the solution.

First convert the particular Excel file to an html page by using the following code.

File.Move(Server.MapPath("~/Foldername/ExcelName.xls",Path.ChangeExtension(Server.MapPath("~/Foldername/ExcelName.xls"), ".html"));

we have to download HTML string and extract content. tag contains and tags but it may have style properties. So first we have to avoid these style properties and after we can get required content from table.

string url = Server.MapPath("~/FolderName/Excelname.html");
WebClient wc = new WebClient();
string fileContent = wc.DownloadString(url);

Here we have to format HTML tags for avoid style properties.

const string msgFormat = "table[{0}], tr[{1}], td[{2}], a: {3}, b: {4}";   
const string table_pattern = "<table.*?>(.*?)</table>"; 
const string tr_pattern = "<tr.*?>(.*?)</tr>"; 
const string td_pattern = "<td.*?>(.*?)</td>"; 
const string a_pattern = "<a href=\"(.*?)\"></a>"; 
const string b_pattern = "<b>(.*?)</b>";

After through looping we can find <tr> and <td> elements. Then we can get content within <td></td> tags by using this method.

private static List<string> GetContents(string input, string pattern)
{ 
    MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Singleline); 
    List<string> contents = new List<string>(); 
    foreach (Match match in matches) 
    contents.Add(match.Value); 
    return contents; 
}

Then we can insert imported records to database by each row.

Reference link here

Upvotes: 0

Mairaj Ahmad
Mairaj Ahmad

Reputation: 14604

You may be facing this issue due to different reasons. There are different solutions for this one of them is to make your solution debug as x86. Here is how you can change it to x86.

  • Right click on sloution from Visual studio.
  • Click configuration Manager
  • Select x86 from Active solution platform if available
  • If not available click New and select or type x86 and click ok.
  • Rebuild solution and run you application.

If this deos not solve your problem you may have to install 32 bit version of office system drivers. Here is a complete article explaining the problem.

Upvotes: 0

Steven Lemmens
Steven Lemmens

Reputation: 630

I found this online: C# Excel file OLEDB read HTML IMPORT

Here they say:

Instead of using the sheetname, you must use the page title in the select statement without the $. SELECT * FROM [HTMLPageTitle]

In that post they also link to this manual, which may come in handy, but is too long to copy here: http://ewbi.blogs.com/develops/2006/12/reading_html_ta.html

If that doesn't work, I t think you will have to recreate the original excel so it remains an excel file, and not HTML (if that is at all possible in your scenario)

Upvotes: 0

Related Questions