Anna Lippel
Anna Lippel

Reputation: 11

Read an excel table in an outlook mail in vba

I'm trying to read a table in my mail body and save it in a csv file (I attached an html sample below representing the content of my mail). I followed a related post: How to read table pasted in outlook message body using vba? but the problem is that it delimits the table by its cells and not its line so instead of getting 17 rows a one header I get 18 x 5 = 90 elements of an array...I tried to change the delimiter in the split option from "vbCrLf" to "vbLf", "vbCr", Char(10)...but none of them worked...Anyone has an idea of why the split is not distinguishing a space delimiter from a new line? This seems to be the simplest methodology to read a table in a mail body but if you have any other suggestion I will def consider it as well!

Here is the sample link: sample

Thanks

Upvotes: 1

Views: 2075

Answers (1)

Tony Dallimore
Tony Dallimore

Reputation: 12413

My routine worked well on Sample.htm as this image shows. I will first explain this image and then give two health warnings. The code exceeds the Stack Overflow limit of 30,000 characters so I cannot include it. If this is interesting, look at my profile where you will find an email address. Email me and I will send you the code.

Output from macro

Explanation of image and ragged array

The routine behind this image is designed to extract data from tables within a web page and write it to a single ragged array. The worksheet in the image was created by copying the ragged array to it. I have used a background of ivory to show which cells were in the ragged array. (Note: a ragged array is one in which the rows have varying numbers of columns.)

Sample.htm contains a single table which is about as simple as they get. I would not have guessed that from the complexity of the Html.

My routine ignores anything outside <table> to </table>. Within <table> to </table> it recognises the elements of a table. Any whitespace within a cell (<td> to </td>) becomes a single space in line with the rules of Html. A <p> is replaced by two linefeeds and a <br> by one line feed. Any tags other than table elements are discarded so: Normal <b>bold</b> <i>italic</i> becomes Normal bold italic. Attributes are ignored. Having finished one <table> to </table>, the routine looks for another. The routine handles nested tables. Character entities (such as "&amp;") are converted to the equivalent unicode character (such as "&"). The routine does NOT handle errors in the table definition; everything must be properly nested with no end tags omitted.

Row 1 of the ragged array is the header for the first (and in this case only) table within Sample.htm. Its contents are:

1  20  5  0  0  2

The 1 says this is a level one table. A table nested within a cell of this table would be level two. A table nested within the level two table would be level three and so on.

There are 20 rows each with a maximum of 5 columns. The first zero means there is no header section. The second zero means there is no footer section. The two means the first (and in this case only) body section starts at row 2 of the ragged array. There would have been other row numbers after the 2 if there were multiple body sections.

The next 20 rows of the ragged array are the data rows for the table each containing up to 5 columns.

In row 2, only column A is ivory. That row of the Html table only contained one cell. That cell has a colspan attribute so the cell extends across all five columns of the Html table. The existence and value of the colspan attribute is not included in the ragged array although the existence of either a colspan or a rowspan attribute can be deduced from the lack of cells. WARNING: this routine conceals the complexity of the Html from the calling routine. It does not conceal the complexity of the table. Fortunately, your table is simple with only a single colspan attribute.

Row 2 of the Html table - row 3 of the ragged array and the worksheet - has five empty cells.

The remaining cells are almost exactly as they appear in the ragged array. Because of the complexity of the cell definitions within the Html, there are linefeeds which have become a space within the data. The cell data is within <p> to </p> which has become LineFeed LineFeed within the data. I have added bespoke code to the calling routine to discard the spaces and line feeds.

Health warning 1

The routine below is not quite what you ask for. The code was tested within an Excel workbook with Sample.htm being in the same folder as the workbook. You could either create a routine to save the Html body of the required message as an Html file or you could move this code to Outlook and adapt it to write to Excel from Outlook. There are questions, with coded answers, about both these options. I can recommend other answers for you to study but I think this answer is big enough.

Health warning 2

The code below includes:

  • A tiny bespoke macro I have written for you to demonstrate my routine.
  • A routine written by Dick Kusleika and published on Stack Overflow that I cannot better.
  • A number of routines written by me for me.

The health warning is for the routines written by me for me.

I do not use these routines very often so the comments are to remind ME how to use them. They are not designed to help someone else understand what they do.

I place Debug.Assert False ' Not tested at the top of every path through the code and then comment these statement out when I test that path. If you request the code you will see that I have not tested all paths. With one exception, these routines work with the web pages I wish to decode. The exception is a site where the authors are showing off by nesting tables to a depth of five. Unfortunately, they then get their <td>s and </td>s muddled up and my code does not handled invalid Html. I correct the web page source before running my routine because that is the easiest for me. As I become interested in more web pages, I will test more of the code but because the code is for me I do not look for test cases. If the code falls over for you, email me the html file and I will see what I can do.

I wrote these routines because they handle complex Html that Excel cannot. I suggest you try Excel on Sample.htm. The real Html is quite simple so if Excel can ignore the formatting it might be able to import this file.

Upvotes: 1

Related Questions