podeig

Reputation: 2741

Get tables with defined class

I have html like this:

<table class="tbNoBorder" ..some attr here...>
<tr><td>1</td></tr><tr><td>2</td></tr>
</table>

<table ..some attr here... >
<tr><td>1</td></tr><tr><td>2</td></tr>
</table>

Using Regex, I need to convert it to:

<table class="tbNoBorder" cellspacing="0" cellpadding="0" ..some attr here...>
<tr><td style="padding: 5px;">1</td></tr><tr><td style="padding: 5px;">2</td></tr>
</table>

<table cellspacing="0" cellpadding="0" ..some attr here... >
<tr><td style="border: solid 1px #ccc; padding: 5px;">1</td></tr><tr><td style="border: solid 1px #ccc; padding: 5px;">2</td></tr>
</table>

I then convert the HTML to Word, which is why I need this conversion. Tables that have the class tbNoBorder must not have any borders.

I wrote the code below to do this, but all tables come out with borders: the first Regex matches every table, including those with tbNoBorder. Any ideas how to get it working?

        //Fixes tables with borders
        content = Regex.Replace(content,
            @"<table(.*?)(?!tbNoBorder)(.*?)>(.*?)</table>",
            m =>
                {
                    var tableContent = Regex.Replace(m.Groups[3].ToString(), 
                                        @"<td",
                                        t => "<td style=\"border: solid 1px #ccc; padding: 5px;\"", RegexOptions.IgnoreCase
                                        );
                    return "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1] + m.Groups[2] + ">" + tableContent + "</table>";
                }, RegexOptions.IgnoreCase
            );

        //Fixes tables without borders, has class tbNoBorder
        content = Regex.Replace(content,
            @"<table(.*?)tbNoBorder(.*?)>(.*?)</table>",
            m =>
            {
                var tableContent = Regex.Replace(m.Groups[3].ToString(),
                                    @"<td",
                                    t => "<td style=\"padding: 5px;\"", RegexOptions.IgnoreCase
                                    );
                return "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1] + m.Groups[2] + ">" + tableContent + "</table>";
            }, RegexOptions.IgnoreCase
        );

Upvotes: 0

Views: 164

Answers (2)

Filburt

Reputation: 18082

Using XSLT, it could be solved like this (assuming the input is well-formed XML, since XSLT cannot parse tag-soup HTML):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />

    <xsl:template match="/">
        <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="table">
        <xsl:element name="table">
            <xsl:attribute name="cellspacing">0</xsl:attribute>
            <xsl:attribute name="cellpadding">0</xsl:attribute>
            <xsl:apply-templates select="@* | node()" />
        </xsl:element>
    </xsl:template>

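    <xsl:template match="td">
        <xsl:element name="td">
            <!-- Test only the nearest enclosing table, so nested tables are handled correctly -->
            <xsl:choose>
                <xsl:when test="ancestor::table[1][not(@class='tbNoBorder')]">
                    <xsl:attribute name="style">border: solid 1px #ccc; padding: 5px;</xsl:attribute>
                </xsl:when>
                <xsl:otherwise>
                    <!-- tbNoBorder tables still get the padding asked for in the question -->
                    <xsl:attribute name="style">padding: 5px;</xsl:attribute>
                </xsl:otherwise>
            </xsl:choose>
            <xsl:apply-templates />
        </xsl:element>
    </xsl:template>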

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
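
Since the question is in C#, here is a minimal sketch of how such a stylesheet might be applied from code with XslCompiledTransform. The class name XsltTableFixer and the method Apply are hypothetical names chosen for illustration. The sketch assumes the input is well-formed XML with a single root element, so the two-table fragment from the question would first have to be wrapped in some container element.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltTableFixer
{
    // Hypothetical helper: applies the given stylesheet text to a well-formed
    // XHTML string and returns the transformed markup.
    public static string Apply(string stylesheet, string xhtml)
    {
        var transform = new XslCompiledTransform();

        // Compile the stylesheet from its text.
        using (var xsltReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            transform.Load(xsltReader);
        }

        // Run the transformation; output settings come from <xsl:output>.
        using (var input = XmlReader.Create(new StringReader(xhtml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

If the HTML is not well-formed (unclosed tags, unquoted attributes), XmlReader will throw, and the markup would need to be cleaned up before this approach can be used.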

Upvotes: 1

stema

Reputation: 93026

Change your first regex to

@"<table(?![^>]*tbNoBorder)(.*?)>(.*?)</table>"

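The negative lookahead (?![^>]*tbNoBorder) is anchored right after <table, so the match fails whenever tbNoBorder appears anywhere in the opening tag. Note that this pattern has only two capturing groups instead of three, so the callback has to read the attributes from m.Groups[1] and the table content from m.Groups[2].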
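
For context, the original pattern (.*?)(?!tbNoBorder)(.*?) cannot exclude anything: the lookahead only checks the text at the single position where the first group ends, and with an empty first group that position is followed by " class=...", which is not tbNoBorder, so the check passes and the second group swallows the class name. Below is a minimal sketch of both replacements with an anchored lookahead. The names TableFixer, Fix and StyleCells are hypothetical, and three choices are my own assumptions rather than part of the answer above: [^>]* instead of .*? for the attributes (so a match cannot run into a later table), RegexOptions.Singleline (so a table may span several lines), and a positive lookahead in the second pattern (so the class attribute is kept in the output).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableFixer
{
    // Assumption: Singleline is added so '.' also matches newlines inside a table.
    private const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Adds the given inline style to every <td> opening tag.
    // \b keeps longer tag names that merely start with "td" from matching.
    private static string StyleCells(string tableContent, string style)
    {
        return Regex.Replace(tableContent, @"<td\b",
            t => "<td style=\"" + style + "\"", Options);
    }

    public static string Fix(string content)
    {
        // Tables WITHOUT tbNoBorder: the negative lookahead scans the whole opening tag.
        content = Regex.Replace(content,
            @"<table\b(?![^>]*tbNoBorder)([^>]*)>(.*?)</table>",
            m => "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1].Value + ">"
                 + StyleCells(m.Groups[2].Value, "border: solid 1px #ccc; padding: 5px;")
                 + "</table>",
            Options);

        // Tables WITH tbNoBorder: the positive lookahead keeps the class attribute in group 1.
        content = Regex.Replace(content,
            @"<table\b(?=[^>]*tbNoBorder)([^>]*)>(.*?)</table>",
            m => "<table cellspacing=\"0\" cellpadding=\"0\"" + m.Groups[1].Value + ">"
                 + StyleCells(m.Groups[2].Value, "padding: 5px;")
                 + "</table>",
            Options);

        return content;
    }
}
```

Calling TableFixer.Fix on a tbNoBorder table followed by a plain table should give padding-only cells in the first and bordered cells in the second. Nested tables are not handled, because the lazy (.*?)</table> stops at the first closing tag; for that case an HTML parser is the safer tool.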

Upvotes: 3
