user5204184

Reputation: 341

Get Text Between Two Strings (HTML) in C#

I am trying to parse a website's HTML and then get text between two strings.

I wrote a small function to get text between two strings.

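public string getBetween(string strSource, string strStart, string strEnd)
{
    // Locate the start marker; ordinal comparison matches the raw HTML text exactly.
    int start = strSource.IndexOf(strStart, StringComparison.Ordinal);
    if (start < 0)
    {
        return string.Empty;
    }
    start += strStart.Length;

    // Search for the end marker only after the start marker, so an earlier
    // occurrence of strEnd cannot produce a negative length in Substring.
    int end = strSource.IndexOf(strEnd, start, StringComparison.Ordinal);
    if (end < 0)
    {
        return string.Empty;
    }

    return strSource.Substring(start, end - start);
}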

I have the HTML stored in a string called 'html'. Here is a part of the HTML that I am trying to parse:

<div class="info">
                                    <div class="content">
                                        <div class="address">
                                        <h3>Andrew V. Kenny</h3>
                                        <div class="adr">
                                        67 Romines Mill Road<br/>Dallas, TX 75204                                        </div>
                                    </div>

<p>Curious what <strong>Andrew</strong> means? <a href="http://www.babysfirstdomain.com/meaning/boy/andrew">Click here to find out!</a></p>

I call the function like this:

    string m2 = getBetween(html, "<div class=\"address\">", "<p>Curious what");
    string fullName = getBetween(m2, "<h3>", "</h3>");
    string fullAddress = getBetween(m2, "<div class=\"adr\">", "<br/>");
    string city = getBetween(m2, "<br/>", "</div>");

The full name comes out correctly, but the address and city contain extra whitespace. I tried several ways to avoid it (for example, copying the exact whitespace from the source into the marker strings I pass to the function), but none of them worked.

I get an output like this:

fullName = "Andrew V. Kenny"
fullAddress = "                                            67 Romines Mill Road"
city = "Dallas, TX 75204                                        "
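The output above can be reproduced with a small standalone sketch. It assumes a condensed copy of the snippet, and the class name AddressParseDemo and its members are hypothetical. It suggests where the extra characters come from: the text right after <div class="adr"> starts with the line break and indentation of the HTML source (the real page may use \r\n), and the city is followed by the spaces that precede </div>. getBetween returns everything between the markers verbatim, so that formatting whitespace ends up in the results.

```csharp
using System;

// Hypothetical standalone reproduction of the question's parsing steps.
public static class AddressParseDemo
{
    // Condensed copy of the question's HTML: the line break and indentation after
    // <div class="adr"> and the spaces before </div> mirror the original markup.
    public const string SampleHtml =
        "<div class=\"address\">\n" +
        "    <h3>Andrew V. Kenny</h3>\n" +
        "    <div class=\"adr\">\n" +
        "    67 Romines Mill Road<br/>Dallas, TX 75204        </div>\n" +
        "</div>\n" +
        "<p>Curious what <strong>Andrew</strong> means?</p>";

    // Same idea as the question's getBetween, with -1 checks and ordinal matching.
    public static string GetBetween(string source, string start, string end)
    {
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return string.Empty;
        startIndex += start.Length;

        int endIndex = source.IndexOf(end, startIndex, StringComparison.Ordinal);
        return endIndex < 0 ? string.Empty : source.Substring(startIndex, endIndex - startIndex);
    }

    // Runs the same four calls as the question and returns the raw, untrimmed results.
    public static (string FullName, string FullAddress, string City) Parse(string html)
    {
        string m2 = GetBetween(html, "<div class=\"address\">", "<p>Curious what");
        string fullName = GetBetween(m2, "<h3>", "</h3>");
        string fullAddress = GetBetween(m2, "<div class=\"adr\">", "<br/>");
        string city = GetBetween(m2, "<br/>", "</div>");
        return (fullName, fullAddress, city);
    }
}
```

Printing AddressParseDemo.Parse(AddressParseDemo.SampleHtml).FullAddress.Replace("\n", "\\n") makes the leading line break visible as \n; printed raw, it just looks like blank space before the street name.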

The address and city contain extra whitespace, and I don't know how to get rid of it.

Upvotes: 0

Views: 305

Answers (1)

Alexandre Borela

Reputation: 1616

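Trim the strings and the unnecessary whitespace will be gone. String.Trim() removes all leading and trailing whitespace characters, including line breaks and tabs: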

fullName = fullName.Trim();
fullAddress = fullAddress.Trim();
city = city.Trim();
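Trim() only touches the ends of a string. If an extracted value also wraps across lines inside the markup, the internal line break and indentation would remain. A minimal sketch of one way to handle that, assuming a hypothetical helper named TextCleanup.CleanText (not part of the answer above), trims first and then collapses every internal whitespace run to a single space with Regex.Replace:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TextCleanup
{
    // Matches any run of whitespace characters (spaces, tabs, line breaks).
    private static readonly Regex WhitespaceRun = new Regex(@"\s+");

    // Trims both ends, then replaces each internal whitespace run with one space.
    public static string CleanText(string value) =>
        WhitespaceRun.Replace(value.Trim(), " ");
}
```

With the question's variables this would be, for example, fullAddress = TextCleanup.CleanText(fullAddress);. For the snippet in the question, plain Trim() is enough, because the extra whitespace only appears at the ends of each value.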

Upvotes: 3
