Brom

Reputation: 1

How to parse a string with html table tags?

I have a string:

string s= "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";

Formatted, it looks like this:

<tr>
    <td>abc</td>
    <td>1</td>
    <td>def</td>
</tr>
<tr>
    <td>aaa</td>
    <td>2</td>
    <td>bbb</td>
</tr>

Now I want to get the values "1" and "2". How do I do this? I have tried converting it to XML, but without success.

Upvotes: 0

Views: 1665

Answers (5)

Johann Nel

Reputation: 176

Good day, Brom.

This might not be the solution you were looking for, but it is one of several approaches that may help.

I would use this regex to match all the tags:

(<\/[a-z]*>)+(<[a-z]*>)+|(<[a-z]*>)+(<\/[a-z]*>)+|(<[a-z]*>)+|(<\/[a-z]*>)+

Example:

  string input = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";
  string replacement = "#";

  // Verbatim string (@"...") is required: "\/" is not a valid escape in a regular C# string literal.
  string pattern = @"(<\/[a-z]*>)+(<[a-z]*>)+|(<[a-z]*>)+(<\/[a-z]*>)+|(<[a-z]*>)+|(<\/[a-z]*>)+";

  RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline;

  Regex rgx = new Regex(pattern, options);

  // Each run of adjacent tags is replaced by a single "#".
  string result = rgx.Replace(input, replacement);
  // result == "#abc#1#def#aaa#2#bbb#"

This regex matches the tags, either as runs of adjacent tags or individually. You can then replace each match with a delimiter such as a pipe "|" or "#" and split on that. I hope this helps.
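
The split step is not shown above, so here is a minimal sketch of it. The class and method names (TagSplitter, SplitOnTags) are hypothetical, and the sketch assumes that "#" never appears inside a cell's text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagSplitter
{
    // Hypothetical helper: replaces every run of tags with '#', then splits on it.
    // Assumes the cell text itself never contains '#'.
    public static string[] SplitOnTags(string input)
    {
        // Verbatim string so that \/ reaches the regex engine unchanged.
        const string pattern = @"(<\/[a-z]*>)+(<[a-z]*>)+|(<[a-z]*>)+(<\/[a-z]*>)+|(<[a-z]*>)+|(<\/[a-z]*>)+";
        string delimited = Regex.Replace(input, pattern, "#", RegexOptions.IgnoreCase);

        // RemoveEmptyEntries drops the empty strings caused by the leading and trailing '#'.
        return delimited.Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";
        string[] cells = SplitOnTags(input);

        // With three cells per row, the second cell of row n sits at index 3 * n + 1.
        Console.WriteLine(cells[1]); // prints "1"
        Console.WriteLine(cells[4]); // prints "2"
    }
}
```

Indexing by position only works while every row has the same number of cells; for anything less regular, a real HTML parser is likely the safer choice.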

Kind Regards

P.S. Regex explanation: the pipes act as OR operators between these four alternatives.

(<\/[a-z]*>)+(<[a-z]*>)+ // Closing tag(s) that are followed by opening tag(s)
(<[a-z]*>)+(<\/[a-z]*>)+ // Opening tags followed by closing tags
(<[a-z]*>)+ // one or more opening tags
(<\/[a-z]*>)+ // one or more closing tags    

Upvotes: 0

user5014677

Reputation: 694

            string s = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";

            // Start at the first run of digits and advance until no match is left.
            // Checking Success avoids handling an empty value after the last number.
            var match = System.Text.RegularExpressions.Regex.Match(s, @"\d+");
            while (match.Success)
            {
                MessageBox.Show(match.Value); // shows "1", then "2"
                match = match.NextMatch();
            }

The regex matches every number in the string and the while loop goes through all of them. Do whatever you want instead of MessageBox.Show and you're good to go. Note that \d+ matches digits anywhere in the string, not only in the second cell of each row.
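
If the other cells could also contain digits, one possible refinement is to match only cells whose entire content is numeric and to collect the values in a list. This is a sketch under that assumption; NumericCells and Extract are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumericCells
{
    // Hypothetical helper: returns the text of every <td> whose whole content is digits.
    public static List<string> Extract(string html)
    {
        var values = new List<string>();

        // Group 1 captures the digits between <td> and </td>.
        Match match = Regex.Match(html, @"<td>(\d+)</td>");
        while (match.Success)
        {
            values.Add(match.Groups[1].Value);
            match = match.NextMatch();
        }
        return values;
    }

    public static void Main()
    {
        string s = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";
        foreach (string value in Extract(s))
        {
            Console.WriteLine(value); // prints "1", then "2"
        }
    }
}
```

A cell such as `<td>abc7</td>` is skipped by this pattern, whereas a bare `\d+` would pick up the 7.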

Upvotes: 0

Daniel Tshuva

Reputation: 503

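// Requires: using System.Linq; using System.Text.RegularExpressions;
Regex regex = new Regex("<td>(.*?)<\\/td>");
var matches = regex.Matches("<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>");
// Group 1 is the text between <td> and </td>: abc, 1, def, aaa, 2, bbb
var values = matches.Cast<Match>().Select(m => m.Groups[1].Value).ToList();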

Upvotes: 0

Tien Nguyen Ngoc

Reputation: 1555

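string s = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";
// Strip everything except the opening <td> tags, then split on them.
s = s.Replace("<tr>","").Replace("</tr>","").Replace("</td>","");
string[] val = s.Split(new string[] { "<td>" }, StringSplitOptions.None);

// val[0] is the empty string before the first <td>, so the cells start at index 1.
string one = val[2];
string two = val[5];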

I hope it will work for you.

Upvotes: 1

Jaimin Dave

Reputation: 1222

You can use HTML Agility Pack to achieve this:

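// Requires: using HtmlAgilityPack; using System.Linq;
HtmlDocument doc = new HtmlDocument();
// str is the HTML string to parse; HtmlDocument loads markup with LoadHtml, it has no Parse method.
doc.LoadHtml(str);

IEnumerable<string> cells = doc.DocumentNode.Descendants("td").Select(td => td.InnerText);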

Upvotes: 2
