Reputation: 15835
I have this string:
This is sample <p id="short"> the value of short </p> <p id="medium"> the value of medium </p> <p id="large"> the value of large</p>
which I want to break into four pieces:
This is sample
the value of short
the value of medium
the value of large
Upvotes: 1
Views: 245
Reputation: 17457
Using the HtmlAgilityPack, it is very simple:
string html = "This is sample <p id=\"short\"> the value of short </p> <p id=\"medium\"> the value of medium </p> <p id=\"large\"> the value of large</p>";
string id = null;
NameValueCollection output = new NameValueCollection();
string[] pIds = new string[3] { "short", "medium", "large" };
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
int c = 0;
int len = pIds.Length;
while (c < len)
{
id = pIds[c];
output.Add(id, doc.GetElementbyId(id).InnerHtml);
c++;
}
// Each key in output is a paragraph id, and its value is that paragraph's inner HTML. For example:
Console.WriteLine(output["short"]);
Output: the value of short
Upvotes: 0
Reputation: 10564
Building on Bala R's answer, here's a more succinct way to do it with XPath:
string input = @"This is sample <p id=""short""> the value of short </p> <p id=""medium""> the value of medium </p> <p id=""large""> the value of large</p>";
var xmlWrapper = "<html>" + input + "</html>";
var elements = XElement.Parse(xmlWrapper).XPathSelectElements("/*").ToList();
var text = elements[0].PreviousNode.ToString();
var small = elements[0].Value;
var medium = elements[1].Value;
var large = elements[2].Value;
Upvotes: 1
Reputation: 2940
First of all, it has been said many times here that you should not use regex to parse HTML, for several reasons (mainly that HTML is not a regular language), and that you should use an HTML parser instead.
However, if some constraint prevents you from using an HTML parser, you can do the following:
1. string before p tags - ^[^<]*
2. short - <p id="short">([^<]*)</p>
3. medium - <p id="medium">([^<]*)</p>
4. large - <p id="large">([^<]*)</p>
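As a rough illustration of how patterns along these lines might be applied in C#, here is a minimal sketch. It assumes the paragraph text contains no nested tags (so `[^<]*` is enough) and that the `id` attribute is written exactly as in the question. The class and method names (`ParagraphRegexSketch`, `ExtractById`, `ExtractLeadingText`) are hypothetical and exist only for this example.

```csharp
using System.Text.RegularExpressions;

public static class ParagraphRegexSketch
{
    // Hypothetical helper: returns the trimmed text of <p id="..."> or null if no match.
    public static string ExtractById(string html, string id)
    {
        // [^<]* stops at the first '<', so paragraphs with nested tags will not match.
        var match = Regex.Match(html, "<p id=\"" + Regex.Escape(id) + "\">([^<]*)</p>");
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    // Hypothetical helper: returns the trimmed text that precedes the first tag.
    public static string ExtractLeadingText(string html)
    {
        return Regex.Match(html, "^[^<]*").Value.Trim();
    }
}
```

Calling `ExtractById(html, "medium")` on the string from the question should give the medium paragraph's text with the surrounding spaces trimmed. Anything less regular than this input (extra attributes, different quoting, nested markup) is where a real HTML parser becomes the safer choice.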
Cheers.
Upvotes: 0
Reputation: 14906
(?<string_before_p_tags>[^<]*)<p id="short">(?<short>.*)</p>\s*<p id="medium">(?<medium>.*)</p>\s*<p id="large">(?<large>.*)</p>
Returns the named capture groups:
string_before_p_tags: This is sample
short: the value of short
medium: the value of medium
large: the value of large
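For anyone wanting to try the pattern above from C#, a minimal sketch follows. The pattern is copied unchanged. The wrapper class `NamedGroupSketch` and its `Split` method are hypothetical names for this example, and the `Trim()` call is an added assumption, since the raw captures keep the spaces around each value.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    // The pattern from the answer, written as a verbatim string ("" escapes a quote).
    private const string Pattern =
        @"(?<string_before_p_tags>[^<]*)<p id=""short"">(?<short>.*)</p>\s*" +
        @"<p id=""medium"">(?<medium>.*)</p>\s*<p id=""large"">(?<large>.*)</p>";

    // Hypothetical helper: maps each group name to its trimmed capture.
    // Returns an empty dictionary when the input does not match.
    public static Dictionary<string, string> Split(string html)
    {
        var result = new Dictionary<string, string>();
        var match = Regex.Match(html, Pattern);
        if (!match.Success)
        {
            return result;
        }
        foreach (var name in new[] { "string_before_p_tags", "short", "medium", "large" })
        {
            result[name] = match.Groups[name].Value.Trim();
        }
        return result;
    }
}
```

Note that the pattern depends on the three paragraphs appearing in exactly this order; if they can be reordered or omitted, it will simply fail to match.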
Upvotes: 2
Reputation: 10564
Here's my stab at it:
var regex = new Regex("(?<text>.*?)<p.*?>(?<small>.*?)</p>.*?<p.*?>(?<medium>.*?)</p>.*?<p.*?>(?<large>.*?)</p>");
var htmlsnip = @"This is sample <p id=""short""> the value of short </p> <p id=""medium""> the value of medium </p> <p id=""large""> the value of large</p>";
var match = regex.Match(htmlsnip);
var text = match.Groups["text"].Value;
var small = match.Groups["small"].Value;
var medium = match.Groups["medium"].Value;
var large = match.Groups["large"].Value;
Upvotes: 3
Reputation: 109037
If you don't mind a non-regex solution (because HTML is not a regular language), you can use this:
string input = @"This is sample <p id=""short""> the value of short </p> <p id=""medium""> the value of medium </p> <p id=""large""> the value of large</p>";
string before = input.Substring(0, input.IndexOf("<"));
string xmlWrapper = "<html>" + input.Substring(input.IndexOf("<")) + "</html>";
XElement xElement = XElement.Parse(xmlWrapper);
var shortElement =
xElement.Elements().Where(p => p.Name == "p" && p.Attribute("id").Value == "short").SingleOrDefault();
var shortValue = shortElement != null ? shortElement.Value : string.Empty;
var mediumElement =
xElement.Elements().Where(p => p.Name == "p" && p.Attribute("id").Value == "medium").SingleOrDefault();
var mediumValue = mediumElement != null ? mediumElement.Value : string.Empty;
var largeElement =
xElement.Elements().Where(p => p.Name == "p" && p.Attribute("id").Value == "large").SingleOrDefault();
var largeValue = largeElement != null ? largeElement.Value : string.Empty;
Upvotes: 4