I used WebClient in C# to download the HTML of a YouTube video page. Now I'm trying to extract a YouTube comment from that HTML, but it isn't working, because comments that use the same element (yt-formatted-string) have different attributes (id, class, slot, and so on). So I'm trying to get a regex to match whatever attributes are there and just continue to the end of the opening tag (>).
I tried using "." in the regex, the way I would with Python's re module (re.compile(r'.')), where it matches any character (spaces, symbols, letters) for me. I'm not sure whether that exists in C#, but I hope so.
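C# has the same thing: in .NET's System.Text.RegularExpressions, "." matches any single character except "\n", just like in Python's re module. The small sketch below shows that behavior on made-up strings. One detail that matters for the Regex.Matches call in this question: RegexOptions.Multiline only changes what ^ and $ match. It's RegexOptions.Singleline (the counterpart of Python's re.DOTALL) that lets "." match a line break too.

```csharp
using System;
using System.Text.RegularExpressions;

// "." matches any single character except "\n" by default (same as Python's re).
bool dotMatchesNewline = Regex.IsMatch("\n", "^.$");

// RegexOptions.Singleline (Python: re.DOTALL) makes "." match "\n" as well.
bool dotMatchesNewlineSingleline = Regex.IsMatch("\n", "^.$", RegexOptions.Singleline);

// So ".+" stops at the first line break unless Singleline is set.
string text = "a b\nc";
string firstLine = Regex.Match(text, ".+").Value;
string wholeText = Regex.Match(text, ".+", RegexOptions.Singleline).Value;

Console.WriteLine(dotMatchesNewline);           // False
Console.WriteLine(dotMatchesNewlineSingleline); // True
Console.WriteLine(firstLine);                   // a b
```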
using System;
using System.Net;
using System.Text.RegularExpressions;

WebClient web = new WebClient();
String content = web.DownloadString(@"https://www.youtube.com/watch?v=hE73JvEc2pQ");
MatchCollection matches = Regex.Matches(content, @"<yt-formatted-string\.>\s*(.+?)\s*</yt-formatted-string>", RegexOptions.Multiline);
foreach (Match match in matches)
{
    // append each comment instead of overwriting the previous one
    textComment.Text += $"\n{match.Groups[1].Value}";
}
This gives me no matches at all.
I want the regex to match the attributes for me, like so:
HTML line:
<yt-formatted-string id="content-text" slot="content" split-lines="" class="style-scope ytd-comment-renderer">
Imaginary C# pattern that would match all the attributes:
"<yt-formatted-string(match all the attributes here)>\s*(.+?)\s*</yt-formatted-string>"
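There's no special "complete the attributes" syntax, but a negated character class gets the same effect: [^>]* means "any run of characters that aren't >", so <yt-formatted-string[^>]*> matches the opening tag whatever attributes it has, including none. In the question's pattern, \. is an escaped dot, so it only matches the literal text <yt-formatted-string.>, which never occurs. The sketch below runs the pattern against a made-up snippet built from the HTML line above; the comment texts are invented. As far as I know, YouTube also inserts the comments with JavaScript after the page loads, so the HTML returned by DownloadString probably doesn't contain them at all, which would explain getting nothing back even with a correct pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Made-up markup based on the HTML line in the question (comment texts are invented).
string html =
    "<yt-formatted-string id=\"content-text\" slot=\"content\" split-lines=\"\" " +
    "class=\"style-scope ytd-comment-renderer\">First comment</yt-formatted-string>\n" +
    "<yt-formatted-string class=\"other\">Second comment</yt-formatted-string>\n" +
    "<yt-formatted-string>Third comment</yt-formatted-string>";

// [^>]* consumes whatever attributes the opening tag has (or none) up to the ">".
// Singleline lets (.+?) also capture comments that contain line breaks.
var pattern = new Regex(
    @"<yt-formatted-string[^>]*>\s*(.+?)\s*</yt-formatted-string>",
    RegexOptions.Singleline);

var comments = new List<string>();
foreach (Match match in pattern.Matches(html))
{
    comments.Add(match.Groups[1].Value);
}

Console.WriteLine(string.Join(" | ", comments));
```

The lazy (.+?) stops at the first closing tag, so each element yields its own comment instead of one match swallowing everything up to the last </yt-formatted-string>.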
For cases where an API is not available, you should also avoid trying to parse HTML with a regex, and instead use a real parser: an HTML parser, or an XML parser if the markup happens to be well-formed. See https://stackoverflow.com/a/1732454/6055952 for more information.
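A minimal sketch of the parser approach using the built-in System.Xml.Linq, assuming the markup is well-formed XML. The fragment is made up from the question's HTML line and wrapped in a root element so it parses. Real YouTube pages generally aren't well-formed XML, so for those a dedicated HTML parser (for example the third-party HtmlAgilityPack or AngleSharp packages) is the usual choice; the idea is the same: select elements by name and let the parser deal with the attributes.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up, well-formed fragment wrapped in a single root element so it parses as XML.
string fragment =
    "<root>" +
    "<yt-formatted-string id=\"content-text\" slot=\"content\" split-lines=\"\" " +
    "class=\"style-scope ytd-comment-renderer\">First comment</yt-formatted-string>" +
    "<yt-formatted-string class=\"other\">Second comment</yt-formatted-string>" +
    "</root>";

XDocument doc = XDocument.Parse(fragment);

// Select elements by name; the parser handles the attributes, whatever they are.
var comments = doc.Descendants("yt-formatted-string")
                  .Select(e => e.Value.Trim())
                  .ToList();

// Or filter on a specific attribute when needed (null-safe: missing id -> null).
var contentText = doc.Descendants("yt-formatted-string")
                     .Where(e => (string)e.Attribute("id") == "content-text")
                     .Select(e => e.Value)
                     .FirstOrDefault();

Console.WriteLine(string.Join(" | ", comments));
Console.WriteLine(contentText);
```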
You don't need to deal with such complicated parsing. Just use the YouTube Data API.
Check this API.
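A sketch of that route, assuming the YouTube Data API v3 commentThreads.list endpoint and an API key created in the Google Cloud console (the key value and the helper names BuildCommentThreadsUrl, ExtractCommentTexts and FetchCommentsAsync are placeholders, not part of the API). The request asks for part=snippet and textFormat=plainText, and each comment's text is read from items[].snippet.topLevelComment.snippet.textDisplay. The offline check at the end uses a made-up response trimmed to that shape, so it runs without a key.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: builds a commentThreads.list request URL (YouTube Data API v3).
static string BuildCommentThreadsUrl(string videoId, string apiKey) =>
    "https://www.googleapis.com/youtube/v3/commentThreads" +
    "?part=snippet&textFormat=plainText&maxResults=20" +
    $"&videoId={Uri.EscapeDataString(videoId)}&key={Uri.EscapeDataString(apiKey)}";

// Hypothetical helper: reads items[].snippet.topLevelComment.snippet.textDisplay.
static List<string> ExtractCommentTexts(string json)
{
    var texts = new List<string>();
    using JsonDocument doc = JsonDocument.Parse(json);
    if (!doc.RootElement.TryGetProperty("items", out JsonElement items))
        return texts; // e.g. a response without an items array

    foreach (JsonElement item in items.EnumerateArray())
    {
        string? text = item.GetProperty("snippet")
                           .GetProperty("topLevelComment")
                           .GetProperty("snippet")
                           .GetProperty("textDisplay")
                           .GetString();
        if (text != null)
            texts.Add(text);
    }
    return texts;
}

// Hypothetical helper: downloads one page of top-level comments
// (needs network access and a valid key; GetStringAsync throws on HTTP errors).
static async Task<List<string>> FetchCommentsAsync(string videoId, string apiKey)
{
    using var http = new HttpClient();
    string json = await http.GetStringAsync(BuildCommentThreadsUrl(videoId, apiKey));
    return ExtractCommentTexts(json);
}

// Offline check against a made-up response with the same shape.
string sample =
    "{\"items\":[" +
    "{\"snippet\":{\"topLevelComment\":{\"snippet\":{\"textDisplay\":\"Great video\"}}}}," +
    "{\"snippet\":{\"topLevelComment\":{\"snippet\":{\"textDisplay\":\"Thanks!\"}}}}" +
    "]}";

List<string> parsed = ExtractCommentTexts(sample);
Console.WriteLine(string.Join(" | ", parsed));
```

With a real key, something like var comments = await FetchCommentsAsync("hE73JvEc2pQ", "YOUR_API_KEY"); should return the first page of top-level comments; further pages (via nextPageToken) and replies would need extra requests.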