The but monkey

Reputation: 29

How do I use a regex in C# to match an HTML element regardless of its attributes?

I used WebClient in C# to download the HTML of a YouTube video page. Now I'm trying to extract a YouTube comment from it, but it's not working, because different comments that use the same element (yt-formatted-string) have different attributes (class, id, span, and so on). So I want the regex to skip over whatever attributes are present and match up to the closing `>` of the opening tag.

I tried using "." in the regex, much like `re.compile(r'.')` with the re module in Python, where it matches spaces, symbols, and other characters for me. I'm not sure whether that exists in C#, but I hope so.

        WebClient web = new WebClient();
        String content = web.DownloadString(@"https://www.youtube.com/watch?v=hE73JvEc2pQ");

        MatchCollection matches = Regex.Matches(content, @"<yt-formatted-string\.>\s*(.+?)\s*</yt-formatted-string>", RegexOptions.Multiline);
        foreach (Match match in matches)
        {
            textComment.Text = $"\n{match.Groups[1].Value}";
        }

This returns no matches.

I want the regex to match the attributes for me, whatever they are, like so:

HTML line:

<yt-formatted-string id="content-text" slot="content" split-lines="" class="style-scope ytd-comment-renderer">

Imaginary C# pattern that would let me skip over the attributes:

"<yt-formatted-string(complete all the attributes here)>\s*(.+?)\s*</yt-formatted-string>"

Upvotes: 0

Views: 72

Answers (2)

Matthew Varga

Reputation: 404

Even in cases where an API is not available, you should avoid trying to parse HTML with a regex and parse it as XML instead. See https://stackoverflow.com/a/1732454/6055952 for more information.
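
To illustrate, here is a minimal sketch of the XML route using `System.Xml.Linq`. It assumes the markup is well-formed XML, which real-world HTML often is not (in that case `XDocument.Parse` throws). The `CommentExtractor` class, the `ExtractComments` helper, and the sample fragment are hypothetical names made up for this example; the element and attribute names are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CommentExtractor
{
    // Returns the trimmed text of every <yt-formatted-string id="content-text"> element.
    // Assumption: the input is well-formed XML; otherwise XDocument.Parse throws an XmlException.
    public static List<string> ExtractComments(string markup)
    {
        XDocument doc = XDocument.Parse(markup);
        return doc.Descendants("yt-formatted-string")
                  // The cast yields null when the attribute is missing, so no null check is needed.
                  .Where(e => (string)e.Attribute("id") == "content-text")
                  .Select(e => e.Value.Trim())
                  .ToList();
    }

    public static void Main()
    {
        // Hypothetical fragment modelled on the element shown in the question.
        string sample =
            "<div>" +
            "<yt-formatted-string id=\"content-text\" slot=\"content\" split-lines=\"\" " +
            "class=\"style-scope ytd-comment-renderer\"> First comment </yt-formatted-string>" +
            "<yt-formatted-string id=\"other\" class=\"x\">Not a comment</yt-formatted-string>" +
            "</div>";

        foreach (string comment in ExtractComments(sample))
        {
            Console.WriteLine(comment);
        }
    }
}
```

Selecting by element name and `id` means the other attributes (slot, class, and so on) no longer matter. As a side note on the pattern in the question: `\.` matches a literal dot, not "any character". If a regex were used anyway, `[^>]*` in that position would skip over the attributes, but the parser approach above is less fragile.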

Upvotes: 0

Derviş Kayımbaşıoğlu

Reputation: 30565

You don't need to deal with such complicated parsing. Just use the YouTube Data API.

Check This API
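
As a rough sketch of what that could look like, the code below calls the YouTube Data API v3 `commentThreads` endpoint and reads the top-level comment text from the JSON response. It assumes .NET Core 3.0 or later (for `System.Text.Json`) and an API key stored in an environment variable that I have called `YOUTUBE_API_KEY`; that variable name and the `YouTubeComments`, `BuildUrl`, and `ParseComments` names are hypothetical, chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class YouTubeComments
{
    // Builds a commentThreads.list request URL for the given video.
    public static string BuildUrl(string videoId, string apiKey) =>
        "https://www.googleapis.com/youtube/v3/commentThreads" +
        "?part=snippet&maxResults=20" +
        "&videoId=" + Uri.EscapeDataString(videoId) +
        "&key=" + Uri.EscapeDataString(apiKey);

    // Extracts the top-level comment text from a commentThreads.list JSON response.
    public static List<string> ParseComments(string json)
    {
        var comments = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("items", out JsonElement items))
        {
            return comments; // no "items" array, e.g. an error response
        }
        foreach (JsonElement item in items.EnumerateArray())
        {
            string text = item.GetProperty("snippet")
                              .GetProperty("topLevelComment")
                              .GetProperty("snippet")
                              .GetProperty("textDisplay")
                              .GetString();
            comments.Add(text);
        }
        return comments;
    }

    public static async Task Main()
    {
        // Assumption: the key is supplied via a YOUTUBE_API_KEY environment variable.
        string apiKey = Environment.GetEnvironmentVariable("YOUTUBE_API_KEY");
        if (string.IsNullOrEmpty(apiKey))
        {
            Console.WriteLine("Set YOUTUBE_API_KEY before running.");
            return;
        }

        using var http = new HttpClient();
        // Video ID taken from the URL in the question.
        string json = await http.GetStringAsync(BuildUrl("hE73JvEc2pQ", apiKey));
        foreach (string comment in ParseComments(json))
        {
            Console.WriteLine(comment);
        }
    }
}
```

The request and the parsing are kept in separate methods so the parsing can be checked against a saved response without touching the network. Error handling, quota limits, and paging through `nextPageToken` are left out of this sketch.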

Upvotes: 1
