Lloyd Walker

Reputation: 11

How to retrieve the first child HTML element but exclude all other elements with querySelector

I'm currently using document.querySelector with Puppeteer to retrieve the video links from a TikTok account's HTML, and I'm having trouble retrieving exactly what I need.

With this code:

const grabURLs = await page.evaluate(() => {
    const pgTag = document.querySelector('.tiktok-1qb12g8-DivThreeColumnContainer.eegew6e2 div div div div div')
    return pgTag.innerHTML;
})

console.log(grabURLs)

I receive not only the href that I need but also all of the child elements below it. How do I limit it so that the only innerHTML I receive is the first child's?

<div class="tiktok-x6y88p-DivItemContainerV2 e19c29qe7">
    <div data-e2e="user-post-item" class="tiktok-x6f6za-DivContainer-StyledDivContainerV2 e1gitlwo0">
        <div style="padding-top: 132.653%;" />
          <div class="tiktok-yz6ijl-DivWrapper e1cg0wnj1">
             <a href="https://www.tiktok.com/@nottooshabbycakes/video/7063163560238599430">
                <canvas width="75.38461538461539" height="100" class="tiktok-1yvkaiq-CanvasPlaceholder e19c29qe2"></canvas>
                <div class="tiktok-1wa52dp-DivPlayerContainer e19c29qe4">
                    <div mode="1" class="tiktok-1jxhpnd-DivContainer e1yey0rl0">
                        <img mode="1" src="https://p16-sign-va.tiktokcdn.com/obj/tos-maliva-p-0068/8ba10e1631d14b75b0bad5988c971113?x-expires=1663250400&amp;x-signature=%2F3yL04w%2FMNG9AF1TYWI51Sq41jU%3D" alt="🥺🥺 #corememory" loading="lazy" class="tiktok-1itcwxg-ImgPoster e1yey0rl1"></div>
                    <div class="tiktok-11u47i-DivCardFooter e148ts220">
                        <svg class="like-icon tiktok-h342g4-StyledPlay e148ts225" width="18" height="18" viewBox="0 0 48 48" fill="#fff" xmlns="http://www.w3.org/2000/svg">
                            <path fill-rule="evenodd" clip-rule="evenodd" d="M16 10.554V37.4459L38.1463 24L16 10.554ZM12 8.77702C12 6.43812 14.5577 4.99881 16.5569 6.21266L41.6301 21.4356C43.5542 22.6038 43.5542 25.3962 41.6301 26.5644L16.5569 41.7873C14.5577 43.0012 12 41.5619 12 39.223V8.77702Z"></path></svg>
                        <strong data-e2e="video-views" class="video-count tiktok-1p23b18-StrongVideoCount e148ts222">57</strong>

Here is the HTML. I am trying to extract just the href, but it's logging the <a> element along with all of the elements below it.

Any help would be greatly appreciated, thank you!

Upvotes: 1

Views: 667

Answers (2)

Dan Mullin

Reputation: 4435

You just need to do a quick search of the page for all of the URLs that point to videos.

Here's how to do it on TikTok:

var videos = document.querySelectorAll("a[href*='/video/']");

Edit:

To get all of the URLs into their own array afterwards, create a new array and push the href of each anchor tag onto it:

var num = videos.length;
var links = [];
for (var i = 0; i < num; i++) {
    links.push(videos[i].href);
}

var videos = document.querySelectorAll("a[href*='/video/']");
var links = [];
var num = videos.length;
for (var i = 0; i < num; i++) {
  links.push(videos[i].href);
}
console.log(links);

section {
    display: flex;
    flex-flow: row wrap;
}

section>div {
    margin: 5px;
}

section>div>a {
    width: 192px;
    color: #fff;
    height: 108px;
    display: block;
    background: #a33;
    padding: 10px 13px;
    text-decoration: none;
    transition: all 200ms ease;
}

section>div>a:hover {
    background: #369;
    transition: all 200ms ease;
}

<section>
  <div><a href="/video/aaa" title="/video/aaa">Video aaa</a></div>
  <div><a href="/video/bbb" title="/video/bbb">Video bbb</a></div>
  <div><a href="/video/ccc" title="/video/ccc">Video ccc</a></div>
  <div><a href="/video/ddd" title="/video/ddd">Video ddd</a></div>
  <div><a href="/video/eee" title="/video/eee">Video eee</a></div>
  <div><a href="/video/fff" title="/video/fff">Video fff</a></div>
  <div><a href="/video/ggg" title="/video/ggg">Video ggg</a></div>
  <div><a href="/video/hhh" title="/video/hhh">Video hhh</a></div>
  <div><a href="/video/iii" title="/video/iii">Video iii</a></div>
  <div><a href="/video/jjj" title="/video/jjj">Video jjj</a></div>
</section>
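
For the Puppeteer setup in the question, the same selector can probably be run through page.$$eval, which hands the matched elements to a callback inside the page. This is only a minimal sketch: it assumes `page` is an already-open Puppeteer Page, and `grabVideoLinks` and `dedupe` are hypothetical helper names, not part of Puppeteer or of the code above.

```javascript
// Selector from the answer above: any anchor whose href contains "/video/".
const VIDEO_LINK_SELECTOR = "a[href*='/video/']";

// Remove repeated URLs while keeping first-seen order.
function dedupe(hrefs) {
  return [...new Set(hrefs)];
}

// Hypothetical wrapper; `page` is assumed to be an open Puppeteer Page.
async function grabVideoLinks(page) {
  // The callback runs in the browser, so it must not reference
  // anything defined outside of itself (such as dedupe).
  const hrefs = await page.$$eval(VIDEO_LINK_SELECTOR, (anchors) =>
    anchors.map((a) => a.href)
  );
  return dedupe(hrefs);
}
```

Calling `const links = await grabVideoLinks(page);` should then give an array of plain URL strings rather than markup. Deduplicating is my own addition, in case the same video is linked more than once on the page.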

Upvotes: 1

Obscure021

Reputation: 341

You can use the firstChild property.

So, the code becomes:

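const grabURLs = await page.evaluate(() => {
    const pgTag = document.querySelector('.tiktok-1qb12g8-DivThreeColumnContainer.eegew6e2 div div div div div');
    // Note: firstChild can be a whitespace text node, which has no innerHTML;
    // firstElementChild skips text nodes if that turns out to be a problem.
    return pgTag.firstChild.innerHTML;
});

console.log(grabURLs);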
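
If the goal is only the href rather than any markup, one option is to query for the anchor inside the container and read its href directly. The following is a rough sketch, not a tested TikTok scraper: `firstLinkHref` is a hypothetical helper, and using the `data-e2e="user-post-item"` attribute from the question's HTML as the container selector is an assumption about the page.

```javascript
// Return the href of the first <a> inside the first element matching
// `containerSelector`, or null when nothing matches.
// `root` defaults to the page's `document`; a stand-in can be passed instead.
function firstLinkHref(containerSelector, root = document) {
  const container = root.querySelector(containerSelector);
  if (!container) return null;
  const link = container.querySelector('a[href]');
  return link ? link.href : null;
}

// Hypothetical usage with Puppeteer (assumes `page` is already open).
// The function is serialized and run in the page, where `document` exists:
// const href = await page.evaluate(firstLinkHref, '[data-e2e="user-post-item"]');
```

Because this returns the href string itself, none of the nested canvas, img, or svg elements end up in the logged output.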

Upvotes: 1
