I'm currently using document.querySelector with Puppeteer to retrieve the video links from a TikTok account's HTML, and I'm having trouble retrieving exactly what I need.
With this code:
const grabURLs = await page.evaluate(() => {
  const pgTag = document.querySelector('.tiktok-1qb12g8-DivThreeColumnContainer.eegew6e2 div div div div div');
  return pgTag.innerHTML;
});
console.log(grabURLs);
I receive not only the href I need but also all of the child elements below it. How do I limit it so that the only innerHTML I receive is that of the first child?
<div class="tiktok-x6y88p-DivItemContainerV2 e19c29qe7">
  <div data-e2e="user-post-item" class="tiktok-x6f6za-DivContainer-StyledDivContainerV2 e1gitlwo0">
    <div style="padding-top: 132.653%;"></div>
    <div class="tiktok-yz6ijl-DivWrapper e1cg0wnj1">
      <a href="https://www.tiktok.com/@nottooshabbycakes/video/7063163560238599430">
        <canvas width="75.38461538461539" height="100" class="tiktok-1yvkaiq-CanvasPlaceholder e19c29qe2"></canvas>
        <div class="tiktok-1wa52dp-DivPlayerContainer e19c29qe4">
          <div mode="1" class="tiktok-1jxhpnd-DivContainer e1yey0rl0">
            <img mode="1" src="https://p16-sign-va.tiktokcdn.com/obj/tos-maliva-p-0068/8ba10e1631d14b75b0bad5988c971113?x-expires=1663250400&x-signature=%2F3yL04w%2FMNG9AF1TYWI51Sq41jU%3D" alt="🥺🥺 #corememory" loading="lazy" class="tiktok-1itcwxg-ImgPoster e1yey0rl1">
          </div>
          <div class="tiktok-11u47i-DivCardFooter e148ts220">
            <svg class="like-icon tiktok-h342g4-StyledPlay e148ts225" width="18" height="18" viewBox="0 0 48 48" fill="#fff" xmlns="http://www.w3.org/2000/svg">
              <path fill-rule="evenodd" clip-rule="evenodd" d="M16 10.554V37.4459L38.1463 24L16 10.554ZM12 8.77702C12 6.43812 14.5577 4.99881 16.5569 6.21266L41.6301 21.4356C43.5542 22.6038 43.5542 25.3962 41.6301 26.5644L16.5569 41.7873C14.5577 43.0012 12 41.5619 12 39.223V8.77702Z"></path>
            </svg>
            <strong data-e2e="video-views" class="video-count tiktok-1p23b18-StrongVideoCount e148ts222">57</strong>
The HTML above is what I'm working with. I'm trying to extract just the href, but the log contains the whole element along with everything nested below it.
Any help would be greatly appreciated, thank you!
You just need to query the page for all of the links that point to videos.
Here's how to do it on TikTok:
var videos = document.querySelectorAll("a[href*='/video/']");
Edit:
To get all of the URLs into their own array afterwards, create a new array and push the href of each anchor tag into it:
var num = videos.length;
var links = [];
for (var i = 0; i < num; i++) {
  links.push(videos[i].href);
}
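Because querySelectorAll returns a NodeList, which is array-like, the same loop can also be written with Array.from and a mapping function. This is a minimal sketch; the helper name hrefsOf is hypothetical, and it works on any array-like collection of objects that have an href property:

```javascript
// Hypothetical helper: collect the href of every element in an array-like
// collection, such as the NodeList returned by document.querySelectorAll.
function hrefsOf(anchors) {
  // Array.from(arrayLike, mapFn) builds a real array and maps each item
  // in one step, replacing the manual index loop.
  return Array.from(anchors, (a) => a.href);
}

// In the browser console on the profile page:
// const links = hrefsOf(document.querySelectorAll("a[href*='/video/']"));
// console.log(links);
```

The result is the same array of absolute URLs that the loop produces; [...videos].map((a) => a.href) is an equivalent spelling.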
Runnable demo (the JavaScript below runs against the CSS and HTML that follow it):
var videos = document.querySelectorAll("a[href*='/video/']");
var links = [];
var num = videos.length;
for (var i = 0; i < num; i++) {
  links.push(videos[i].href);
}
console.log(links);
section {
  display: flex;
  flex-flow: row wrap;
}
section>div {
  margin: 5px;
}
section>div>a {
  width: 192px;
  color: #fff;
  height: 108px;
  display: block;
  background: #a33;
  padding: 10px 13px;
  text-decoration: none;
  transition: all 200ms ease;
}
section>div>a:hover {
  background: #369;
  transition: all 200ms ease;
}
<section>
  <div><a href="/video/aaa" title="/video/aaa">Video aaa</a></div>
  <div><a href="/video/bbb" title="/video/bbb">Video bbb</a></div>
  <div><a href="/video/ccc" title="/video/ccc">Video ccc</a></div>
  <div><a href="/video/ddd" title="/video/ddd">Video ddd</a></div>
  <div><a href="/video/eee" title="/video/eee">Video eee</a></div>
  <div><a href="/video/fff" title="/video/fff">Video fff</a></div>
  <div><a href="/video/ggg" title="/video/ggg">Video ggg</a></div>
  <div><a href="/video/hhh" title="/video/hhh">Video hhh</a></div>
  <div><a href="/video/iii" title="/video/iii">Video iii</a></div>
  <div><a href="/video/jjj" title="/video/jjj">Video jjj</a></div>
</section>
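Since the question runs inside Puppeteer, the same selector can be evaluated from Node with page.$$eval, which runs the callback in the page on an array of every matching element and sends back only the serializable result. This is a minimal sketch, assuming page is a Puppeteer Page that has already navigated to the profile and that the video anchors have rendered; getVideoLinks is a hypothetical name:

```javascript
// Hypothetical helper: return the href of every link whose URL contains "/video/".
// Assumes `page` is a Puppeteer Page already showing the TikTok profile.
async function getVideoLinks(page) {
  // $$eval runs the callback in the browser with an array of matching elements;
  // only the returned array of strings is transferred back to Node.
  return page.$$eval("a[href*='/video/']", (anchors) =>
    anchors.map((a) => a.href)
  );
}

// Usage, inside an async function after page.goto(...):
// const links = await getVideoLinks(page);
// console.log(links);
```

Because the video grid is rendered client-side, it may help to call await page.waitForSelector("a[href*='/video/']") before collecting the links.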
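You can use the firstChild property, or better, firstElementChild: firstChild can return a whitespace text node, which has no innerHTML, while firstElementChild always returns an element.
So, the code becomes:
const grabURLs = await page.evaluate(() => {
  const pgTag = document.querySelector('.tiktok-1qb12g8-DivThreeColumnContainer.eegew6e2 div div div div div');
  // firstElementChild skips text nodes, so we get the first item's element
  return pgTag.firstElementChild.innerHTML;
});
console.log(grabURLs);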
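Note that the innerHTML of the first child still contains all of that item's nested markup (the canvas, img, svg and so on from the question). If only the link is wanted, one option is to look up the anchor inside the container and return its href property. This is a minimal sketch; getFirstVideoHref is a hypothetical name, and containerSelector is whatever you pass in, for example the selector from the question, which depends on TikTok's generated class names and may stop matching when they change:

```javascript
// Hypothetical helper: return the href of the first video link inside a container.
// Assumes `page` is a Puppeteer Page; returns null if nothing matches.
async function getFirstVideoHref(page, containerSelector) {
  // page.evaluate serializes the extra argument and passes it to the callback.
  return page.evaluate((sel) => {
    const container = document.querySelector(sel);
    if (!container) return null; // container not rendered or selector is stale
    // Search the container's descendants instead of reading innerHTML.
    const anchor = container.querySelector("a[href*='/video/']");
    // The href property is the resolved, absolute URL.
    return anchor ? anchor.href : null;
  }, containerSelector);
}
```

Usage: const href = await getFirstVideoHref(page, '.tiktok-1qb12g8-DivThreeColumnContainer.eegew6e2 div div div div div');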