Reputation: 11
I'm currently trying to do some web scraping from this website: https://likee.video/hashtag/CuteHeadChallenge?lang=en
I managed to extract the video link using Selenium, BeautifulSoup and json. Here is the code:
import json
import time
from bs4 import BeautifulSoup

# wd is the Selenium WebDriver and element the video element found earlier (not shown)
# click the element
element.click()
time.sleep(3)
page = wd.page_source
souptest = BeautifulSoup(page, 'html.parser')
# read the video metadata (JSON-LD) of the clicked element
data = json.loads(souptest.find('script', {"id": 'videoObject', "type": 'application/ld+json'}).text, strict=False)
folderpath = '/content/drive/MyDrive/LikeeData/' + data["author"]["name"]
video = data["contentUrl"]
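If you want to check the extraction step without a browser or BeautifulSoup, here is a minimal sketch using only the standard library's `html.parser`. It swaps BeautifulSoup for `HTMLParser` (a substitution, not what the code above uses), and the names `JsonLdExtractor` and `video_object` are hypothetical. It collects the raw text of every `<script type="application/ld+json">` tag by `id` and parses the `videoObject` one with `strict=False`, as the question does.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of <script type="application/ld+json"> tags, keyed by id."""

    def __init__(self):
        super().__init__()
        self._current_id = None  # id of the JSON-LD <script> we are inside, if any
        self._buffer = []
        self.blocks = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._current_id = attrs.get("id") or ""
            self._buffer = []

    def handle_data(self, data):
        # script contents arrive unescaped, possibly in several chunks
        if self._current_id is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current_id is not None:
            self.blocks[self._current_id] = "".join(self._buffer)
            self._current_id = None


def video_object(page_source):
    """Parse the JSON-LD in <script id="videoObject"> (hypothetical helper)."""
    parser = JsonLdExtractor()
    parser.feed(page_source)
    parser.close()
    return json.loads(parser.blocks["videoObject"], strict=False)
```

Assuming the page markup matches, `video_object(wd.page_source)["contentUrl"]` should give the same value as `data["contentUrl"]` above.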
The problem is that when I try to download the URL, the downloaded file is 0 bytes. Here is what I tried:
import urllib.request
import requests
# attempt 1: urllib
urllib.request.urlretrieve(video, folderpath + '/vid.mp4')
# attempt 2: requests
req_file = requests.get(video)
with open(folderpath + '/vid.mp4', "wb") as file:
    file.write(req_file.content)
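Both attempts save whatever body the server sends. Because 204 is a 2xx success code, neither `urlretrieve` nor `requests.get` raises an error, so an empty file is written silently. Below is a minimal sketch of a download helper, using only `urllib.request`, that checks the status and refuses to write an empty body. The name `download_video` is hypothetical, and the helper does not make the server return the video; it only makes the failure visible instead of leaving a 0-byte file.

```python
import urllib.request


def download_video(url, path, timeout=30):
    """Save url to path, raising instead of writing an empty file (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        status = response.status
        body = response.read()
    # 204 is a 2xx code, so urlopen returns it normally with an empty body
    if status != 200 or not body:
        raise RuntimeError(f"HTTP {status} with {len(body)} bytes from {url}")
    with open(path, "wb") as f_out:
        f_out.write(body)
    return len(body)
```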
Then I noticed that the GET request returned status 204 (No Content), so I tried some solutions from another thread:
r = requests.get(data["contentUrl"], stream=True)
while r.status_code == 204:
    time.sleep(1)
    print('still 204')
    r = requests.get(data["contentUrl"])
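The `while` loop above retries forever if the server keeps answering 204. A bounded variant is sketched below under the assumption that `fetch` is a hypothetical zero-argument callable returning `(status, body)`; `fetch_until_content` is likewise a hypothetical name. It gives up after a fixed number of attempts and waits a little longer before each retry.

```python
import time


def fetch_until_content(fetch, max_attempts=5, delay=1.0):
    """Call fetch() until it returns a status other than 204, or give up."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 204:
            return status, body
        print(f"attempt {attempt}: still 204")
        if attempt < max_attempts:
            time.sleep(delay * attempt)  # linear backoff between retries
    raise TimeoutError(f"still 204 after {max_attempts} attempts")
```

With requests, `fetch` could be a two-line wrapper that calls `requests.get(video)` and returns `(r.status_code, r.content)`. Retrying only helps if the 204 is transient.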
But it did not help: the request always returns 204. When I open the link in my browser, it returns 200, and downloading the thumbnail URL works too. Only the video URL fails. Here is one such video URL: https://video.like.video/asia_live/2s2/2Dz9d6_4.mp4?crc=1506960817&type=5
Please help me figure out what is wrong here. Thank you for your assistance.
Upvotes: 1
Views: 1680
Reputation: 195573
You can try this example, which downloads all videos on the page (without Selenium):
import re
import json
import requests

url = "https://likee.video/hashtag/CuteHeadChallenge?lang=en"
api_url = "https://api.like-video.com/likee-activity-flow-micro/videoApi/getEventVideo"
payload = {
    "country": "US",
    "page": 1,
    "pageSize": 28,
    "topicId": "",
}

# the hashtag page embeds its state as `window.data = {...};`: take topicId from it
html_doc = requests.get(url).text
data = re.search(r"window\.data = ({.*});", html_doc)[1]
data = json.loads(data)
payload["topicId"] = data["topicId"]

# ask the API for the list of videos under this hashtag
data = requests.post(api_url, json=payload).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# print/save each videoUrl:
for i, video in enumerate(data["data"]["videoList"], 1):
    print("Downloading {} as {}.mp4".format(video["videoUrl"], i))
    # download video
    with open("{}.mp4".format(i), "wb") as f_out:
        f_out.write(requests.get(video["videoUrl"]).content)
Prints:
Downloading https://video.like.video/asia_live/2s2/2Dz9d6_4.mp4?crc=1506960817&type=5 as 1.mp4
Downloading https://video.like.video/asia_live/2s1/2gaWZB_4.mp4?crc=1571964795&type=5 as 2.mp4
Downloading https://video.like.video/asia_live/2s1/2RLMdC_4.mp4?crc=779823808&type=5 as 3.mp4
...
and saves the videos.
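The regex `({.*});` in this answer is greedy, so if other JavaScript follows `window.data = {...};` on the same line, the captured text includes it and `json.loads` fails. A sketch of a more tolerant alternative, assuming the value assigned to `window.data` is valid JSON (the answer's code assumes the same): `json.JSONDecoder.raw_decode` parses one complete JSON value and reports where it stopped, so trailing script is ignored. The name `extract_window_data` is hypothetical.

```python
import json


def extract_window_data(html_doc):
    """Parse the JSON object assigned to window.data in the page HTML."""
    marker = "window.data ="
    start = html_doc.index(marker) + len(marker)  # ValueError if the marker is missing
    # raw_decode does not skip leading whitespace, so do it here
    while html_doc[start].isspace():
        start += 1
    obj, _end = json.JSONDecoder().raw_decode(html_doc, start)
    return obj
```

In the answer's code it could replace the two lines that build `data` from `html_doc`, e.g. `payload["topicId"] = extract_window_data(html_doc)["topicId"]`.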
Upvotes: 2