PerplexedSlime

Reputation: 11

Downloading a video from a URL using Python results in a 0-byte file

I'm currently trying to do some web scraping from this website: https://likee.video/hashtag/CuteHeadChallenge?lang=en

I managed to extract the video link using Selenium, BeautifulSoup and json. Here is the code:

import json
import time
from bs4 import BeautifulSoup

# wd is the Selenium WebDriver; element is the video tile to open
element.click()
time.sleep(3)  # wait for the clicked video's page to render
page = wd.page_source
souptest = BeautifulSoup(page, 'html.parser')

# the clicked video's metadata is embedded as JSON-LD in a <script> tag
data = json.loads(souptest.find('script', {"id": 'videoObject', "type": 'application/ld+json'}).text, strict=False)

folderpath = '/content/drive/MyDrive/LikeeData/' + data["author"]["name"]
video = data["contentUrl"]
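If you want to drop the BeautifulSoup dependency for this one step, here is a minimal sketch of the same JSON-LD extraction using only the standard library's `html.parser`. It assumes the page embeds the metadata in a `<script id="videoObject" type="application/ld+json">` tag, exactly as the `find()` call above expects; the class `LdJsonExtractor`, the helper `extract_video_object` and the sample HTML are hypothetical, made up for illustration.

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collects the text inside <script id="videoObject" type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "script"
                and attrs.get("id") == "videoObject"
                and attrs.get("type") == "application/ld+json"):
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # script bodies arrive as raw text, possibly split into several pieces
        if self._inside:
            self.chunks.append(data)


def extract_video_object(html):
    """Return the parsed JSON-LD dict, or None if the tag is missing."""
    parser = LdJsonExtractor()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    # strict=False, as in the question, allows raw control characters inside strings
    return json.loads("".join(parser.chunks), strict=False)


# hypothetical page fragment, shaped like the one the question parses
sample = ('<script id="videoObject" type="application/ld+json">'
          '{"author": {"name": "someone"}, "contentUrl": "https://example.com/v.mp4"}'
          '</script>')
data = extract_video_object(sample)
print(data["author"]["name"], data["contentUrl"])
```

Note that this only helps once the HTML is in hand; the question still needs Selenium (or the API approach in the answer) to get a page that contains the tag.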

The problem is that when I try to download the URL, the downloaded file is 0 bytes. Here is what I tried:

import urllib.request
import requests

# attempt 1: urllib
urllib.request.urlretrieve(video, folderpath + '/vid.mp4')

# attempt 2: requests
req_file = requests.get(video)
with open(folderpath + '/vid.mp4', "wb") as file:
    file.write(req_file.content)
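One way to make this failure visible, instead of silently getting an empty `vid.mp4`, is to check the status code before opening the file and to stream the body in chunks. This is a sketch assuming `resp` comes from `requests.get(video, stream=True)`; the helper name `save_video` is hypothetical, and it only reports the 204, it does not make the server send the video.

```python
import os


def save_video(resp, path, chunk_size=64 * 1024):
    """Stream resp into path; refuse to leave an empty file behind.

    resp is assumed to behave like a requests.Response obtained with
    stream=True, i.e. to have .status_code and .iter_content().
    """
    if resp.status_code != 200:
        # 204 means "No Content": there is no body to write
        raise RuntimeError(f"got HTTP {resp.status_code}, expected 200")
    written = 0
    with open(path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    if written == 0:
        os.remove(path)  # do not leave a 0-byte file behind
        raise RuntimeError("200 response with an empty body")
    return written
```

Usage would be `save_video(requests.get(video, stream=True), folderpath + '/vid.mp4')`.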

Then I noticed that the GET request returned status 204 (No Content), so I tried a solution from another thread:

r = requests.get(data["contentUrl"], stream=True)
while r.status_code == 204:
    time.sleep(1)
    print('still 204')
    r = requests.get(data["contentUrl"])

But it did not work; it always returns 204. When I open the link in my browser it returns 200, and when I swap in the thumbnail URL the download works, so it fails only for the video URL. Here is one such video URL: https://video.like.video/asia_live/2s2/2Dz9d6_4.mp4?crc=1506960817&type=5
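The retry loop above has no exit if the server keeps answering 204, so it can spin forever. Below is a sketch of a bounded retry using only `urllib.request`. The browser-like `User-Agent` and `Referer` headers are a guess (a CDN treating non-browser clients differently is one common cause of a browser-works, script-fails mismatch) and were not confirmed for this site; note that the answer below downloads the same URL with a plain `requests.get`. `BROWSER_HEADERS`, `build_request` and `fetch_with_retry` are hypothetical names.

```python
import time
import urllib.request

# Assumption: these header values are illustrative guesses, not documented
# requirements of video.like.video.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://likee.video/",
}


def build_request(url):
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_with_retry(url, opener=urllib.request.urlopen, tries=3, delay=1.0):
    """Return the body once the server answers 200 with data; give up after `tries`."""
    status = None
    for attempt in range(tries):
        with opener(build_request(url)) as resp:
            status = resp.status  # 204 is a 2xx code, so urlopen does not raise
            body = resp.read()
        if status == 200 and body:
            return body
        if attempt < tries - 1:
            time.sleep(delay)
    raise RuntimeError(f"no content after {tries} tries (last status: {status})")
```

The `opener` parameter exists so the retry logic can be exercised without network access; in normal use you would call `fetch_with_retry(video)` and write the returned bytes to `vid.mp4`.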

Please help me find out what is wrong here. Thank you for your assistance.

Upvotes: 1

Views: 1680

Answers (1)

Andrej Kesely

Reputation: 195573

You can try this example, which downloads all the videos on the page without Selenium:

import re
import json
import requests

url = "https://likee.video/hashtag/CuteHeadChallenge?lang=en"
api_url = "https://api.like-video.com/likee-activity-flow-micro/videoApi/getEventVideo"

payload = {
    "country": "US",
    "page": 1,
    "pageSize": 28,
    "topicId": "",
}

# the hashtag page embeds `window.data = {...};`; topicId is read from it
html_doc = requests.get(url).text
data = re.search(r"window\.data = ({.*});", html_doc)[1]
data = json.loads(data)

payload["topicId"] = data["topicId"]

data = requests.post(api_url, json=payload).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# print/save each videoUrl:
for i, video in enumerate(data["data"]["videoList"], 1):
    print("Downloading {} as {}.mp4".format(video["videoUrl"], i))
    # download video
    with open("{}.mp4".format(i), "wb") as f_out:
        f_out.write(requests.get(video["videoUrl"]).content)

Prints:

Downloading https://video.like.video/asia_live/2s2/2Dz9d6_4.mp4?crc=1506960817&type=5 as 1.mp4
Downloading https://video.like.video/asia_live/2s1/2gaWZB_4.mp4?crc=1571964795&type=5 as 2.mp4
Downloading https://video.like.video/asia_live/2s1/2RLMdC_4.mp4?crc=779823808&type=5 as 3.mp4

...

and saves the videos.
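The answer's two steps (read `topicId` from the `window.data = {...};` assignment in the page, then POST it to the `getEventVideo` endpoint) can also be wrapped in functions that take the HTTP call as a parameter, which makes them testable offline and lets you request more than one page. This is a sketch: the loop over `page` assumes that higher page numbers return the next batch and that an empty `videoList` marks the end, which has not been verified against the API; `extract_topic_id`, `iter_video_urls` and `post_json` are hypothetical names.

```python
import json
import re

API_URL = "https://api.like-video.com/likee-activity-flow-micro/videoApi/getEventVideo"
# `.` does not cross newlines, so this expects the assignment on one line;
# the greedy {.*} runs to the last "};" on that line.
WINDOW_DATA_RE = re.compile(r"window\.data = ({.*});")


def extract_topic_id(html_doc):
    match = WINDOW_DATA_RE.search(html_doc)
    if match is None:
        raise ValueError("window.data not found; the page layout may have changed")
    return json.loads(match[1])["topicId"]


def iter_video_urls(post_json, topic_id, country="US", page_size=28, max_pages=10):
    """Yield videoUrl values page by page.

    post_json(url, payload) must return the decoded JSON reply, for example
    lambda u, p: requests.post(u, json=p).json().
    Assumption: page 2, 3, ... return further batches and an empty
    videoList means there is nothing more.
    """
    for page in range(1, max_pages + 1):
        payload = {"country": country, "page": page,
                   "pageSize": page_size, "topicId": topic_id}
        reply = post_json(API_URL, payload)
        videos = (reply.get("data") or {}).get("videoList") or []
        if not videos:
            break
        for video in videos:
            yield video["videoUrl"]
```

With `requests`, usage would be roughly `topic_id = extract_topic_id(requests.get(url).text)`, then iterating over `iter_video_urls(lambda u, p: requests.post(u, json=p).json(), topic_id)` and downloading each URL as in the loop above. `max_pages` caps the number of API calls in case the end-of-list assumption does not hold.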

Upvotes: 2
