Reputation: 1179
I want to scrape the amount of money pledged for a project from the following Kickstarter pages. I use the same method for both, but for one of them the code returns no value.
The output of this code is an empty list:
import urllib
import requests
from lxml import html
url = 'https://www.kickstarter.com/projects/scratchideas/loki-the-ultra-portable-modular-and-robust-camera?ref=category'
page = requests.get(url=url)
tree = html.fromstring(page.content)
pledged = tree.xpath('//*[@id="react-project-header"]/div/div/div[3]/div/div[2]/div[1]/div[2]/span[1]/span/text()')
print("pledged: {}".format(pledged))
But the following code returns the correct amount of money pledged to the project:
url = 'https://www.kickstarter.com/projects/254683764/avoseedo-grow-your-own-avocodo-tree-with-ease'
page = requests.get(url=url)
tree = html.fromstring(page.content)
pledged = tree.xpath('//*[@id="content-wrap"]/div[2]/section[1]/div/div/div/div[1]/div/div[2]/div[2]/div[1]/h3/span/text()')
print("pledged: {}".format(pledged))
So I am wondering what the difference is between the two pages, and why the same approach works for only one of them.
Upvotes: 0
Views: 49
Reputation: 5915
The AvoSeedo project has already been completed. It seems that when you download the page of a funded project, the amount of money pledged is written directly in the body of the document.
For an unfunded project, or while the campaign is still ongoing, the amount can't be found in the body markup. The page relies on JavaScript to render it dynamically, so the element your XPath targets does not exist in the HTML that requests receives.
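To see which case a page falls into, you can check whether the value appears in the ordinary markup or only inside a script element. Below is a minimal sketch using only the standard library. The two HTML strings are hypothetical, heavily simplified stand-ins (not real Kickstarter markup), and `TextLocator` and `locate` are names made up for this illustration.

```python
from html.parser import HTMLParser


class TextLocator(HTMLParser):
    """Collect text inside <script> tags separately from all other text."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.script_text = []
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        target = self.script_text if self._in_script else self.visible_text
        target.append(data)


def locate(html_text, needle):
    """Return 'markup', 'script' or 'absent' depending on where needle occurs."""
    parser = TextLocator()
    parser.feed(html_text)
    parser.close()
    if any(needle in chunk for chunk in parser.visible_text):
        return "markup"
    if any(needle in chunk for chunk in parser.script_text):
        return "script"
    return "absent"


# Hypothetical sample pages; the amounts and structure are invented.
FUNDED_PAGE = "<html><body><h3><span>$12,345</span></h3></body></html>"
LIVE_PAGE = (
    '<html><body><div id="react-project-header"></div>'
    '<script>window.data = {"pledged_amount":15177,"currency":"EUR"};</script>'
    "</body></html>"
)

print(locate(FUNDED_PAGE, "12,345"))
print(locate(LIVE_PAGE, "15177"))
```

With a real page you would pass `page.text` from requests and a value you can see in the browser. If the result is "script" or "absent", an XPath that targets rendered elements will come back empty.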
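To get the amount of money pledged for the LOKI project from the rendered page, a browser-automation tool such as Selenium would be required. Alternatively, there is a workaround: read the value from the script element that contains it. This XPath should fetch the amount of money pledged for the LOKI project: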
normalize-space(substring-before(substring-after(//script[contains(.,"pledged_amount")],'pledged_amount":'),',"'))
Output: 15177
EDIT: 15177 € or 20073 $ (depending on your locale)
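For reference, here is a rough sketch of how an XPath of this kind can be evaluated with lxml. It runs against a hypothetical sample string rather than the live page, since the real Kickstarter markup may differ and can change. The delimiters `pledged_amount":` and `,"` are assumptions about how the value is embedded, and `extract_pledged` is a helper name made up for this example.

```python
from lxml import html

# Hypothetical sample: a page whose pledged amount only exists inside a script.
SAMPLE = """<html><body>
<div id="react-project-header"></div>
<script>window.project = {"name":"LOKI","pledged_amount":15177,"currency":"EUR"};</script>
</body></html>"""

# XPath 1.0 has no escape for quotes inside a string literal, so literals
# that contain a double quote are wrapped in single quotes.
PLEDGED_XPATH = (
    "normalize-space(substring-before(substring-after("
    "//script[contains(., 'pledged_amount')], "
    "'pledged_amount\":'), ',\"'))"
)


def extract_pledged(page_content):
    """Return the text between 'pledged_amount":' and the next ',"'."""
    tree = html.fromstring(page_content)
    # The expression evaluates to a string, not a list of nodes.
    return tree.xpath(PLEDGED_XPATH)


print("pledged: {}".format(extract_pledged(SAMPLE)))
```

On the real page you would call `extract_pledged(page.content)` after `requests.get`. An empty string as the result means the delimiters did not match what the script actually contains.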
Upvotes: 1