Reputation: 491
My objective is to pull URLs from an XML document (linked) and put them in a list: https://www.valuespreadsheet.com/iedgar/results.php?stock=NFLX&output=xml
I imported etree from lxml and created a list comprehension that pulls the text from all <instanceUrl> tags.
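from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen
from lxml import etree

url = 'https://valuespreadsheet.com/iedgar/results.php?stock=NFLX&output=xml'
et = etree.fromstring(urlopen(url).read())
return [_.find('instanceUrl').text for _ in et.find('filings')]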
Now I want to restrict the list so that it only pulls the text from <instanceUrl> tags whose sibling <formType> is 10-K.
How can I achieve this?
Upvotes: 1
Views: 123
Reputation: 474171
You need an XPath expression and the xpath() method:
[url.text for url in et.xpath('//filing[formType = "10-K"]/instanceUrl')]
Here we select the filing nodes that have a formType child node whose text is 10-K, then get the instanceUrl child of each.
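To see the predicate filter in isolation, here is a minimal sketch that runs without network access. It swaps in the standard library's xml.etree.ElementTree, whose findall() accepts the same `[child='text']` predicate for this simple case, so it is not the lxml xpath() call itself. The sample document is an assumption: the `<results>` root name and the example.com URLs are hypothetical, and only the filings/filing/formType/instanceUrl names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the structure described in the question.
# The <results> root name and the URLs are made up for illustration.
SAMPLE_XML = """
<results>
  <filings>
    <filing>
      <formType>10-K</formType>
      <instanceUrl>https://example.com/a.xml</instanceUrl>
    </filing>
    <filing>
      <formType>10-Q</formType>
      <instanceUrl>https://example.com/b.xml</instanceUrl>
    </filing>
    <filing>
      <formType>10-K</formType>
      <instanceUrl>https://example.com/c.xml</instanceUrl>
    </filing>
  </filings>
</results>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Keep only <filing> elements whose <formType> child text is exactly "10-K",
# then step down to their <instanceUrl> child.
ten_k_urls = [
    node.text
    for node in root.findall(".//filing[formType='10-K']/instanceUrl")
]

print(ten_k_urls)
```

With lxml, the equivalent would be the xpath() call shown above; full XPath 1.0 features beyond this simple predicate are only available there, not in ElementTree.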
Note that the _ variable name is conventionally reserved for throw-away variables - variables that have to be defined but are never actually used (e.g. during unpacking). In your case you do use it, so a descriptive name such as filing would be clearer.
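As a small illustration of that convention, the sketch below uses _ only for values that are discarded. The file name and the (ticker, form type) pairs are hypothetical data chosen for the example.

```python
import os.path

# splitext returns (root, extension); only the extension is needed here,
# so the root is bound to the throw-away name _.
_, extension = os.path.splitext("report.xml")

# Hypothetical (ticker, form type) pairs; the ticker is ignored in the loop.
pairs = [("NFLX", "10-K"), ("NFLX", "10-Q")]
form_types = [form_type for _, form_type in pairs]

print(extension, form_types)
```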
Upvotes: 2