Reputation: 682
I want to get the infobox contents of https://en.wikipedia.org/wiki/Air_Alg%C3%A9rie
I followed this article.
import requests
from lxml import etree
url='https://en.wikipedia.org/wiki/Air_Alg%C3%A9rie'
req = requests.get(url)
store = etree.fromstring(req.text)
# this will give Motto portion of above
# URL's info box of Wikipedia's page
output = store.xpath('//table[@class="infobox vcard"]/tr[th/text()="Destinations"]/td/i')
# printing the text portion
print output[0].text
Even though req.text is not empty, the XPath query returns nothing. How can I get the infobox contents? In particular, these fields:
IATA ICAO
AH DAH
I need the IATA and ICAO codes. Please help.
Note that DBpedia is not synchronized with Wikipedia in real time; a DBpedia entry can lag the Wikipedia version by a few months, so I don't want to use DBpedia.
Reputation: 142691
To get AH, DAH and AIR ALGERIE you can use
xpath('//td[@class="nickname"]')
As for your XPath: in this HTML there is a <tbody> between <table> and <tr>, so you have to include it in the XPath:
'//table[@class="infobox vcard"]/tbody/tr[th/text()="Destinations"]/td'
Or use //, which works even if there are more tags between <table> and <tr>:
'//table[@class="infobox vcard"]//tr[th/text()="Destinations"]/td'
I also skipped the <i> at the end because the "Destinations" row doesn't use <i>.
import requests
from lxml import etree
url='https://en.wikipedia.org/wiki/Air_Alg%C3%A9rie'
req = requests.get(url)
store = etree.fromstring(req.text)
output = store.xpath('//td[@class="nickname"]')
for x in output:
    print(x.text.strip())
#output = store.xpath('//table[@class="infobox vcard"]//tr[th/text()="Destinations"]/td')
output = store.xpath('//table[@class="infobox vcard"]/tbody/tr[th/text()="Destinations"]/td')
print(output[0].text)
Result
AH
DAH
AIR ALGERIE
69
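The difference between the direct-child path and the two working paths can be reproduced offline on a small table. This is only a sketch: the markup is a hypothetical, simplified stand-in for the infobox (not the real page), and it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages. ElementTree supports only a subset of XPath, so the predicate is written as th='Destinations' rather than th/text()="Destinations".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the infobox markup (not the real page).
SAMPLE = """
<html><body>
<table class="infobox vcard">
  <tbody>
    <tr><th>Destinations</th><td>69</td></tr>
  </tbody>
</table>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Direct child step: finds nothing, because <tr> sits inside <tbody>.
direct = root.findall(".//table[@class='infobox vcard']/tr[th='Destinations']/td")

# Explicit <tbody> step.
via_tbody = root.findall(".//table[@class='infobox vcard']/tbody/tr[th='Destinations']/td")

# Descendant step (//): matches <tr> at any depth below the table.
via_descendant = root.findall(".//table[@class='infobox vcard']//tr[th='Destinations']/td")

print(len(direct))
print(via_tbody[0].text)
print(via_descendant[0].text)
```

The // form is the more forgiving choice if the nesting between <table> and <tr> might change.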
EDIT:
I use another XPath to get the names "IATA", "ICAO" and "Callsign", and then use zip() to group them with "AH", "DAH" and "AIR ALGERIE".
import requests
from lxml import etree
url='https://en.wikipedia.org/wiki/Air_Alg%C3%A9rie'
req = requests.get(url)
store = etree.fromstring(req.text)
keys = store.xpath('//table[@class="infobox vcard"]//table//tr[1]//a')
#for x in keys:
# print(x.text.strip())
values = store.xpath('//td[@class="nickname"]')
#for x in values:
# print(x.text.strip())
some_dict = dict()
for k, v in zip(keys, values):
    k = k.text.strip()
    v = v.text.strip()
    some_dict[k] = v
    print(k, '=', v)
print(some_dict)
Result:
IATA = AH
ICAO = DAH
Callsign = AIR ALGERIE
{'IATA': 'AH', 'ICAO': 'DAH', 'Callsign': 'AIR ALGERIE'}
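The pairing step can also be written more compactly with dict(zip(...)). A minimal sketch, assuming the text has already been extracted and stripped; the two lists below are plain-string stand-ins for the .text values of the elements found by the two XPath queries. The length check is an added precaution, since zip() silently stops at the shorter input.

```python
# Stand-in values: the stripped .text of the header links and of the "nickname" cells.
keys = ["IATA", "ICAO", "Callsign"]
values = ["AH", "DAH", "AIR ALGERIE"]

# zip() stops at the shorter input, so fail loudly if the counts differ.
if len(keys) != len(values):
    raise ValueError("header and value counts differ")

# Build the mapping in one step.
codes = dict(zip(keys, values))
print(codes["IATA"], codes["ICAO"])
```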