Reputation: 69
I am trying to extract specific data from a JSON response.
After passing an Authorization header and calling requests.get, I get my response. I think it is called a dictionary by Python coders and JSON by JavaScript coders. It contains far more information than I need, and I would like to extract only one or two fields, for example {"bio": "hello world"}. The response contains more than one "bio": I am scraping 100 accounts and would like to extract every "bio" with one piece of code.
So I tried this:
from bs4 import BeautifulSoup
import requests
headers = {"Authorization" : "xxxx"}
req = requests.get('website', headers = headers)
data = req.text
soup = BeautifulSoup(data,'html.parser')
titles = soup.find_all('span',{'class':'bio'})
for title in titles:
    print(title.text)
It didn't work. I tried several other ideas with no success. If possible, please write code that I can understand, since I am trying to learn from my mistakes.
Thanks
Upvotes: 0
Views: 594
Reputation: 1706
The Aphid library I created is perfect for this.
From the command prompt:
py -m pip install Aphid
Then it's just as easy as loading your JSON data and searching it with Aphid.
import json
import requests
import Aphid
resp = requests.get(yoururl)
data = json.loads(resp.text)
results = Aphid.findall(data, 'bio')
results is now a list of (key, value) tuples, one for every occurrence of the 'bio' key.
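For comparison, the same kind of search can be sketched with only the standard library. This is not Aphid's implementation, just a minimal recursive walk over the parsed data. The helper name find_all and the sample payload are made up for illustration, and the real response may be shaped differently.

```python
import json


def find_all(node, key):
    """Collect every value stored under `key` anywhere in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: the value itself may contain more matches.
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical payload standing in for resp.text from the real request.
sample_text = (
    '{"users": [{"name": "a", "bio": "hello world"},'
    ' {"name": "b", "profile": {"bio": "second bio"}}]}'
)
data = json.loads(sample_text)
bios = find_all(data, "bio")
print(bios)
```

With a live request, data = resp.json() would replace the json.loads call, assuming the server really returns JSON.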
Upvotes: 1
Reputation: 631
After you get your response, either:
you get plain JSON (in which case you load it into Python using the json library), or
you get an HTML page from which you can extract the JSON (using BeautifulSoup), which you then parse using the json library.
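The two cases above can be sketched roughly as follows. To stay dependency-free, this sketch swaps BeautifulSoup for the standard library's html.parser. It also assumes the JSON is embedded in a script tag with type="application/json", which may not match the actual page. The names ScriptJSONExtractor and load_payload are hypothetical.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Gather the text of every <script type="application/json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self.chunks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_script = False

    def handle_data(self, data):
        if self._in_json_script:
            self.chunks[-1] += data


def load_payload(text):
    """Return parsed JSON from a body that is either JSON or HTML wrapping JSON."""
    try:
        # Case 1: the body is already JSON.
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2: the body is HTML, so look for an embedded JSON script block.
        parser = ScriptJSONExtractor()
        parser.feed(text)
        parser.close()
        if not parser.chunks:
            raise ValueError("no embedded JSON found in the HTML")
        return json.loads(parser.chunks[0])


# Hypothetical bodies standing in for req.text.
plain = load_payload('{"bio": "hello world"}')
page = '<html><body><script type="application/json">{"bio": "hello"}</script></body></html>'
embedded = load_payload(page)
print(plain["bio"], embedded["bio"])
```

If the body is JSON, BeautifulSoup's find_all('span', ...) has nothing to match, which would explain an empty result from the code in the question.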
Upvotes: 0