Reputation: 421
I have been trying to scrape a website using Python. I want to extract data from a tag, but I can't figure out how. The tag contains what look like two lists followed by a dictionary.
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.daraz.pk/smartphones/nokia/?spm=a2a0e.searchlistcategory.card.2.323e5fc3B8mWd8&from'
'=onesearch_category_3')
# print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup)
if "priceCurrency":
    phone = soup.find_all(type="application/ld+json")
    print(phone[1])
This is my code and this gives me the tag. I just want to scrape specific elements from it.
Upvotes: 1
Views: 2376
Reputation: 1191
All the phone data for a given listing page is also available as JSON. Requesting it directly is probably faster than parsing the HTML with BeautifulSoup, though I have not measured it. The response contains much more than the name and price, but those are the two fields I put into a DataFrame for you to see. Open the URL in the code to look at everything that is available.
import requests
import pandas as pd

rows = []
for page_num in range(1, 5):  # Number of pages to search through (1 to 4)
    # With ajax=true the listing comes back as JSON instead of HTML
    json_page = requests.get('https://www.daraz.pk/smartphones/nokia/?ajax=true&page=' + str(page_num)).json()
    for phone in json_page['mods']['listItems']:
        rows.append({'Name': phone['name'], 'Price': phone['price']})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
list_phones = pd.DataFrame(rows, columns=['Name', 'Price'])
print(list_phones)
Name Price
0 150 - Dual Sim - Camera - Card Slot - White 5050.00
1 Nokia 105 (2019 )- Dual sim - 1.77 3000.00
2 Nokia 210 - 2.4" - 16MB RAM- Dual SIM -Camera ... 5424.00
3 150 - Dual Sim - Camera - Card Slot - White 4999.00
4 New Nokia 106 2018 Dual Sim High Quality Keypa... 3140.00
5 105 Nokia 2019 2999.00
6 Nokia 3310 Mobile Phone - 2.4" QVGA Display - ... 9999.00
7 Nokia 105 2019 1.7 Inch Display 2000 Contact ... 2990.00
8 3310 - Dual Sim - 2.4 Inch LCD - Grey 8200.00
9 130 - 2017 - Dual Sim - Camera - Memory Card 3999.00
10 Nokia 6.1 Plus 4Gb 64Gb Black original (advanc... 26600
11 N 1 1Gb-8Gb - 4.5 Inches - Dark Blue 9500.00
12 Nokia mobile 105 100% original 1899.00
13 150 - Dual Sim - Camera - Card Slot - Black 4999.00
14 nokia 130 2017 3895.00
15 Nokia 150 5099.00
16 Nokia 106 - 2018 - 1.8" - Dual Sim - Dark Grey 3098.00
17 105 - 2017 - Dual Sim - Black 3250.00
18 Nokia 210 5450.00
19 Nokia 105 - Dual sim - 1.77” - 2019 3149.00
20 105 - Dual sim - 1.77” - 2019 - Black 3049.00
21 105 - 2017 - Dual Sim - Blue 3250.00
22 Nokia 106 (2018) - 1.8" inch Display - 4MB Sto... 3150.00
23 Nokai 6.1 plus BLUE 4GB 64GB 26600
24 Nokia 3.2 BLACK 3GB 64GB (ADVANCE TELECOM) 25400.00
25 Nokia 2.2 BLACK 3GB 32GB 17300
26 6.1 2018 - 5.5" - 3Gb Ram - 32G Rom - 16Mp Cam... 24999.00
27 3.1 Plus - 6 inches Display - 3Gb Ram - 32Gb Rom 19900.00
28 Nokia 106 2018 - 1.8 inch - Dual Sim 3150.00
29 Nokia 1 Mobile Phone-Dual Sim-1Gb-8Gb-Blue 9500.00
.. ... ...
130 Nokia 6.1 Plus 4Gb 64Gb Black original (advanc... 26600
131 N 1 1Gb-8Gb - 4.5 Inches - Dark Blue 9500.00
132 Nokia mobile 105 100% original 1899.00
133 150 - Dual Sim - Camera - Card Slot - Black 4999.00
134 nokia 130 2017 3895.00
135 Nokia 150 5099.00
136 Nokia 106 - 2018 - 1.8" - Dual Sim - Dark Grey 3098.00
137 105 - 2017 - Dual Sim - Black 3250.00
138 Nokia 210 5450.00
139 Nokia 105 - Dual sim - 1.77” - 2019 3149.00
140 105 - Dual sim - 1.77” - 2019 - Black 3049.00
141 105 - 2017 - Dual Sim - Blue 3250.00
142 Nokia 106 (2018) - 1.8" inch Display - 4MB Sto... 3150.00
143 Nokai 6.1 plus BLUE 4GB 64GB 26600
144 Nokia 3.2 BLACK 3GB 64GB (ADVANCE TELECOM) 25400.00
145 Nokia 2.2 BLACK 3GB 32GB 17300
146 6.1 2018 - 5.5" - 3Gb Ram - 32G Rom - 16Mp Cam... 24999.00
147 3.1 Plus - 6 inches Display - 3Gb Ram - 32Gb Rom 19900.00
148 Nokia 106 2018 - 1.8 inch - Dual Sim 3150.00
149 Nokia 1 Mobile Phone-Dual Sim-1Gb-8Gb-Blue 9500.00
150 150 -2.4"- Dual Sim - Camera - Card Slot - black 5050.00
151 Nokia 3.1 Plus - 6’’ HD+ display-Camera Front ... 18999.00
152 Nokia 210 Mobile Phone - 2.4" - 16MB RAM 5449.00
153 Nokia 7 Plus 33999.00
154 nokia 106 2018 /Nokia 106, 2000 contacts phone... 3000.00
155 N 8110 Dual Sim - 2.45" Lcd - 2.5Gb Rom - 2Mp ... 10000.00
156 nokia 210 2.4 inch 16 mb ram internet black ... 5425.00
157 Nokia 1 plus Mobile Phone 12499.00
158 3310 - Dual Sim 8000.00
159 Nokia 7.1 4GB/64GB 31699.00
[160 rows x 2 columns]
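If the layout of the response ever changes, indexing straight into json_page['mods']['listItems'] raises a KeyError. Below is a minimal sketch of a more defensive extraction step. The sample payload is hypothetical (it only mimics the two keys used above, with names and prices copied from the printed output), extract_rows is a helper name I made up, and I am assuming prices arrive as strings, which the mix of 5099.00 and 17300 in the output suggests.

```python
import json

# Hypothetical payload shaped like the response used above;
# names and prices are copied from the printed output.
sample_response = json.loads("""
{"mods": {"listItems": [
    {"name": "Nokia 150", "price": "5099.00"},
    {"name": "Nokia 2.2 BLACK 3GB 32GB", "price": "17300"},
    {"name": "Listing without a price"}
]}}
""")


def extract_rows(payload):
    """Return (name, price) tuples, skipping items that lack either field."""
    rows = []
    # .get() with defaults avoids a KeyError if a level is missing
    for item in payload.get("mods", {}).get("listItems", []):
        name, price = item.get("name"), item.get("price")
        if name is None or price is None:
            continue
        # Assumption: prices are numeric strings, so convert them for sorting/filtering
        rows.append((name, float(price)))
    return rows


rows = extract_rows(sample_response)
for name, price in rows:
    print(f"{name}: {price:.2f}")
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=['Name', 'Price']) in place of the rows collected in the loop above.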
Upvotes: 1
Reputation: 84465
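With bs4 4.7.1+ you can use the :contains pseudo-class to target the required script tag (newer Soup Sieve releases rename it to :-soup-contains and deprecate the old spelling). Otherwise, use soup.find_all(type="application/ld+json")[1], or loop over each script tag and check if "priceCurrency" in script.text. Your current test, if "priceCurrency":, is always True because a non-empty string is truthy.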
When you extract the .text from the tag, you have JSON that you can parse with the json library. The result is a dictionary. Its 'itemListElement' key holds a list of items (dictionaries). You can loop over that list and access values in each inner dictionary by key. The 'offers' key, however, holds another dictionary, so you need to index into it by key again.
import requests, json
from bs4 import BeautifulSoup

page = requests.get('https://www.daraz.pk/smartphones/nokia/?spm=a2a0e.searchlistcategory.card.2.323e5fc3B8mWd8&from=onesearch_category_3')
soup = BeautifulSoup(page.text, 'html.parser')
phones = soup.select_one('[type="application/ld+json"]:contains(priceCurrency)')
data = json.loads(phones.text)

for offer in data['itemListElement']:
    print('item name : ' + offer['name'])
    print('item price : ' + offer['offers']['priceCurrency'] + str(offer['offers']['price']))  # etc
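The loop-and-check alternative mentioned above can be sketched without touching the network. This is only an illustration: script_texts stands in for the .text of each script tag, the JSON in it is made up to match the structure described (the "PKR" currency and the "@type" values are assumptions, not copied from the page), and find_offer_data is a hypothetical helper name.

```python
import json

# Hypothetical stand-ins for the .text of each script tag on the page
script_texts = [
    '{"@type": "BreadcrumbList", "itemListElement": []}',
    '{"@type": "ItemList", "itemListElement": ['
    '{"name": "Nokia 150", "offers": {"priceCurrency": "PKR", "price": "5099.00"}},'
    '{"name": "Nokia 210", "offers": {"priceCurrency": "PKR", "price": "5450.00"}}]}',
]


def find_offer_data(texts):
    """Parse and return the first JSON blob that mentions priceCurrency."""
    for text in texts:
        # Membership test against the text; a bare `if "priceCurrency":`
        # would pass for every tag because a non-empty string is truthy
        if "priceCurrency" in text:
            return json.loads(text)
    return None


data = find_offer_data(script_texts)
items = [
    (offer["name"], offer["offers"]["priceCurrency"], offer["offers"]["price"])
    for offer in data["itemListElement"]
]
for name, currency, price in items:
    print(f"{name}: {currency} {price}")
```

With the real page you would build the list with something like [s.text for s in soup.find_all(type="application/ld+json")] and check for None before indexing, since the tag may be missing.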
Upvotes: 1