Ibtsam Ahmad

Reputation: 421

Web scraping from a list and dictionary

I have been trying to scrape a website using Python. I want to extract data from a script tag and I can't figure out how. The tag's content appears to be two lists with a dictionary nested inside them.

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.daraz.pk/smartphones/nokia/?spm=a2a0e.searchlistcategory.card.2.323e5fc3B8mWd8&from'
                    '=onesearch_category_3')
# print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup)
if "priceCurrency":
    phone = soup.find_all(type="application/ld+json")

print(phone[1])

This is my code, and it gives me the whole tag. I just want to extract specific elements from it.

Upvotes: 1

Views: 2376

Answers (2)

LuckyZakary

Reputation: 1191

All the phone data for a given page is available as JSON if you request the listing URL with ajax=true. I think this is faster than parsing the HTML with BeautifulSoup, but I have not measured it. The response contains a lot more than the name and price; those are just the two fields I put into a dataframe for you to see. Open the URL from the code in a browser to look at all the available fields.

Code

import requests
import pandas as pd

rows = []
for page_num in range(1, 5):  # Pages to search through (1 to 4)
    json_page = requests.get('https://www.daraz.pk/smartphones/nokia/?ajax=true&page=' + str(page_num)).json()

    for phone in json_page['mods']['listItems']:
        rows.append({'Name': phone['name'], 'Price': phone['price']})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
list_phones = pd.DataFrame(rows, columns=['Name', 'Price'])

print(list_phones)

Output

                                                  Name     Price
0          150 - Dual Sim - Camera - Card Slot - White   5050.00
1                   Nokia 105 (2019 )- Dual sim - 1.77   3000.00
2    Nokia 210 - 2.4" - 16MB RAM- Dual SIM -Camera ...   5424.00
3          150 - Dual Sim - Camera - Card Slot - White   4999.00
4    New Nokia 106 2018 Dual Sim High Quality Keypa...   3140.00
5                                       105 Nokia 2019   2999.00
6    Nokia 3310 Mobile Phone - 2.4" QVGA Display - ...   9999.00
7    Nokia  105 2019 1.7 Inch Display 2000 Contact ...   2990.00
8                3310 - Dual Sim - 2.4 Inch LCD - Grey   8200.00
9         130 - 2017 - Dual Sim - Camera - Memory Card   3999.00
10   Nokia 6.1 Plus 4Gb 64Gb Black original (advanc...     26600
11                N 1 1Gb-8Gb - 4.5 Inches - Dark Blue   9500.00
12                      Nokia mobile 105 100% original   1899.00
13         150 - Dual Sim - Camera - Card Slot - Black   4999.00
14                                      nokia 130 2017   3895.00
15                                           Nokia 150   5099.00
16      Nokia 106 - 2018 - 1.8" - Dual Sim - Dark Grey   3098.00
17                       105 - 2017 - Dual Sim - Black   3250.00
18                                           Nokia 210   5450.00
19                 Nokia 105 - Dual sim - 1.77” - 2019   3149.00
20               105 - Dual sim - 1.77” - 2019 - Black   3049.00
21                        105 - 2017 - Dual Sim - Blue   3250.00
22   Nokia 106 (2018) - 1.8" inch Display - 4MB Sto...   3150.00
23                        Nokai 6.1 plus BLUE 4GB 64GB     26600
24          Nokia 3.2 BLACK 3GB 64GB (ADVANCE TELECOM)  25400.00
25                            Nokia 2.2 BLACK 3GB 32GB     17300
26   6.1 2018 - 5.5" - 3Gb Ram - 32G Rom - 16Mp Cam...  24999.00
27    3.1 Plus - 6 inches Display - 3Gb Ram - 32Gb Rom  19900.00
28                Nokia 106 2018 - 1.8 inch - Dual Sim   3150.00
29          Nokia 1 Mobile Phone-Dual Sim-1Gb-8Gb-Blue   9500.00
..                                                 ...       ...
130  Nokia 6.1 Plus 4Gb 64Gb Black original (advanc...     26600
131               N 1 1Gb-8Gb - 4.5 Inches - Dark Blue   9500.00
132                     Nokia mobile 105 100% original   1899.00
133        150 - Dual Sim - Camera - Card Slot - Black   4999.00
134                                     nokia 130 2017   3895.00
135                                          Nokia 150   5099.00
136     Nokia 106 - 2018 - 1.8" - Dual Sim - Dark Grey   3098.00
137                      105 - 2017 - Dual Sim - Black   3250.00
138                                          Nokia 210   5450.00
139                Nokia 105 - Dual sim - 1.77” - 2019   3149.00
140              105 - Dual sim - 1.77” - 2019 - Black   3049.00
141                       105 - 2017 - Dual Sim - Blue   3250.00
142  Nokia 106 (2018) - 1.8" inch Display - 4MB Sto...   3150.00
143                       Nokai 6.1 plus BLUE 4GB 64GB     26600
144         Nokia 3.2 BLACK 3GB 64GB (ADVANCE TELECOM)  25400.00
145                           Nokia 2.2 BLACK 3GB 32GB     17300
146  6.1 2018 - 5.5" - 3Gb Ram - 32G Rom - 16Mp Cam...  24999.00
147   3.1 Plus - 6 inches Display - 3Gb Ram - 32Gb Rom  19900.00
148               Nokia 106 2018 - 1.8 inch - Dual Sim   3150.00
149         Nokia 1 Mobile Phone-Dual Sim-1Gb-8Gb-Blue   9500.00
150   150 -2.4"- Dual Sim - Camera - Card Slot - black   5050.00
151  Nokia 3.1 Plus - 6’’ HD+ display-Camera Front ...  18999.00
152           Nokia 210 Mobile Phone - 2.4" - 16MB RAM   5449.00
153                                       Nokia 7 Plus  33999.00
154  nokia 106 2018 /Nokia 106, 2000 contacts phone...   3000.00
155  N 8110 Dual Sim - 2.45" Lcd - 2.5Gb Rom - 2Mp ...  10000.00
156  nokia 210 2.4 inch 16 mb ram internet black   ...   5425.00
157                          Nokia 1 plus Mobile Phone  12499.00
158                                    3310 - Dual Sim   8000.00
159                                 Nokia 7.1 4GB/64GB  31699.00

[160 rows x 2 columns]
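The Price column above mixes values such as 26600 and 5050.00, which suggests the prices arrive as strings. If you need numbers, one option is to convert each price while collecting the rows. This is only a sketch: to_rows is a hypothetical helper, the sample list is hand-written to mimic json_page['mods']['listItems'], and it assumes every price parses as a plain decimal.

```python
def to_rows(list_items):
    """Turn listItems-style entries into dicts with a numeric Price."""
    rows = []
    for item in list_items:
        # float() accepts both "26600" and "5050.00"; it raises ValueError otherwise.
        rows.append({"Name": item["name"], "Price": float(item["price"])})
    return rows


# Hypothetical sample shaped like json_page['mods']['listItems'] (assumed string prices).
sample = [
    {"name": "Nokia 150", "price": "5099.00"},
    {"name": "Nokia 2.2 BLACK 3GB 32GB", "price": "17300"},
]

rows = to_rows(sample)
print(rows)
```

The resulting list of dicts can be passed straight to pd.DataFrame, so the Price column ends up numeric and can be sorted or aggregated.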

Upvotes: 1

QHarr

Reputation: 84465

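With bs4 4.7.1 you can use :contains to target the required script tag (otherwise use soup.find_all(type="application/ld+json")[1]), or loop over each script tag and check if "priceCurrency" in script.text:. Your current test, if "priceCurrency":, checks a non-empty string literal, so it is always True. Note that newer Soup Sieve releases deprecate :contains in favour of :-soup-contains.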

When you extract the .text from the tag you have JSON that you can parse with the json library. The top-level result is a dictionary. Its 'itemListElement' key holds a list of items (dictionaries). You can loop over that list and access values in each inner dictionary by key. 'offers' is itself a dictionary, however, so you need to access its values by key as well.

import requests, json
from bs4 import BeautifulSoup

page = requests.get('https://www.daraz.pk/smartphones/nokia/?spm=a2a0e.searchlistcategory.card.2.323e5fc3B8mWd8&from=onesearch_category_3')
soup = BeautifulSoup(page.text, 'html.parser')
phones = soup.select_one('[type="application/ld+json"]:contains(priceCurrency)')
data = json.loads(phones.text)

for offer in data['itemListElement']:
    print('item name : ' + offer['name'])
    print('item price : ' + offer['offers']['priceCurrency'] + str(offer['offers']['price'])) #etc
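If it helps to see the looping alternative and the nesting without hitting the site, here is a small offline sketch. The strings in script_texts are hand-written stand-ins for the .text of each ld+json script tag (not real page data), and extract_offers is a hypothetical helper name.

```python
import json

# Hypothetical stand-ins for the .text of each ld+json script tag on the page.
script_texts = [
    '{"@type": "BreadcrumbList", "itemListElement": []}',
    '{"@type": "ItemList", "itemListElement": ['
    '{"name": "Nokia 150", "offers": {"priceCurrency": "PKR", "price": "5099.00"}},'
    '{"name": "Nokia 210", "offers": {"priceCurrency": "PKR", "price": "5450.00"}}]}',
]


def extract_offers(texts):
    """Return (name, currency, price) tuples from the first text mentioning priceCurrency."""
    for text in texts:
        # Membership test on the text; a bare `if "priceCurrency":` would always be True.
        if "priceCurrency" in text:
            data = json.loads(text)  # top level is a dictionary
            return [
                # each list entry is a dictionary, and 'offers' is a nested dictionary
                (item["name"], item["offers"]["priceCurrency"], item["offers"]["price"])
                for item in data["itemListElement"]
            ]
    return []


offers = extract_offers(script_texts)
for name, currency, price in offers:
    print(f"{name}: {currency} {price}")
```

With the real page you would build the list from the tags instead, for example [tag.text for tag in soup.find_all(type="application/ld+json")].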

Upvotes: 1
