Railey Shahril

Reputation: 109

Iterate data into a pandas DataFrame (Python)

I'm new to Python and I'm trying to collect data from a website. I'm stuck at the last step: I want to take the scraped data and put it into a pandas DataFrame before saving it to a database or CSV file.

I tried to append the data using a loop, but the loop does not seem to work. If I inspect "cols", the data is cleaned up correctly, but it never makes it into the table.

import requests, pandas, numpy, matplotlib.pyplot
from bs4 import BeautifulSoup
#### page info ###
page = requests.get("https://postcode.my/search/?keyword=&state=Kedah")
#### check page status (200 means the page loaded OK)
page.status_code
### parse the HTML
soup = BeautifulSoup(page.content, 'html.parser')
### Find rows 
rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")
## define columns
LOCATION = []
AREA = []
STATE = []
POSTCODE = []
TABLE = []
counter= 0 
for row in rows:
    cols = row.find_all("td")
    cols = [x.text.strip() for x in cols]
if cols!='':
    TABLE.append(cols)
    counter=counter+1
if counter == 4:
    LOCATION.append(TABLES[0])
    AREA.append(TABLE[1])
    STATE.append(TABLE[2])
    POSTCODE.append(TABLE[3])
    counter = (0)
    TABLE = []
PDTABLE = pandas.DataFrame({
    "LOCATION" : LOCATION,
    "AREA" : AREA,
    "STATE" : STATE,
    "POSTCODE" : POSTCODE
    })

PDTABLE

Thank you. Best regards, Railey Shahril

Upvotes: 0

Views: 77

Answers (1)

jezrael

Reputation: 862601

Use:

import requests, pandas, numpy
from bs4 import BeautifulSoup
#### page info ###
page = requests.get("https://postcode.my/search/?keyword=&state=Kedah")
#### check page status (200 means the page loaded OK)
print(page.status_code)
### parse the HTML
soup = BeautifulSoup(page.content, 'html.parser')
### Find rows 
rows = soup.find_all(class_="col-lg-12 col-md-12 col-sm-12 col-xs-12")

Collect each row's cleaned cell texts in one list with append:

L = []
for row in rows:
    cols = row.find_all("td")
    cols = [x.text.strip() for x in cols]
    L.append(cols)

Convert the list to a NumPy array and reshape it to 4 columns (-1 lets NumPy work out the number of rows):

cols = ['LOCATION','AREA','STATE','POSTCODE']
PDTABLE = pandas.DataFrame(numpy.array(L).reshape(-1, 4), columns=cols)
print(PDTABLE)
                                 LOCATION             AREA  STATE POSTCODE
0                         Akauntan Negeri       Alor Setar  Kedah    05594
1                            Alor Gelegah       Alor Setar  Kedah    05400
2                     Alor Ibus Tepi Laut      Kuala Kedah  Kedah    06600
3                            Alor Janggus       Alor Setar  Kedah    06250
4                              Alor Malai       Alor Setar  Kedah    05460
5                     Alor Melintang Anak       Alor Setar  Kedah    05150
6                   Alor Melintang Gunung       Alor Setar  Kedah    05150
7                              Alor Merah       Alor Setar  Kedah    05250
8                             Alor Nibong  Kota Kuala Muda  Kedah    08500
9                              Alor Selut       Alor Setar  Kedah    05400
10              Alor Setar - Beg berkunci       Alor Setar  Kedah    05990
11         Alor Setar - Peti surat 1 - 80       Alor Setar  Kedah    05700
12  Alor Setar - Peti surat 161 & ke atas       Alor Setar  Kedah    05720
13       Alor Setar - Peti surat 81 - 160       Alor Setar  Kedah    05710
14                     Amanah Raya Berhad       Alor Setar  Kedah    05508
15                        Ambangan Height    Sungai Petani  Kedah    08000
16                          Ampangan Pedu     Kuala Nerang  Kedah    06300
17                             Anak Bukit       Alor Setar  Kedah    06550
18                       Anjung Pedu Lake     Kuala Nerang  Kedah    06300
19                                   Ason            Jitra  Kedah    06000
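The question's final goal is to save the table to a CSV file or a database. Here is a minimal sketch using pandas' own to_csv and to_sql, with a two-row stand-in for PDTABLE built from the output above. The file names kedah_postcodes.csv and kedah_postcodes.db and the table name postcodes are hypothetical. Reading the CSV back with dtype=str keeps the leading zero in postcodes such as 05594, which read_csv would otherwise parse as the integer 5594.

```python
import sqlite3

import pandas

# Two-row stand-in for PDTABLE (values taken from the output above).
PDTABLE = pandas.DataFrame(
    [["Akauntan Negeri", "Alor Setar", "Kedah", "05594"],
     ["Alor Gelegah", "Alor Setar", "Kedah", "05400"]],
    columns=["LOCATION", "AREA", "STATE", "POSTCODE"],
)

# CSV: index=False leaves out the 0, 1, 2, ... row labels.
PDTABLE.to_csv("kedah_postcodes.csv", index=False)
# dtype=str keeps postcodes as text, so "05594" does not become 5594.
from_csv = pandas.read_csv("kedah_postcodes.csv", dtype=str)

# SQLite: to_sql accepts a plain sqlite3 connection.
conn = sqlite3.connect("kedah_postcodes.db")
try:
    # if_exists="replace" drops and recreates the table on every run.
    PDTABLE.to_sql("postcodes", conn, if_exists="replace", index=False)
    from_db = pandas.read_sql("SELECT * FROM postcodes", conn)
finally:
    conn.close()

print(from_csv)
print(from_db)
```

With the real scraped PDTABLE, the two-row stand-in is simply left out; the to_csv and to_sql calls stay the same.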

Upvotes: 1
