LWT0419

Reputation: 9

Need help web scraping and saving the results to Excel using csv

I need to scrape a URL and save the results to Excel, like the image I uploaded,

but I don't know what is wrong with my code.

I only get one row in my Excel file. Please help.

import requests
from bs4 import BeautifulSoup
import csv


for i in range(10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find('table')
    rows=table.find('tbody').find_all('tr')

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","w",newline='') as out:
    theater = csv.writer(out)

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","a",newline='') as out:
    theater = csv.writer(out)
    theater.writerow(result)

Upvotes: 0

Views: 196

Answers (3)

Shijith

Reputation: 4882

Save the results to another list and write that list to the csv file.

import requests
from bs4 import BeautifulSoup
import csv

url='http://www.kobis.or.kr/kobis/business/mast/thea/findTheaterInfoList.do'
headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']

data=[]
for i in range(1,10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find("table", class_="tbl_comm")
    rows=table.find('tbody').find_all('tr')
    for row in rows:
        result=[]
        for cell in row.find_all(['td', 'th']):
            result.append(cell.get_text())
        if result:
            data.append(result)

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(headers)
    writer.writerows(data)

Upvotes: 1

abhi1610

Reputation: 743

You can also use pandas for this purpose. For your current single-row result:

import pandas as pd

df = pd.DataFrame([result], columns=['City','District','Code','Name','NumScreen','NumSeats',
                                     'Permanent', 'Registered', 'License','OpenDate','Run'])
df.to_csv('filename.csv', sep=',', index=False)

For CSV

You can simply use result, as it holds only a single row of data. If you use listofresult instead, multiple entries can be handled.

listofresult = []
for i in range(10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find('table')
    rows=table.find('tbody').find_all('tr')

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())
        listofresult.append(result)  # collect every row, not just the last one

with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write the headers
    headers = ['City','District','Code','Name','NumScreen','NumSeats',
               'Permanent', 'Registered', 'License','OpenDate','Run']
    writer.writerow(headers)
    # writer.writerows([result])   # for your current code (single row)
    writer.writerows(listofresult)  # for multiple rows

Upvotes: 0

Patrick Artner

Reputation: 51683

Your code only stores the last theater - it is a logical error. You need to collect each theater's result row into a list for all theaters and write that list to the file:

# ... your code snipped for brevity ...

theaters = []  # collect all theaters here

for i in range(10):
    payload={'pageIndex':i}

    # ... snip ...

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())

        theaters.append(result)

    # ... snip ...

headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']

# no need for 2 contexts unless you have an existing file you want to delete
# every time you run your script
with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","w",newline='') as out:
    theater = csv.writer(out)
    theater.writerow(headers)
    theater.writerows(theaters)  # writerowS here

If you want to append when the file already exists and create it otherwise, look into Check a file exists or not without try-catch block, and consider setting the opening mode via a variable: 'w' or 'a' depending on whether the file exists. If it is 'w', write the header first; otherwise only write the data.
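A minimal sketch of that pattern (the helper name append_rows and the file path are just for illustration):

```python
import csv
import os

def append_rows(path, headers, rows):
    """Append rows to a CSV file; create it with a header row
    if it does not exist yet."""
    exists = os.path.exists(path)
    mode = 'a' if exists else 'w'
    with open(path, mode, newline='') as out:
        writer = csv.writer(out)
        if not exists:          # only the first run writes the header
            writer.writerow(headers)
        writer.writerows(rows)
```

Running it twice appends the second batch of rows under the same single header.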


Addendum - you are not writing to Excel, you are writing a CSV file that can be opened by Excel. To write Excel files directly, use an appropriate module, e.g. openpyxl: https://openpyxl.readthedocs.io/en/stable/
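For example, a short openpyxl sketch (the sample headers and rows here are made up; plug in the theaters list collected above):

```python
from openpyxl import Workbook

headers = ['City', 'District', 'Code', 'Name']
theaters = [['Seoul', 'Gangnam', '001', 'Megabox']]  # sample row for illustration

wb = Workbook()
ws = wb.active            # the default worksheet
ws.append(headers)        # header row
for row in theaters:
    ws.append(row)        # one theater per row
wb.save('TheaterInfo.xlsx')
```

This produces a real .xlsx file, so there are no CSV encoding or delimiter issues when opening it in Excel.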

HTH

Upvotes: 0
