Reputation: 9
I need to scrape a URL and save the results to Excel, like the image I uploaded,
but I don't know what is wrong with my code.
I get only one row in my output file. Help me, please.
import requests
from bs4 import BeautifulSoup
import csv

url = 'http://www.kobis.or.kr/kobis/business/mast/thea/findTheaterInfoList.do'

for i in range(10):
    payload = {'pageIndex': i}
    r = requests.post(url, params=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.find('table')
    rows = table.find('tbody').find_all('tr')
    for j in range(len(rows)):
        col = rows[j].find_all('td')
        result = []
        for item in col:
            result.append(item.get_text())
        with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", "w", newline='') as out:
            theater = csv.writer(out)
        with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", "a", newline='') as out:
            theater = csv.writer(out)
            theater.writerow(result)
Upvotes: 0
Views: 196
Reputation: 4882
Save the results to another list and write that list to the CSV file:
import requests
from bs4 import BeautifulSoup
import csv

url = 'http://www.kobis.or.kr/kobis/business/mast/thea/findTheaterInfoList.do'
headers = ['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
           'Permanent', 'Registered', 'License', 'OpenDate', 'Run']

data = []
for i in range(1, 10):
    payload = {'pageIndex': i}
    r = requests.post(url, params=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.find("table", class_="tbl_comm")
    rows = table.find('tbody').find_all('tr')
    for row in rows:
        result = []
        for cell in row.find_all(['td', 'th']):
            result.append(cell.get_text())
        if result:
            data.append(result)

# newline='' avoids blank lines between rows on Windows
with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(headers)
    writer.writerows(data)
Upvotes: 1
Reputation: 743
You can also use pandas for this purpose. For a single result row you just have to do:
import pandas as pd

df = pd.DataFrame([result], columns=['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
                                     'Permanent', 'Registered', 'License', 'OpenDate', 'Run'])
df.to_csv('filename.csv', index=False)  # comma is the default separator; index=False drops the row index
For CSV you can simply use result, as it is only a single row of data. If you build a listofresult, multiple entries can be handled:
import requests
from bs4 import BeautifulSoup
import csv

url = 'http://www.kobis.or.kr/kobis/business/mast/thea/findTheaterInfoList.do'

listofresult = []
for i in range(10):
    payload = {'pageIndex': i}
    r = requests.post(url, params=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.find('table')
    rows = table.find('tbody').find_all('tr')
    for j in range(len(rows)):
        col = rows[j].find_all('td')
        result = []
        for item in col:
            result.append(item.get_text())
        listofresult.append(result)

with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write the headers
    headers = ['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
               'Permanent', 'Registered', 'License', 'OpenDate', 'Run']
    writer.writerow(headers)
    # writer.writerows([result])    # per your current code: writes only a single row
    writer.writerows(listofresult)  # for multiple rows
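If you prefer pandas for the multi-row case as well, a minimal sketch, assuming the listofresult and column names built above:

import pandas as pd

headers = ['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
           'Permanent', 'Registered', 'License', 'OpenDate', 'Run']
df = pd.DataFrame(listofresult, columns=headers)  # one DataFrame row per scraped <tr>
df.to_csv('filename.csv', index=False)            # drop the numeric index column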
Upvotes: 0
Reputation: 51683
Your code only stores the last theater: it is a logic error. You need to collect each theater's result row in a list of all theaters and write that to the file:
# ... your code snipped for brevity ...
theaters = []  # collect all theaters here
for i in range(10):
    payload = {'pageIndex': i}
    # ... snip ...
    for j in range(len(rows)):
        col = rows[j].find_all('td')
        result = []
        for item in col:
            result.append(item.get_text())
        theaters.append(result)
# ... snip ...

headers = ['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
           'Permanent', 'Registered', 'License', 'OpenDate', 'Run']

# No need for two context managers unless you have an existing file you want
# to wipe every time you run your script.
with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", "w", newline='') as out:
    theater = csv.writer(out)
    theater.writerow(headers)
    theater.writerows(theaters)  # writerowS here
If you want to append when the file already exists and create it otherwise, look into Check a file exists or not without try-catch block, and consider setting the opening mode via a variable to 'w' or 'a' depending on whether the file exists. If the mode is 'w', write the header; else only write the data.
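A minimal sketch of that pattern, assuming the headers and theaters lists built above:

import csv
import os.path

path = r"C:\Users\lwt04\Desktop\TheaterInfo.csv"
is_new = not os.path.isfile(path)  # True if the file does not exist yet
mode = 'w' if is_new else 'a'      # create a new file, or append to the existing one

with open(path, mode, newline='') as out:
    theater = csv.writer(out)
    if is_new:
        theater.writerow(headers)  # write the header only once, when creating the file
    theater.writerows(theaters)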
Addendum - you are not writing to Excel, you are writing a CSV file that can be opened by Excel. To write Excel files directly, use an appropriate module, e.g. openpyxl: https://openpyxl.readthedocs.io/en/stable/
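For example, a minimal openpyxl sketch, assuming the same headers and theaters lists as above:

import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append(headers)       # header row
for row in theaters:     # one worksheet row per theater
    ws.append(row)
wb.save(r"C:\Users\lwt04\Desktop\TheaterInfo.xlsx")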
HTH
Upvotes: 0