Reputation: 105
I am an absolute beginner in Python programming and in web scraping. I was trying to scrape a website for practice.
I have used the BeautifulSoup and Requests modules.
The code is given below:
import requests
import xlwt
from bs4 import BeautifulSoup
from csv import writer
response=requests.get("https://www.wikipedia.org/")
wb=xlwt.Workbook()
ws=wb.add_sheet("Test")
soup=BeautifulSoup(response.content,"html.parser")
links=soup.find_all("strong")
for link in links:
    lang=link.get_text()
    for i in len(lang):
        ws.write(i,i,lang)
wb.save("Wiki.xls")
I have scraped the headings from the web page, but when writing them to an Excel file the following error is displayed:
File "C:/Users/laptop/PycharmProjects/myproject/srapingex1.py", line 16, in <module>
for i in len(str(lang)):
TypeError: 'int' object is not iterable
The main problem is that ws.write(row, column, data) requires the row index, the column index and the data.
As I do not know the size of the list in advance, how can the row and column indices be passed?
Please tell me if my code is incorrect, and kindly suggest a way to write the extracted items to an .xls file.
Upvotes: 1
Views: 76
Reputation: 84465
I would consider using pandas and writing to CSV. You can preserve the language formatting nicely as well:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
res = requests.get('https://www.wikipedia.org/')
soup = bs(res.content, 'lxml')
items = [item.text for item in soup.select('strong')][1:-1]
df = pd.DataFrame(items, columns = ['Languages'])
df.to_csv(r'C:\Users\User\Desktop\Wiki.csv', sep=',', encoding='utf-8-sig',index = False )
You could also write to an Excel file with df.to_excel:
df.to_excel(r"C:\Users\User\Desktop\Wiki.xls", sheet_name='MyData', index = False, header=False)
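As for the original TypeError: len() returns a single int, and a for loop cannot iterate over an int, so `for i in len(lang)` fails. If you want to stay with ws.write(row, column, data), one option is to let enumerate hand you the row index as you loop, so the size of the list never needs to be known in advance. The following is only a minimal sketch under assumptions: the helper name rows_for_sheet and the sample langs list are made up for illustration (they stand in for the scraped strong texts), and the print call marks where ws.write would go.

```python
def rows_for_sheet(values, column=0):
    """Return (row, column, value) triples, one row per value.

    Hypothetical helper: enumerate() supplies the row index, so the
    length of the list does not have to be known beforehand.
    """
    return [(row, column, value) for row, value in enumerate(values)]


# Assumed sample data standing in for [s.get_text() for s in soup.find_all("strong")]
langs = ["English", "Español", "Deutsch"]
cells = rows_for_sheet(langs)

for row, col, value in cells:
    # With xlwt, this is where ws.write(row, col, value) would be called.
    print(row, col, value)
```

Each value lands in its own row of column 0. In the original loop, ws.write(i, i, lang) used the same index for row and column, which would have placed the values along a diagonal.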
Upvotes: 1