ag2019

Reputation: 105

How to write values scraped from a website to a .xls file

I am an absolute beginner in Python programming, and in web scraping too. I was trying to scrape a website for practice.

I have used the BeautifulSoup and Requests modules.

The code is given below:

import requests
import xlwt
from bs4 import BeautifulSoup
from csv import writer

response=requests.get("https://www.wikipedia.org/")
wb=xlwt.Workbook()
ws=wb.add_sheet("Test")
soup=BeautifulSoup(response.content,"html.parser")
links=soup.find_all("strong")
for link in links:
    lang=link.get_text()
    for i in len(lang):
        ws.write(i,i,lang)
        wb.save("Wiki.xls")

I have scraped the headings from the web page, but when writing them to an Excel file the following error is displayed:

File "C:/Users/laptop/PycharmProjects/myproject/srapingex1.py", line 16, in <module>
    for i in len(str(lang)):
TypeError: 'int' object is not iterable

The main problem is that ws.write(row, column, data) requires the row index, the column index, and the data.

Since I do not know the size of the list in advance, how can the row and column indices be passed?

Please tell me if my code is incorrect, and kindly suggest a way to write the extracted items to a .xls file.

Upvotes: 1

Views: 76

Answers (1)

QHarr

Reputation: 84465

I would consider using pandas and writing to CSV. With the utf-8-sig encoding, the language names in non-Latin scripts are preserved nicely as well:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

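# Fetch the page and parse it (the 'lxml' parser requires the lxml package)
res = requests.get('https://www.wikipedia.org/')
soup = bs(res.content, 'lxml')
# Collect the text of every <strong> tag, skipping the first and last matches
items = [item.text for item in soup.select('strong')][1:-1]
df = pd.DataFrame(items, columns=['Languages'])
# utf-8-sig adds a BOM so that Excel detects the encoding when opening the CSV
df.to_csv(r'C:\Users\User\Desktop\Wiki.csv', sep=',', encoding='utf-8-sig', index=False)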

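You could also write to an Excel file with df.to_excel. Note that pandas 2.0 and later dropped the xlwt writer, so on those versions you would need an .xlsx path (with openpyxl installed) instead of .xls:

df.to_excel(r"C:\Users\User\Desktop\Wiki.xls", sheet_name='MyData', index=False, header=False)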
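
If you would rather stay with xlwt: the TypeError in the question comes from `for i in len(lang)`, because len() returns an int and an int cannot be iterated. One way to get row indices without knowing the list size in advance is enumerate(). The following is a minimal sketch, not run against the live page; write_column and scrape_to_xls are hypothetical helper names introduced here, and the sketch assumes you want every <strong> text in column 0, one per row.

```python
def write_column(ws, values, column=0):
    """Write each value to its own row of one column; return the row count.

    `ws` is assumed to be an xlwt worksheet, or any object that exposes
    write(row, column, value).
    """
    rows = 0
    for row, value in enumerate(values):
        # enumerate supplies the row index, so the list size need not be known
        ws.write(row, column, value)
        rows = row + 1
    return rows


def scrape_to_xls(url="https://www.wikipedia.org/", path="Wiki.xls"):
    # Third-party imports are kept local so write_column works without them.
    import requests
    import xlwt
    from bs4 import BeautifulSoup

    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    languages = [tag.get_text() for tag in soup.find_all("strong")]

    wb = xlwt.Workbook()
    ws = wb.add_sheet("Test")
    write_column(ws, languages)
    wb.save(path)  # save once, after all rows have been written
```

Calling scrape_to_xls() would then fetch the page and save Wiki.xls a single time after the loop, rather than saving on every iteration as in the question's code.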

Upvotes: 1
