Reputation: 11451
I have the following code. What I am trying to do is screen-scrape a website and then write the data to an Excel worksheet. I can't read the existing data from the Excel file.
import xlwt
import xlrd
from xlutils.copy import copy
from datetime import datetime
import urllib.request
from bs4 import BeautifulSoup
import re
import time
import os

links = open('links.txt', encoding='utf-8')

#excel workbook
if os.path.isfile('./TestSheet.xls'):
    rbook = xlrd.open_workbook('TestSheet.xls', formatting_info=True)
    book = copy(rbook)
else:
    book = xlwt.Workbook()
try:
    book.add_sheet("wayanad")
except:
    print("sheet exists")
sheet = book.get_sheet(1)

for line in links:
    print("Currently Scanning\n", "\n=================\n", line.rstrip())
    url = str(line.rstrip())
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib.request.urlopen(req)
    soup = BeautifulSoup(html, "html.parser")
    #print(soup.prettify())
    title = soup.find('h1').get_text()
    data = []
    for i in soup.find_all('p'):
        data.append(i.get_text())
    quick_descr = data[1].strip()
    category = data[2].strip()
    tags = data[3].strip()
    owner = data[4].strip()
    website = data[6].strip()
    full_description = data[7]
    address = re.sub('\s+', ' ', soup.find('h3').get_text()).strip()
    city = soup.find(attrs={"itemprop": "addressRegion"}).get_text().strip()
    postcode = soup.find(attrs={"itemprop": "postalCode"}).get_text().strip()
    phone = []
    result = soup.findAll('h4')
    for h in result:
        if h.has_attr('itemprop'):
            phone.append(re.sub("\D", "", h.get_text()))
    #writing data to excel
    row = sheet.last_used_row
    column_count = sheet.ncols()
    book.save("Testsheet.xls")
    time.sleep(2)
The code explained
Screenshot of Excel sheet structure
Currently the list is empty, but I want to continue from the last row. I couldn't read data from a cell. The documentation says that sheet.ncols is available to count the columns, but it throws an error:
>>> column_count = sheet.ncols()
AttributeError: 'Worksheet' object has no attribute 'ncols'
What I want is a way to count rows and columns and to read the data from a cell. Many tutorials are old, and I am now using Python 3.4. I've already gone through these links and many others, but with no luck.
Upvotes: 0
Views: 1573
Reputation: 31
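Is this what you are looking for? Going through all the columns?
import xlrd
xl_workbook = xlrd.open_workbook('TestSheet.xls')  # file name taken from the question
xl_sheet = xl_workbook.sheet_by_index(0)  # assumption: first sheet; adjust the index as needed
num_cols = xl_sheet.ncols  # ncols and nrows are attributes, not methods
for row_idx in range(0, xl_sheet.nrows):
    print(row_idx, xl_sheet.row_values(row_idx))  # all cell values in this row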
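To connect this back to the question: the AttributeError appears because xlutils.copy returns an xlwt workbook, and xlwt Worksheet objects are meant for writing, so they have no ncols. One possible approach is to read nrows from the xlrd sheet before copying and then write below that row. The following is only a rough sketch under assumptions: the helper names append_rows and append_to_workbook are made up for illustration, the sheet index 0 is a guess, and the file is assumed to already exist as an .xls workbook.

```python
def append_rows(write_sheet, first_free_row, rows):
    """Write each list in `rows` below the existing data.

    `write_sheet` is any object with a write(row, col, value) method,
    such as an xlwt Worksheet. Returns the next free row index.
    """
    for offset, values in enumerate(rows):
        for col, value in enumerate(values):
            write_sheet.write(first_free_row + offset, col, value)
    return first_free_row + len(rows)


def append_to_workbook(path, rows, sheet_index=0):
    """Hypothetical helper: append `rows` to an existing .xls file."""
    # Third-party imports, the same libraries the question already uses.
    import xlrd
    from xlutils.copy import copy

    rbook = xlrd.open_workbook(path, formatting_info=True)
    read_sheet = rbook.sheet_by_index(sheet_index)  # readable xlrd sheet
    used_rows = read_sheet.nrows                    # attribute, no parentheses

    book = copy(rbook)                              # writable xlwt copy
    write_sheet = book.get_sheet(sheet_index)
    append_rows(write_sheet, used_rows, rows)
    book.save(path)
```

Inside the scraping loop this could be called as append_to_workbook('TestSheet.xls', [[title, category, tags]]), although opening and saving the file once per link is slow; collecting the rows in a list and writing them in one call at the end would probably be simpler.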
Upvotes: 1