user11322408
user11322408

Reputation:

exceeding number of urls when writing to excel file

I got this warning WARNING: C:\Users\diodi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\xlsxwriter\worksheet.py:923: UserWarning: Ignoring URL 'https://www.google.com/search?q=hello+world' since it exceeds Excel's limit of 65,530 URLS per worksheet. "65,530 URLS per worksheet." % force_unicode(url))

I use to write th output of scraped data

#spider.found_items is a list of dictionary
df = pd.DataFrame(spider.found_items)[ordered_list] #ordered_list is the order of dictiornary outputed
df.to_excel("{}.xlsx".format(file_name),sheet_name='All Products')

I checked this Number of URLS is over Excel's limit of 65,530 URLS per worksheet. but this formats the links as strings (not clickable). is there a way to keep URL (as a link) if I can write to multiple sheets or any other suggestion?

Upvotes: 4

Views: 2927

Answers (2)

Seeking Guidance
Seeking Guidance

Reputation: 143

If possible, use the EXCEL HYPERLINK function instead. You should not encounter this limitation with the HYPERLINK function.

Upvotes: 1

jmcnamara
jmcnamara

Reputation: 41644

Fundamentally this is an Excel limitation. It only allows 65,530 unique* urls per worksheet. There isn't any workaround for that.

(*) Excel does allow more than 65,530 non-unique urls if they are grouped contiguously. For example the same url in cells A1 to A100 would only be counted as one url against the 65,530 limit. However, grouping of urls to take advantage of this isn't supported by XlsxWriter (and even if it was it doesn't help if all the urls are unique or non-contiguous).

Upvotes: 2

Related Questions