ding

Reputation: 595

Parse HTML, 'ValueError: stat: path too long for Windows'

I'm trying to scrape data from NYSE's website, from this URL:

nyse = 'http://www1.nyse.com/about/listed/IPO_Index.html'

Using requests, I've set my request up like this:

import requests
import pandas
from bs4 import BeautifulSoup

page = requests.get(nyse)
soup = BeautifulSoup(page.text)

tables = soup.findAll('table')
test = pandas.io.html.read_html(str(tables))

However, I keep getting this error:

'ValueError: stat: path too long for Windows'

I don't understand how to interpret this error, let alone how to solve the problem. I've seen one other post on this topic (Copy a file with a too long path to another directory in Python), but I don't fully understand the workaround and am not sure which path is the problem in this case.

The error is thrown at the test = pandas.io.... line, but I never define a path there, and I'm not storing the table locally. Do I need to use pywin32? Why does this error only show up for some URLs and not others? How do I solve this problem?

For reference, I'm using Python 3.4.

Update: The error only appears with the nyse website, and not for others that I'm also scraping. In all cases, I'm doing the str(tables) conversion.

Upvotes: 2

Views: 3518

Answers (1)

Mark Whitfield

Reputation: 2520

The pandas read_html method accepts URLs, files, or raw HTML strings as its first argument. It looks like it's trying to interpret the str(tables) argument as a file path rather than as HTML: the "stat" in the message is the file-system lookup. A string that long would of course overrun the path-length limit Windows imposes.

Are you certain that str(tables) produces raw, parseable HTML? tables is a list of BeautifulSoup node objects, so calling str() on it gives the list's representation, with brackets and commas wrapped around the markup, rather than a clean HTML document. That is probably not what you're looking for.
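
One possible workaround, as a minimal sketch: convert each table to a string on its own, join the results, and wrap them in a file-like object, so read_html has no reason to treat the input as a path. The helper name tables_to_buffer and the FakeTag class are made up for this illustration. FakeTag only stands in for a BeautifulSoup Tag, on the assumption that str() on a Tag returns its markup, so the snippet runs without any third-party packages.

```python
import io


def tables_to_buffer(tables):
    """Join the markup of each table and wrap it in a file-like object.

    `tables` is any iterable whose items render as HTML under str(),
    which is the assumed behaviour of BeautifulSoup Tag objects.
    """
    html = "\n".join(str(table) for table in tables)
    return io.StringIO(html)


class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag (illustration only)."""

    def __init__(self, markup):
        self.markup = markup

    def __repr__(self):
        return self.markup


tables = [
    FakeTag("<table><tr><td>1</td></tr></table>"),
    FakeTag("<table><tr><td>2</td></tr></table>"),
]

# str() on the whole list gives a list repr: brackets and commas around the markup.
as_list_repr = str(tables)

# Converting the items one by one and joining them gives plain HTML.
joined_html = tables_to_buffer(tables).getvalue()
```

With the variables from the question, the call would then be pandas.io.html.read_html(tables_to_buffer(tables)). A file-like object should be read directly instead of being checked against the file system. I haven't run this against the NYSE page, so treat it as a starting point.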

Upvotes: 0
