Reputation: 20302

Trying to loop through URLs and save contents, as a data frame, to text file

I think this block of code is pretty close to being right, but something is throwing it off. I'm trying to loop through 10 URLs, download the contents of each, and save everything to a text file, structured as a data frame.

import pandas as pd
rawHtml = ''
url = r'http://www.pga.com/golf-courses/search?page=" + i + "&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0'
g = open("C:/Users/rshuell001/Desktop/MyData.txt", "w")
for i in range(0, 10):      
    df = pd.DataFrame.from_csv(url)
    print(df)
    g.write(str(df))
    g.close()

The error that I get says:

CParserError: Error tokenizing data.
C error: Expected 1 fields in line 22, saw 2

I have no idea what that means. I only have 9 lines of code, so I don't know why it's mentioning a problem on line 22.

Can someone give me a push to get this working?

Upvotes: 0

Answers (2)

ASH

Reputation: 20302

I finally got it working. This is what I was trying to do all along.

import requests
from bs4 import BeautifulSoup

link = "http://www.pga.com/golf-courses/search?page=1&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
html = requests.get(link).text

soup = BeautifulSoup(html, "lxml")
res = soup.findAll("div", {"class": "views-field-nothing"})
for r in res:
    print("Address: " + r.find("span", {'class': 'field-content'}).text)
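The snippet above handles a single page and only prints the results. A minimal sketch of extending it to the ten pages from the question and collecting the results in a data frame might look like the following. The names URL_TEMPLATE, extract_addresses and scrape_pages are made up for illustration, the built-in "html.parser" is swapped in for "lxml" to avoid an extra dependency, and the class names are assumed to still match the site's markup.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Search URL from the question, with the page number left as a placeholder.
URL_TEMPLATE = (
    "http://www.pga.com/golf-courses/search?page={page}"
    "&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50"
    "&price_range=0&course_type=both&has_events=0"
)


def extract_addresses(html):
    """Return the text of each address span found in one results page."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for block in soup.find_all("div", {"class": "views-field-nothing"}):
        span = block.find("span", {"class": "field-content"})
        if span is not None:  # skip blocks without an address span
            addresses.append(span.get_text(strip=True))
    return addresses


def scrape_pages(pages, fetch):
    """Collect addresses from several pages into one DataFrame.

    `fetch` is any callable that takes a URL and returns the page HTML,
    for example: lambda url: requests.get(url).text
    """
    rows = []
    for page in pages:
        html = fetch(URL_TEMPLATE.format(page=page))
        for address in extract_addresses(html):
            rows.append({"page": page, "address": address})
    return pd.DataFrame(rows, columns=["page", "address"])
```

With requests imported, something like scrape_pages(range(0, 10), lambda url: requests.get(url).text) should return the combined frame, and df.to_csv("C:/Users/rshuell001/Desktop/MyData.txt", index=False) would then write it to the text file from the question.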

Upvotes: 1

Dun Peal

Reputation: 17679

pandas.DataFrame.from_csv() takes a first argument that is either a path or a file-like handle, and either one is supposed to point at a valid CSV file.

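You are providing it with a URL, and the page behind that URL is HTML, not CSV. The "line 22" in the error refers to line 22 of the content being parsed, not to line 22 of your script.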

It seems that you want to use a different function: the top-level pandas.read_csv. This function will actually fetch the data for you from a valid URL, then parse it.
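As a rough illustration, here is pandas.read_csv applied to CSV text and to HTML-like text. The sample strings are made up, and io.StringIO stands in for the body of a URL so the sketch runs without network access. The second call is expected to fail with the same kind of tokenizing error as in the question, because the parser counts fields per line of the content it was given.

```python
import io

import pandas as pd

# Stand-in for the body of a URL that serves real CSV (made-up sample data).
csv_text = "course,city\nPine Hills,Austin\nLake View,Dallas\n"

# read_csv accepts a path, a URL string, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Stand-in for an HTML page. The first line has one field, but the third
# line contains a comma, so the parser sees two fields and gives up.
html_text = "<html>\n<body>\n<p>Hello, world</p>\n</body>\n</html>\n"

try:
    pd.read_csv(io.StringIO(html_text))
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
```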

If for any reason you insist on using pandas.DataFrame.from_csv(), you will have to:

1. Get the text from the page.
2. Persist the text, or parts thereof, as a valid CSV file, or a file-like object.
3. Provide the path to the file, or the handle of the file-like object, as the first argument to pandas.DataFrame.from_csv().
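The three steps above could be sketched roughly as follows. The address list is made-up data standing in for text already pulled from the page, and the file name addresses.csv is arbitrary. Note that DataFrame.from_csv() was removed in pandas 1.0, so this sketch passes the path to pandas.read_csv instead.

```python
import csv
import os
import tempfile

import pandas as pd

# Step 1: the text from the page. This hard-coded list is a stand-in for
# values extracted from the downloaded HTML.
addresses = ["1 Main St, Austin, TX", "2 Lake Rd, Dallas, TX"]

# Step 2: persist it as valid CSV. csv.writer quotes the embedded commas,
# so each address stays in a single field.
path = os.path.join(tempfile.mkdtemp(), "addresses.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["address"])  # header row
    writer.writerows([a] for a in addresses)

# Step 3: hand the path to the parser.
df = pd.read_csv(path)
print(df)
```

Writing through the csv module rather than joining strings by hand matters here, because unquoted commas inside an address would otherwise produce the "Expected 1 fields" error again.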

Upvotes: 1
