Reputation: 285
I know very little about Python, but I was trying to achieve something in Extract, Transform and Load (ETL) using a small Python script. I get the desired result, but I still want to understand this script.
from bs4 import BeautifulSoup
import urllib
import re
import string
import csv

# Python 2: urlopen lives directly in the urllib module
urlHandle = urllib.urlopen("http://finance.yahoo.com/q/cp?s=^DJI")
html = urlHandle.read()
soup = BeautifulSoup(html)

# Locate the table by its id, then collect all of its rows
table = soup.find('table', attrs={'id': 'yfncsumtab'})
rows = table.findAll('tr')

a = ''
csvfile = open("F:/data/yahoofinance.csv", 'w')
for tr in rows[5:]:
    # Build one quoted, comma-separated line per table row
    for td in tr.find_all('td', attrs={'class': 'yfnc_tabledata1'}):
        a += '"' + td.get_text() + '",'
    a += '\n'
    csvfile.write(a)
    a = ''
csvfile.close()
My question is about this code: soup is an object returned from the BeautifulSoup(html) function. Am I right? In the next statement I guess table is also an object, so that means we are searching for a value in the soup object using the find function, and it returns an object?
Please correct my understanding of the code above:
urlHandle is a class, urllib is what? And urlopen is a static method.
html is an object, urlHandle is a class, read is a method.
soup is an object, BeautifulSoup(html) is a function.
Please give feedback on my understanding, and correct me where I am wrong, drawing on your experience!
Upvotes: 0
Views: 64
Reputation: 2390
In Python, basically everything is an object!
When you use import, you are including a certain module, like urllib.
A statement like soup = BeautifulSoup(html) creates an instance (also an object) of the BeautifulSoup class, constructed by passing it the html object.
Calls like soup.find(...) are methods that use that instance to do a certain job. In this case, it gets the first HTML table whose id attribute has the value 'yfncsumtab'. It returns a BeautifulSoup Tag object.
Upvotes: 0
Reputation: 1358
soup is an instance of the BeautifulSoup class.
urlHandle is again an instance, urllib is a module, and urlopen is a function belonging to this module.
html is an object, and read is a method which is executed.
You can find these out yourself using the type() function.
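As a rough illustration of the type() suggestion, here is a minimal sketch. It assumes Python 3, where urlopen lives in urllib.request rather than directly in urllib as in the question's Python 2 code. It also uses the standard library's HTMLParser class as a stand-in for BeautifulSoup, so it runs without extra packages or network access. The names parser and kinds are only illustrative.

```python
import urllib.request  # Python 3 home of urlopen (assumption: the question uses Python 2's urllib.urlopen)
from html.parser import HTMLParser  # stdlib class standing in for BeautifulSoup

parser = HTMLParser()  # calling a class creates an instance of it

# Ask Python what kind of thing each name refers to.
kinds = {
    "urllib.request": type(urllib.request).__name__,
    "urlopen": type(urllib.request.urlopen).__name__,
    "HTMLParser": type(HTMLParser).__name__,
    "parser": type(parser).__name__,
    "parser.feed": type(parser.feed).__name__,
}

for name, kind in kinds.items():
    print(name, "->", kind)
# urllib.request -> module
# urlopen -> function
# HTMLParser -> type
# parser -> HTMLParser
# parser.feed -> method
```

A class reports its kind as type, while an instance reports the name of its class. Applying the same check to soup or urlHandle in the original script should settle which is which.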
Upvotes: 1
Reputation: 13747
To be technical, I think it's important to understand that EVERYTHING in Python is an object. So, classes are objects, functions are objects, everything is an object.
That being said, we make distinctions after that, such as "function", "class", etc.
urllib, in particular, is something we call a module.
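To make the "everything is an object" point concrete, here is a small sketch using only the standard library. The names greet, Point, things and alias are made up for the illustration and are not part of the question's code.

```python
import math


def greet():
    return "hello"


class Point:
    pass


# An int, a string, a function, a class, an instance and a module.
things = [42, "text", greet, Point, Point(), math]

# Every one of them is an instance of the base type `object`.
all_objects = all(isinstance(thing, object) for thing in things)
print(all_objects)  # True

# Because a function is an object, it can be bound to another name and called through it.
alias = greet
print(alias())  # hello

# The distinctions we make afterwards show up in each object's type.
kind_names = [type(thing).__name__ for thing in things]
print(kind_names)  # ['int', 'str', 'function', 'type', 'Point', 'module']
```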
Upvotes: 1
Reputation: 2623
urlHandle is an object, urllib is a module, and urlopen is a function.
html is an object, and read is a method.
soup is an object, and BeautifulSoup(html) is the constructor call for a BeautifulSoup object.
It can be quite confusing, but in general you can keep in mind that, by convention, CamelCased names are classes, which makes CamelCase() the constructor. What you import is a module, which can contain classes and/or functions.
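A minimal sketch of the class/constructor/method relationship follows. TinySoup and its contains method are hypothetical names invented for this example; they are not part of BeautifulSoup's API.

```python
class TinySoup:
    """Hypothetical stand-in for a class such as BeautifulSoup."""

    def __init__(self, markup):
        # Runs when TinySoup(...) is called; stores the text on the new instance.
        self.markup = markup

    def contains(self, text):
        # A method: a function defined on the class and called through an instance.
        return text in self.markup


# Calling the class constructs an instance, just like BeautifulSoup(html).
soup = TinySoup("<table id='yfncsumtab'></table>")
print(type(soup).__name__)          # TinySoup
print(soup.contains("yfncsumtab"))  # True

# The CamelCase rule is only a convention: built-in classes such as int are lowercase.
print(isinstance(int, type), isinstance(TinySoup, type))  # True True
```

Since the naming rule is a convention rather than something the language enforces, checking with type() is the more reliable test when in doubt.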
Upvotes: 1