Kate

Reputation: 285

Understanding class methods in Python code

I know very little about Python, but I was trying to do an Extract, Transform and Load (ETL) task using a small Python script. I get the desired result, but I still want to understand this script.

from bs4 import BeautifulSoup
import urllib
import re
import string
import csv

# Download the page and parse it
urlHandle = urllib.urlopen("http://finance.yahoo.com/q/cp?s=^DJI")
html = urlHandle.read()
soup = BeautifulSoup(html)
table = soup.find('table', attrs={'id': 'yfncsumtab'})
rows = table.findAll('tr')

a = ''
csvfile = open("F:/data/yahoofinance.csv", 'w')
for tr in rows[5:]:
    # Build one quoted, comma-separated line per table row
    for td in tr.find_all('td', attrs={'class': 'yfnc_tabledata1'}):
        a += '"' + td.get_text() + '",'
    a += '\n'
    csvfile.write(a)
    a = ''
csvfile.close()

My question is about this code: soup is an object returned by the BeautifulSoup(html) function. Am I right? In the next statement, I guess table is also an object, which means we are searching the soup object for a value using the find function, and it returns an object?

Please correct my understanding of the code above:

  1. urlHandle is a class, urllib is what? and urlopen is a static method.

  2. html is an object, urlHandle is a class, read is a method.

  3. soup is an object, BeautifulSoup(html) is a function.

Please give me feedback on my understanding, and correct me where I am wrong.

Upvotes: 0

Views: 64

Answers (4)

Alg_D

Reputation: 2390

In Python, basically everything is an object. When you use import, you are bringing in a module, such as urllib.

A statement like soup = BeautifulSoup(html) creates an instance (also an object) of the BeautifulSoup class, which you construct by passing in the html object.

Calls like soup.find(...) are then methods that use that instance to do a certain job, in this case getting the first HTML table whose id attribute has the value 'yfncsumtab'. It returns a BeautifulSoup Tag object.
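Here is a rough sketch of the same "call a method on an instance, get another object back" pattern. To keep it runnable without installing anything, it swaps in the standard library's xml.etree.ElementTree for BeautifulSoup, and the markup is a made-up stand-in rather than the real Yahoo page, so treat it as an analogy and not as BeautifulSoup's API.

```python
import xml.etree.ElementTree as ET

# Stand-in markup (assumption): a tiny well-formed snippet, not the Yahoo page.
markup = """
<html>
  <body>
    <table id="other"><tr><td>skip</td></tr></table>
    <table id="yfncsumtab"><tr><td>AAPL</td></tr></table>
  </body>
</html>
""".strip()

# root is an object: an instance of the ET.Element class.
root = ET.fromstring(markup)

# find() is a method: it uses the instance it is called on to do the search,
# and it hands back another object (or None when nothing matches).
table = root.find(".//table[@id='yfncsumtab']")

print(isinstance(root, ET.Element))   # True
print(isinstance(table, ET.Element))  # True
print(table.find("tr/td").text)       # AAPL
```

The result of find() is itself an object with its own methods, which is why the question's code can go on to call table.findAll('tr').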

Upvotes: 0

formatkaka

Reputation: 1358

  1. soup is an instance of the BeautifulSoup class.
  2. urlHandle is also an instance, urllib is a module, and urlopen is a function belonging to that module.
  3. html is an object, and read is a method that gets called on urlHandle.

You can check these yourself using the type() function.
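A minimal sketch of that check, assuming Python 3, where the question's urllib.urlopen lives at urllib.request.urlopen. To avoid a network call, io.BytesIO is used as a stand-in for the real urlHandle; that substitution is only an assumption for illustration.

```python
import io
import urllib.request

# The module and the function inside it report different types.
print(type(urllib.request))          # <class 'module'>
print(type(urllib.request.urlopen))  # <class 'function'>

# Stand-in for urlHandle (assumption): a file-like object, so no network is needed.
url_handle = io.BytesIO(b"<html><body>hello</body></html>")
html = url_handle.read()

# The handle is an instance, read is a method on it, and html is the data it returned.
print(type(url_handle))  # <class '_io.BytesIO'>
print(type(html))        # <class 'bytes'>
print(callable(url_handle.read))  # True
```

Running the same type() calls on the real urlHandle and soup objects should show their actual classes in the same way.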

Upvotes: 1

Matt Messersmith

Reputation: 13747

To be technical, I think it's important to understand that EVERYTHING in Python is an object. So, classes are objects, functions are objects, everything is an object.

That being said, we make distinctions after that, such as "function", "class", etc.

urllib, in particular, is something we call a module.
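As a small illustration of the point above (a sketch, assuming Python 3), the class Greeter and the function shout below are hypothetical names invented for this example. The code checks that a class, a function, a module, and an instance are all objects, while their types still tell them apart.

```python
import types
import urllib.request


class Greeter:
    """A tiny hypothetical class used only for illustration."""

    def greet(self):
        return "hello"


def shout(text):
    """A hypothetical function used only for illustration."""
    return text.upper()


# A class, a function, a module, an instance, and a number are all objects.
things = [Greeter, shout, urllib.request, Greeter(), 42]
print(all(isinstance(thing, object) for thing in things))  # True

# The distinctions show up in their types.
print(isinstance(Greeter, type))                     # True: a class
print(isinstance(shout, types.FunctionType))         # True: a function
print(isinstance(urllib.request, types.ModuleType))  # True: a module

# Because they are objects, they can be bound to other names and passed around.
make_greeter = Greeter
print(make_greeter().greet())  # hello
```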

Upvotes: 1

Jesse Bakker

Reputation: 2623

  1. urlHandle is an object, urllib is a module and urlopen is a function
  2. html is an object and read is a method
  3. soup is an object and BeautifulSoup(html) calls the constructor to create a BeautifulSoup object

It can be quite confusing, but in general you can keep in mind that CamelCased names are classes, which makes CamelCase() the constructor. What you import is a module, which can contain classes and/or functions.
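A quick way to test that rule of thumb, sketched with the standard library's collections module (chosen here only as an example of a module that holds both classes and functions):

```python
import collections
import inspect

# collections is a module; it contains both classes and functions.
print(inspect.ismodule(collections))               # True
print(inspect.isclass(collections.OrderedDict))    # True: CamelCase class
print(inspect.isfunction(collections.namedtuple))  # True: lowercase function

# Calling the class name constructs an instance.
ordered = collections.OrderedDict()
print(isinstance(ordered, collections.OrderedDict))  # True

# The naming rule is a convention, not a guarantee: built-in classes such as
# int and str are lowercase, yet calling them still constructs objects.
print(inspect.isclass(int), int("42"))  # True 42
```

So the capitalization is a helpful hint, but inspect.isclass() or type() gives the definitive answer.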

Upvotes: 1
