Reputation: 109
I'm finally creating a class to analyse my data in a more streamlined way. It takes a CSV file and outputs some information about the table and its columns.
class Analyses:
    def Types_des_colonnes(self, df):
        tcol = df.columns.to_series().groupby(df.dtypes).groups
        tycol = {k.name: v for k, v in tcol.items()}
        return(self.tycol)

    def Analyse_table(self, table):
        # Returns a dict 'tycol' with the types as keys and the column names as values:
        Types_des_colonnes(table)
        nbr_types_colonnes_diff=len(tycol.keys())
        type_table = table.dtypes
        liste_columns = table.columns
        clef_types= tycol.keys()
        long_table = len(table)
        nbr_cols = len(liste_columns)
        print(table.describe())
        print('Nombre de colonnes: '+ str(nbr_cols))
        print('Nombre de types de colonnes différentes: '+str(nbr_types_colonnes_diff))
        for kk in range(0,nbr_types_colonnes_diff):
            print('Type: ' + tycol.keys()[kk])
            print(tycol.values())
        return(liste_columns)

    def Analyse_colonne(self, col):
        from numpy import where, nan
        from pandas import isnull,core,DataFrame
        # If col is a DataFrame:
        if type(col) == core.frame.DataFrame:
            dict_col = {}
            for co in col.columns:
                dict_col_Loc = Analyse_colonne(col[co]);
                dict_col[co] = dict_col_Loc.values()
            return(dict_col)
        elif type(col) == core.series.Series:
            type_col = type(col)
            arr_null = where(isnull(col))[0]
            type_data = col.dtype
            col_uniq = col.unique()
            nbr_unique= len(col_uniq)
            taille_col= len(col)
            nbr_ligne_vide= len(arr_null)
            top_entree= col.head()
            bottom_entree= col.tail()
            pct_uniq= (float(nbr_unique)/float(taille_col))*100.0
            pct_ligne_vide= (float(nbr_ligne_vide)/float(taille_col))*100.0
            print('\n')
            print(' ################# '+col.name+' #################')
            print('Type des données: ' + str(type_data))
            print('Taille de la colonne: ' + str(taille_col))
            if nbr_unique == 1:
                print('Aucune entrée unique')
            else:
                print('Nombre d\'uniques: '+ str(nbr_unique))
                print('Pourcentage d\'uniques: '+str(pct_uniq)+' %')
            if nbr_ligne_vide == 0:
                print('Aucune ligne vide')
            else:
                print('Nombre de lignes vides: '+ str(nbr_ligne_vide))
                print('Pourcentage de lignes vides: '+str(pct_ligne_vide)+' %')
            dict_col = {}
            dict_col[col.name] = arr_null
            return(dict_col)
        else:
            print('Problem')

def main():
    anly = Analyses()
    anly.Analyse_table(df_AIS)

if __name__ == '__main__':
    main()
When I run this script, I get a:
NameError: name 'tycol' is not defined
Which refers to the second line of:
def Analyse_table():
    # Returns a dict 'tycol' with the types as keys and the column names as values:
    Types_des_colonnes(table)
    nbr_types_colonnes_diff=len(tycol.keys())
I know it has to do with using the 'self' properly, but I really don't understand how to do so properly. Could anybody show me how to solve this very easy problem?
(All the 'self' present in this script have been added by me only to try to make it work on my own.)
Upvotes: 0
Views: 771
Reputation: 62531
The members of a Python object are distinguished from other variables by appearing on the right-hand side of a dot (as in obj.member).
The first parameter of a method is bound to the object on which the method is called. By convention this parameter is named self, but that is not a technical requirement.
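As a small illustration of that binding, here is a sketch using a made-up Counter class (not part of the question's code). The first parameter is deliberately named this to show that the name is only a convention, and the last two calls show that calling through the instance and passing the instance explicitly do the same thing.

```python
class Counter:
    def __init__(self, start):
        self.value = start

    # The first parameter receives the instance. Any name works;
    # "self" is simply what everybody expects to see.
    def increment(this):
        this.value += 1
        return this.value


c = Counter(10)
first = c.increment()           # the instance c is passed implicitly
second = Counter.increment(c)   # same method, instance passed explicitly
print(first, second)            # 11 12
```

In real code, stick to self so that other readers recognise the parameter immediately.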
tycol is a normal variable, entirely unassociated with the Analyses object. self.tycol is a different name.
Notice how you return self.tycol from Types_des_colonnes without ever giving it a value, which should raise an AttributeError. (Have you tried running the code as you posted it in the question body?) You then discard this value at the call site.
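To see both failure modes in isolation, here is a stripped-down sketch. The class Broken and the helper error_name are invented for this illustration; they only mimic the shape of the code in the question.

```python
class Broken:
    def build(self):
        tycol = {"int64": ["a"]}   # local variable, gone when build() returns
        return self.tycol          # the attribute was never assigned

    def use(self):
        build()                    # bare name: Python looks for a global function


def error_name(call):
    """Run call() and return the name of the exception it raises, or None."""
    try:
        call()
    except Exception as exc:
        return type(exc).__name__
    return None


obj = Broken()
print(error_name(obj.build))  # AttributeError
print(error_name(obj.use))    # NameError
```

The local tycol and the attribute self.tycol never meet, and a method called without self. in front of it is looked up as an ordinary global name.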
You should either assign the result of Types_des_colonnes to a name in Analyse_table, or exclusively use the name self.tycol.
def Types_des_colonnes(self, df):
    tcol = df.columns.to_series().groupby(df.dtypes).groups
    # we don't care about tcol after this, it ceases to exist when the method ends
    self.tycol = {k.name: v for k, v in tcol.items()}
    # but we do care about self.tycol

def Analyse_table(self, table):
    # Sets self.tycol, a dict with the types as keys and the column names as values.
    # The method must be called through self, otherwise Python raises a NameError.
    self.Types_des_colonnes(table)
    nbr_types_colonnes_diff = len(self.tycol.keys())
    # ...
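The two options can be compared side by side in a sketch that runs without pandas. Everything here is a stand-in: AnalysesSketch is a hypothetical class, and a plain dict mapping column names to dtype names replaces the DataFrame, so this shows the use of self rather than the real analysis.

```python
class AnalysesSketch:
    # Option 1: return the result and let the caller keep it in a local name.
    def types_des_colonnes(self, dtypes):
        tycol = {}
        for column, dtype in dtypes.items():
            tycol.setdefault(dtype, []).append(column)
        return tycol

    def analyse_table(self, dtypes):
        tycol = self.types_des_colonnes(dtypes)  # called through self, result kept
        return len(tycol)

    # Option 2: store the result on the instance and read it back through self.
    def charger_types(self, dtypes):
        self.tycol = self.types_des_colonnes(dtypes)

    def analyse_table_attr(self, dtypes):
        self.charger_types(dtypes)
        return len(self.tycol)


# Assumed sample input: column name -> dtype name.
dtypes = {"mmsi": "int64", "lat": "float64", "lon": "float64"}
anly = AnalysesSketch()
print(anly.analyse_table(dtypes))       # 2
print(anly.analyse_table_attr(dtypes))  # 2
```

Option 1 keeps the method free of side effects; option 2 is handy when several methods need the same result later.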
Upvotes: 2
Reputation:
A class is a data structure that contains "data and the methods that operate on that data". Note that I did not say 'functions': a method always has access to the data contained within the class, so methods are not 'functions' in the mathematical sense. But that's for another day, perhaps.
So, when do you use self? self represents the actual instance of the class on which you are invoking the method. So if you have a class called Shape and two instances of Shape, a and b, then when you call a.area() the self object inside the area method refers to the instance of Shape named a, whereas when you invoke b.area() the self object refers to the b instance of Shape. In this way you can write a method that works for any instance of Shape. To make this more concrete, here's an example Shape class:
class Shape():
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height
Here you can see that the data contained within the Shape class is length and height. Those values are assigned in __init__ (the constructor, which runs when you write e.g. a = Shape(3.0, 4.0)) and are stored as members of self. Afterward they can be accessed by the method area through the self object, for calculations.
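A short usage sketch of the same Shape class with the two instances a and b discussed above; the dimensions are arbitrary sample values.

```python
class Shape:
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height


a = Shape(3.0, 4.0)
b = Shape(2.0, 5.0)

# Inside area(), self is a for the first call and b for the second.
print(a.area())  # 12.0
print(b.area())  # 10.0
```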
These members can also be reassigned, and new members can be created. (Generally though members are only created in the constructor).
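For instance, assuming the same Shape class, the following sketch reassigns one member and adds a new one; the label attribute is made up for the illustration.

```python
class Shape:
    def __init__(self, length_in, height_in):
        self.length = length_in
        self.height = height_in

    def area(self):
        return self.length * self.height


a = Shape(3.0, 4.0)
b = Shape(1.0, 1.0)

a.length = 6.0      # reassign an existing member
a.label = "door"    # create a new member, on this instance only

print(a.area())             # 24.0
print(hasattr(b, "label"))  # False
```

Because label exists only on a, code that expects it on every Shape would fail for b, which is one reason to create members in the constructor.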
This is all very weird compared to the other simple aspects of Python design. Yet this is not unique to Python. In C++ there is a this pointer that serves the same purpose, and in JavaScript the way that closures are used to create objects often uses a this variable to perform the same task as Python's self.
I hope this helps a little. I can expand on any other questions you have.
Also, it's generally a good idea to put import statements at the top of the file. There are reasons not to, but none of them are good enough for normal coders to use.
Upvotes: 0
Reputation: 521
In method Types_des_colonnes, you need to do: self.tycol = tycol. Also, you need to call the method "as a method", i.e. self.Types_des_colonnes(table). Take a week to read a book about Python to learn some basics. Programming is easy, but not that easy :)
Upvotes: 0