Reputation: 81
Background:
I'm using Python 3.5 with Pandas and Jupyter Notebook. This is my first go at classes. Working in the Jupyter Notebook, one can simply run small bits of code one cell at a time. I'd like to start making scripts/programs that have a logical and more readable flow, but there are basics that I just don't understand yet. Know that I've spent a lot of time the last few days reading and trying things to get this to work. I rarely ask questions on SO because I can usually get what I need from previous posts...like most people, I'm sure.
For some reason I'm just not getting how to do what I'm sure is simple. Below is a snippet from a large program I'm writing. At this time there are four methods, and they all duplicate the same bit of code: a loop through state_list that filters the states I want from the Pandas dataframe I'm reading in. Each method's purpose is to read in a different file (xlsx and csv) and pull out data for a date and specific states.
Rather than repeating the for loop in each method, can I make it a method of its own and then just call it from the other methods? I tried a few things but it's just not happening.
Current Code:
class GetData(object):
    report_date = '3/1/2016'
    state_list = ['AL', 'AZ', 'GA', 'IA', 'ID', 'IL', 'MN', 'MS',
                  'MT', 'NE', 'NM', 'NV', 'TN', 'UT', 'WI']

    def data_getter(self):
        """Pulls in dataset and filters on specific date and states."""
        data = pd.read_excel('C:\\datapath\\file.xlsx')
        data = data[data['date'] == GetData.report_date]
        states = []
        for state in GetData.state_list:
            df = data[data['state'] == state]
            states.append(df)
        concat_data = pd.concat(states, axis=0)
        return concat_data
Then I instantiate it like:
data = GetData()
dataset = data.data_getter()
Goal - something like this?
class GetData(object):
    report_date = '3/1/2016'
    state_list = ['AL', 'AZ', 'GA', 'IA', 'ID', 'IL', 'MN', 'MS',
                  'MT', 'NE', 'NM', 'NV', 'TN', 'UT', 'WI']

    def data_getter(self):
        """Pulls in dataset and filters on specific date and states."""
        data = pd.read_excel('C:\\datapath\\file.xlsx')
        data = data[data['date'] == GetData.report_date]
        # Call to state_filter here?
        data = GetData()
        data = data.state_filter

    def state_filter(self):
        states = []
        for state in GetData.state_list:
            df = data[data['state'] == state]
            states.append(df)
        concat_data = pd.concat(states, axis=0)
        return concat_data
Upvotes: 1
Views: 106
Reputation: 210832
UPDATE:
Well, you can always write your own wrapper class, but I would say there must be a good reason for that...
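import os
import pandas as pd

class GetData(object):
    #report_date = '3/1/2016'
    states = ['AL', 'AZ', 'GA', 'IA', 'ID', 'IL', 'MN', 'MS',
              'MT', 'NE', 'NM', 'NV', 'TN', 'UT', 'WI']

    def __init__(self, df_or_file=None, read_func=pd.read_excel, **kwargs):
        if df_or_file is not None:
            # already a pandas object: wrap it directly
            # (pd.Panel is gone from recent pandas releases; drop it from the tuple there)
            if isinstance(df_or_file, (pd.DataFrame, pd.Series, pd.Panel)):
                self.data = df_or_file
            # a path to an existing file: load it with read_func
            elif os.path.isfile(df_or_file):
                self.data = read_func(df_or_file, **kwargs)
        else:
            # nothing passed: start with an empty frame
            self.data = pd.DataFrame()

    def save(self, filename, savefunc=pd.DataFrame.to_excel, **kwargs):
        # write the wrapped data with the chosen DataFrame writer
        savefunc(self.data, filename, **kwargs)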
now you can do the following:
let's generate a random DataFrame and prepare CSV and Excel files:
In [53]: df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))
In [54]: df
Out[54]:
   a  b  c
0  6  0  2
1  8  1  5
2  5  5  4
3  0  4  1
4  5  4  2
In [55]: df.to_csv('d:/temp/test.csv', index=False)
In [56]: (df+100).to_excel('d:/temp/test.xlsx', index=False)
now we can create our object:
In [57]: x = GetData(df)
In [58]: x.data
Out[58]:
   a  b  c
0  6  0  2
1  8  1  5
2  5  5  4
3  0  4  1
4  5  4  2
or load it from CSV
In [61]: x = GetData('d:/temp/test.csv', read_func=pd.read_csv, sep=',')
In [62]: x.data
Out[62]:
   a  b  c
0  6  0  2
1  8  1  5
2  5  5  4
3  0  4  1
4  5  4  2
In [63]: x.data[x.data.a == 5]
Out[63]:
   a  b  c
2  5  5  4
4  5  4  2
or load it from Excel file:
In [64]: x = GetData('d:/temp/test.xlsx')
In [65]: x.data
Out[65]:
     a    b    c
0  106  100  102
1  108  101  105
2  105  105  104
3  100  104  101
4  105  104  102
and save it:
In [66]: x.data.c = 0
In [67]: x.data
Out[67]:
     a    b  c
0  106  100  0
1  108  101  0
2  105  105  0
3  100  104  0
4  105  104  0
In [68]: x.save('d:/temp/new.xlsx', index=False)
In [69]: x.save('d:/temp/new.csv', savefunc=pd.DataFrame.to_csv, sep=';', index=False)
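As for the original question (calling one method from another), here is a minimal sketch rather than a definitive implementation. Inside a class, a method is called on the current instance through self, and the dataframe is passed in as an argument instead of creating a second GetData(). The helper name filter_frame, the shortened state_list, the path argument and the sample frame are all assumptions made for illustration. The loop plus pd.concat is swapped for Series.isin, which keeps rows in their original order rather than grouping them state by state.

```python
import pandas as pd


class GetData(object):
    report_date = '3/1/2016'
    state_list = ['AL', 'AZ', 'GA']  # shortened for the example

    def state_filter(self, data):
        """Keep only the rows whose 'state' is in state_list."""
        # isin() stands in for the explicit loop + pd.concat
        return data[data['state'].isin(self.state_list)]

    def filter_frame(self, data):
        """Hypothetical helper: date filter, then the shared state filter."""
        data = data[data['date'] == self.report_date]
        # call the other method through self, handing it the frame
        return self.state_filter(data)

    def data_getter(self, path):
        """Reads an Excel file (path is an assumed argument) and filters it."""
        return self.filter_frame(pd.read_excel(path))


# small in-memory frame standing in for the real file
sample = pd.DataFrame({
    'date': ['3/1/2016', '3/1/2016', '2/1/2016', '3/1/2016'],
    'state': ['AL', 'TX', 'GA', 'GA'],
    'value': [1, 2, 3, 4],
})
result = GetData().filter_frame(sample)
print(result)
```

Each of the other reader methods could then end with return self.filter_frame(...) on whatever frame it loaded, so the state logic lives in one place.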
Upvotes: 1