2ndg33r

Reputation: 81

Python 3.5 - Pandas - Call a method with a for loop from another method

Background:

I'm using Python 3.5 with Pandas and Jupyter Notebook. This is my first go at classes. Working with the Jupyter Notebook, one can simply run small bits of code one cell at a time. I'd like to start making scripts/programs that have a logical and more readable flow. But there are basics that I just don't understand yet. Know that I've spent a lot of time the last few days reading and trying things to get this to work. I rarely ask questions on SO because I can usually get what I need from previous posts...like most people, I'm sure.

For some reason I'm just not getting how to do what I'm sure is simple. Below is a snippet from a large program I'm writing. At this time there are four methods, and they all duplicate the same bit of code: a loop through state_list that filters the states I want from the Pandas dataframe I'm reading in. Each method's purpose is to read in a different file (xlsx and csv) and pull out data for a date and specific states.

Rather than repeating the for loop in each method, can I make it a method and then just call it from the other methods? I tried a few things but it's just not happening.

Current Code:

class GetData(object):

    report_date = '3/1/2016'
    state_list = ['AL', 'AZ', 'GA', 'IA', 'ID', 'IL', 'MN', 'MS', 
                  'MT', 'NE', 'NM', 'NV', 'TN', 'UT', 'WI']


    def data_getter(self):
        """Pulls in dataset and filters on specific date and states."""

        data = pd.read_excel('C:\\datapath\\file.xlsx')
        data = data[data['date'] == GetData.report_date]

        states = []
        for state in GetData.state_list:
            df = data[data['state'] == state]
            states.append(df)
        concat_data = pd.concat(states, axis=0)
        return concat_data

Then I instantiate it like:

data = GetData()
dataset = data.data_getter()

Goal - something like this?

class GetData(object):

    report_date = '3/1/2016'
    state_list = ['AL', 'AZ', 'GA', 'IA', 'ID', 'IL', 'MN', 'MS', 
                  'MT', 'NE', 'NM', 'NV', 'TN', 'UT', 'WI']


    def data_getter(self):
        """Pulls in dataset and filters on specific date and states."""

        data = pd.read_excel('C:\\datapath\\file.xlsx')
        data = data[data['date'] == GetData.report_date]

        # Call to state_filter here?

        data = GetData()
        data = data.state_filter

    def state_filter(self):
        states = []
        for state in GetData.state_list:
            df = data[data['state'] == state]
            states.append(df)
        concat_data = pd.concat(states, axis=0)
        return concat_data

Upvotes: 1

Views: 106

Answers (1)

MaxU - stand with Ukraine

Reputation: 210832

UPDATE:

Well, you can always write your own wrapper class, but I would say there must be a good reason for that...

import os
import pandas as pd

class GetData(object):

    #report_date = '3/1/2016'
    states = ['AL', 'AZ', 'GA', 'IA', 'ID', 'IL', 'MN', 'MS',
              'MT', 'NE', 'NM', 'NV', 'TN', 'UT', 'WI']

    def __init__(self, df_or_file=None, read_func=pd.read_excel, **kwargs):
        if df_or_file is not None:
            # accept an in-memory pandas object as-is
            if isinstance(df_or_file, (pd.DataFrame, pd.Series)):
                self.data = df_or_file
            # otherwise treat the argument as a path and read it with read_func
            elif os.path.isfile(df_or_file):
                self.data = read_func(df_or_file, **kwargs)
        else:
            self.data = pd.DataFrame()

    def save(self, filename, savefunc=pd.DataFrame.to_excel, **kwargs):
        # write the wrapped data using the chosen DataFrame writer
        savefunc(self.data, filename, **kwargs)

now you can do the following things:

let's generate some random DF and prepare CSV and Excel files:

In [53]: df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))

In [54]: df
Out[54]:
   a  b  c
0  6  0  2
1  8  1  5
2  5  5  4
3  0  4  1
4  5  4  2

In [55]: df.to_csv('d:/temp/test.csv', index=False)

In [56]: (df+100).to_excel('d:/temp/test.xlsx', index=False)

now we can create our object:

In [57]: x = GetData(df)

In [58]: x.data
Out[58]:
   a  b  c
0  6  0  2
1  8  1  5
2  5  5  4
3  0  4  1
4  5  4  2

or load it from a CSV file:

In [61]: x = GetData('d:/temp/test.csv', read_func=pd.read_csv, sep=',')

In [62]: x.data
Out[62]:
   a  b  c
0  6  0  2
1  8  1  5
2  5  5  4
3  0  4  1
4  5  4  2

In [63]: x.data[x.data.a == 5]
Out[63]:
   a  b  c
2  5  5  4
4  5  4  2

or load it from Excel file:

In [64]: x = GetData('d:/temp/test.xlsx')

In [65]: x.data
Out[65]:
     a    b    c
0  106  100  102
1  108  101  105
2  105  105  104
3  100  104  101
4  105  104  102

and save it:

In [66]: x.data.c = 0

In [67]: x.data
Out[67]:
     a    b  c
0  106  100  0
1  108  101  0
2  105  105  0
3  100  104  0
4  105  104  0

In [68]: x.save('d:/temp/new.xlsx', index=False)

In [69]: x.save('d:/temp/new.csv', savefunc=pd.DataFrame.to_csv, sep=';', index=False)
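As for the repeated loop in the question itself, here is a minimal sketch of one way to factor it out, not a definitive implementation. The attempt in the question refers to `data.state_filter` without calling it and never passes the filtered frame in, so this version takes the frame as an argument and calls the method through `self`. It also swaps the loop plus `pd.concat` for `Series.isin`. The shortened state list and the small in-memory frame are assumptions for illustration; in the real program the frame would come from `pd.read_excel` or `pd.read_csv`.

```python
import pandas as pd


class GetData(object):
    """Sketch: one shared state filter reused by every loader method."""

    report_date = '3/1/2016'
    state_list = ['AL', 'AZ', 'GA']  # shortened list, for illustration only

    def state_filter(self, data):
        """Keep only the rows whose 'state' value is in state_list."""
        return data[data['state'].isin(self.state_list)]

    def data_getter(self, data):
        """Filter on the report date, then delegate to state_filter."""
        data = data[data['date'] == self.report_date]
        return self.state_filter(data)


# Hypothetical in-memory frame standing in for pd.read_excel(...)
raw = pd.DataFrame({
    'date': ['3/1/2016', '3/1/2016', '2/1/2016', '3/1/2016'],
    'state': ['AL', 'TX', 'AZ', 'GA'],
    'value': [1, 2, 3, 4],
})

dataset = GetData().data_getter(raw)
print(dataset)
```

One difference worth knowing: `isin` keeps the rows in their original order, while the loop with `pd.concat` groups them in the order of `state_list`. If that grouping matters, sorting afterwards or keeping the loop inside `state_filter` would work just as well.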

Upvotes: 1
