Ryan Francis

Reputation: 35

Pandas Apply function not working consistently (Python 3)

Summary

Procedure: I have three functions. Function A, B, and C. Function A uses apply() to apply function B and C to a global Pandas DataFrame.

Problem: Inspecting the results shows that only Function B was applied to the global DataFrame.

Other notes: If I apply Function C from the python interpreter, then it works.


The long version

The three main functions in this problem are:

load_paypal(): Loads data into a global Pandas DataFrame and applies the other two functions to a couple of columns.

read_cash(): reads in the value, strips dollar signs, commas, etc., and returns a number.

read_date(): reads a string and returns a datetime.

The problem I'm having is that when I use apply() to apply read_cash, it appears to work, but read_date doesn't. Additionally, when I use read_date with apply() from the Python interpreter, with the exact same code, I get the expected results, i.e., it works.

Functions

load_paypal

def load_paypal():
    global paypal_data
    paypal_data = pd.DataFrame( pd.read_csv(open("Download.csv") ) )
    paypal_data = paypal_data.fillna(0)
    cash_names = ('Gross', 'Fee', 'Net', 'Shipping and Handling Amount', 'Sales Tax', 'Balance')

    for names in cash_names:
        paypal_data[names].apply( ryan_tools.read_cash )

    paypal_data = paypal_data.rename(columns = { paypal_data.columns[0] : 'Date'})

    paypal_data['Date'].apply( ryan_tools.read_date )
    print( paypal_data['Date'] ) # The 'Date' datatype is still a string here
    print( paypal_data['Net'] ) # The 'Net' datatype is proven to be converted
    # to a number over here( It definitely starts out as a string )
    return

ryan_tools.read_date

def read_date(text):
    for fmt in ( '%m/%d/%y' , '%M/%D/%y' , '%m/%d/%Y', '%Y/%m/%d', '%Y/%M/%D', 'Report Date :%m/%d/%Y', '%Y%M%D' , '%Y%m%d' ):
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('No Valid Date found')

ryan_tools.read_cash

def read_cash(text):
    text = str(text)
    if text == '':
        return 0
    temp = text.replace(' ', '')
    temp = temp.replace(',', '')
    temp = temp.replace('$', '')

    if ('(' in temp or ')' in temp):
        temp = temp.replace('(', '')
        temp = temp.replace(')', '')
        ans = float(temp) * -1.0
        return ans
    ans = round(float(temp),2)

    return ans

Notes: ryan_tools is just my general file of commonly used utility functions.

Upvotes: 3

Views: 10718

Answers (1)

Randy

Reputation: 14857

.apply() is not an in-place operation (i.e., it returns a new object rather than modifying the original):

In [3]: df = pd.DataFrame(np.arange(10).reshape(2,5))

In [4]: df
Out[4]:
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9

In [5]: df[4].apply(lambda x: x+100)
Out[5]:
0    104
1    109
Name: 4, dtype: int64

In [6]: df
Out[6]:
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9

What you probably want is to reassign the column to the new one created by your .apply():

paypal_data['Date'] = paypal_data['Date'].apply(ryan_tools.read_date)
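The same reassignment applies to the loop over the cash columns. Below is a minimal sketch of the whole pattern, not the asker's actual code: the sample rows are made up to stand in for Download.csv, and read_cash and read_date are simplified stand-ins for the ryan_tools versions.

```python
import datetime

import pandas as pd


def read_cash(text):
    # Simplified stand-in for ryan_tools.read_cash (an assumption, not the original).
    cleaned = str(text).replace(' ', '').replace(',', '').replace('$', '')
    if '(' in cleaned or ')' in cleaned:
        # Parentheses mark a negative amount.
        return -float(cleaned.replace('(', '').replace(')', ''))
    return round(float(cleaned), 2)


def read_date(text):
    # Simplified stand-in for ryan_tools.read_date, trying only two formats.
    for fmt in ('%m/%d/%Y', '%Y%m%d'):
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('No Valid Date found')


# Made-up sample rows standing in for Download.csv.
paypal_data = pd.DataFrame({
    'Date': ['01/15/2015', '02/03/2015'],
    'Gross': ['$1,234.50', '($20.00)'],
    'Net': ['$1,200.00', '($20.00)'],
})

for name in ('Gross', 'Net'):
    # Assign the result back; apply() on its own leaves the column untouched.
    paypal_data[name] = paypal_data[name].apply(read_cash)

paypal_data['Date'] = paypal_data['Date'].apply(read_date)

print(paypal_data.dtypes)
```

If 'Net' already looked numeric in the original output, that is probably because read_csv inferred a numeric dtype for it while loading, not because the un-assigned apply() changed anything.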

Upvotes: 13
