Erol Can Akbaba

Reputation: 387

Multiple input and multiple output function application to Pandas DataFrame raises shape exception

I have a dataframe with 6 columns (excluding the index). Two of them are the relevant inputs to a function, and that function returns two outputs. I'd like to add these outputs to the original dataframe as new columns.

I'm following toto_tico's answer here. I'm copying for convenience (with slight modifications):

    import pandas as pd
    df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [10, 10, 10], "D": [1, 1, 1]})
    def fab(row):                                                  
        return row['A'] * row['B'], row['A'] + row['B']
    df['newcolumn'], df['newcolumn2'] = zip(*df.apply(fab, axis=1))

This code works without a problem. My code, however, doesn't. My dataframe has the following structure:

        Date  Station  Insolation  Daily Total  Temperature(avg)  Latitude
0 2011-01-01  Aksaray         1.7      72927.6         -0.025000   38.3705
1 2011-01-02  Aksaray         5.6     145874.7          2.541667   38.3705
2 2011-01-03  Aksaray         6.3     147197.8          6.666667   38.3705
3 2011-01-04  Aksaray         2.9     100350.9          5.312500   38.3705
4 2011-01-05  Aksaray         0.7      42138.7          4.639130   38.3705

The function I'm applying takes a row as input, and returns two values based on Latitude and Date. Here's that function:

    import numpy as np
    import pandas as pd

    def h0(row):
        # Take a row from the dataframe and return H0 and the day length.
        # row['Latitude'] and row['Date'] are the relevant inputs.

        # Latitude is given in degrees and the formulas assume degrees,
        # but numpy works in radians, so angles are converted first.
        gsc = 1367  # solar constant, W/m^2
        phi = np.deg2rad(row['Latitude'])
        date = pd.Timestamp(row['Date'])

        # The formulas below expect the day of the year (1..365 or 366),
        # not the day of the month.
        day = date.dayofyear
        # Leap years must be taken into account.
        days_in_year = 366 if date.is_leap_year else 365

        B = np.deg2rad((day - 1) * (360 / days_in_year))
        delta = (0.006918 - 0.399912*np.cos(B) + 0.070257*np.sin(B)
                 - 0.006758*np.cos(2*B) + 0.000907*np.sin(2*B)
                 - 0.002697*np.cos(3*B) + 0.00148*np.sin(3*B))

        # Sunset hour angle (radians) and day length (hours)
        ws = np.arccos(-np.tan(phi) * np.tan(delta))
        daylength = (2/15) * np.rad2deg(ws)

        dayangle = np.deg2rad(360 * day / days_in_year)

        h0 = (24*3600*gsc/np.pi) * (1 + 0.033*np.cos(dayangle)) * (
            np.cos(phi)*np.cos(delta)*np.sin(ws) + ws*np.sin(phi)*np.sin(delta))

        return h0, daylength

When I use

ak['h0'], ak['N'] = zip(*ak.apply(h0, axis=1))

I get the error: Shape of passed values is (1816, 2), indices imply (1816, 6)

I'm unable to find what's wrong with my code. Can you help?

Upvotes: 0

Views: 1249

Answers (1)

Orenshi

Reputation: 1873

As mentioned in my previous comment, if you'd like to create multiple NEW columns in the DataFrame based on multiple EXISTING columns of the DataFrame, you can create new fields in the row Series WITHIN your h0 function.

Here's an overly simple example to showcase what I mean:

>>> def simple_func(row):
...     row['new_column1'] = row.lat * 1000
...     row['year'] = row.date.year
...     row['month'] = row.date.month
...     row['day'] = row.date.day
...     return row
...
>>> df
        date   lat
0 2018-01-29  1000
1 2018-01-30  5000
>>> df.date
0   2018-01-29
1   2018-01-30
Name: date, dtype: datetime64[ns]
>>> df.apply(simple_func, axis=1)
        date   lat  new_column1  year  month  day
0 2018-01-29  1000      1000000  2018      1   29
1 2018-01-30  5000      5000000  2018      1   30

In your case, inside your h0 function, set row['h0'] = h0 and row['N'] = daylength, then return row. When you call the function on the DataFrame, your line then changes to ak = ak.apply(h0, axis=1)
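
To make that concrete, here is a minimal sketch of the pattern on a frame shaped like yours. The function two_outputs is a hypothetical stand-in for the real h0 maths, and the names add_columns, result_rows and result_expand are made up for illustration. The second option assumes pandas 0.23 or later, where apply accepts result_type="expand"; I have not checked it against your exact version.

```python
import pandas as pd

# Hypothetical stand-in for the real h0 computation: it only needs to
# derive two numbers from a row's Latitude and Date.
def two_outputs(row):
    return row["Latitude"] * 2, row["Date"].dayofyear

def add_columns(row):
    # Pattern from this answer: store both results on the row, return the row.
    row = row.copy()  # avoid mutating the row object handed over by apply
    row["h0"], row["N"] = two_outputs(row)
    return row

ak = pd.DataFrame({
    "Date": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    "Station": ["Aksaray"] * 3,
    "Latitude": [38.3705] * 3,
})

# Option 1: the function returns the whole row, so apply rebuilds the frame
# with the two extra columns appended.
result_rows = ak.apply(add_columns, axis=1)

# Option 2 (assumes pandas >= 0.23): keep returning a tuple and ask apply
# to expand it into columns, then name the columns and join them back.
expanded = ak.apply(two_outputs, axis=1, result_type="expand")
expanded.columns = ["h0", "N"]
result_expand = ak.join(expanded)

print(result_rows)
print(result_expand)
```

Option 1 is the least surprising if you already have the row in hand. Option 2 leaves the original function untouched, which may be preferable if it is reused elsewhere.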

Upvotes: 1
