yasin mohammed
yasin mohammed

Reputation: 471

Assitance needed in python pandas to reduce lines of code and cycle time

I have a DF where I am calculating the filling the emi value in fields

account Total   Start Date  End Date    EMI
211829  107000  05/19/17    01/22/19    5350
320563  175000  08/04/17    10/30/18    12500
648336  246000  02/26/17    08/25/19    8482.7586206897
109996  175000  11/23/17    11/27/19    7291.6666666667
121213  317000  09/07/17    04/12/18    45285.7142857143

Then based on dates range I create new fields like Jan 17 , Feb 17 , Mar 17 etc. and fill them up with the code below.

jant17 = pd.to_datetime('2017-01-01')
febt17 = pd.to_datetime('2017-02-01')
mart17 = pd.to_datetime('2017-03-01')

jan17 = pd.to_datetime('2017-01-31')
feb17 = pd.to_datetime('2017-02-28')
mar17 = pd.to_datetime('2017-03-31')

df.ix[(df['Start Date'] <= jan17) & (df['End Date'] >= jant17) , 'Jan17'] = df['EMI']

But the drawback is when I have to do a forecast till 2019 or 2020 They become too many lines of code to write and when there is any update I need to modify too many lines of code. To reduce the lines of code I tried an alternate method with using for loop but the code started taking very long to execute.

monthend = { 'Jan17' : pd.to_datetime('2017-01-31'),
            'Feb17' : pd.to_datetime('2017-02-28'),
            'Mar17' : pd.to_datetime('2017-03-31')}

monthbeg = { 'Jant17' : pd.to_datetime('2017-01-01'),
            'Febt17' : pd.to_datetime('2017-02-01'),
            'Mart17' : pd.to_datetime('2017-03-01')}

for mend in monthend.values():
    for mbeg in monthbeg.values():
        for coln in colnames:
            df.ix[(df['Start Date'] <= mend) & (df['End Date'] >= mbeg) , coln] = df['EMI']

This greatly reduced the no of lines of code but increased to execution time from 3-4 mins to 1 hour plus. Is there a better way to code this with less lines and lesser processing time

Upvotes: 1

Views: 53

Answers (1)

jezrael
jezrael

Reputation: 862911

I think you can create helper df with start, end dates and names of columns, loop rows and create new columns of original df:

dates = pd.DataFrame({'start':pd.date_range('2017-01-01', freq='MS', periods=10),
                      'end':pd.date_range('2017-01-01', freq='M', periods=10)})
dates['names'] = dates.start.dt.strftime('%b%y')
print (dates)
         end      start  names
0 2017-01-31 2017-01-01  Jan17
1 2017-02-28 2017-02-01  Feb17
2 2017-03-31 2017-03-01  Mar17
3 2017-04-30 2017-04-01  Apr17
4 2017-05-31 2017-05-01  May17
5 2017-06-30 2017-06-01  Jun17
6 2017-07-31 2017-07-01  Jul17
7 2017-08-31 2017-08-01  Aug17
8 2017-09-30 2017-09-01  Sep17
9 2017-10-31 2017-10-01  Oct17

#if necessary convert to datetimes
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

def f(x):
    df.loc[(df['Start Date'] <= x.start) & (df['End Date'] >= x.end) , x.names] = df['EMI']
dates.apply(f, axis=1)
print (df)
   account   Total Start Date   End Date           EMI  Jan17  Feb17  \
0   211829  107000 2017-05-19 2019-01-22   5350.000000    NaN    NaN   
1   320563  175000 2017-08-04 2018-10-30  12500.000000    NaN    NaN   
2   648336  246000 2017-02-26 2019-08-25   8482.758621    NaN    NaN   
3   109996  175000 2017-11-23 2019-11-27   7291.666667    NaN    NaN   
4   121213  317000 2017-09-07 2018-04-12  45285.714286    NaN    NaN   

         Mar17        Apr17        May17        Jun17        Jul17  \
0          NaN          NaN          NaN  5350.000000  5350.000000   
1          NaN          NaN          NaN          NaN          NaN   
2  8482.758621  8482.758621  8482.758621  8482.758621  8482.758621   
3          NaN          NaN          NaN          NaN          NaN   
4          NaN          NaN          NaN          NaN          NaN   

         Aug17         Sep17         Oct17  
0  5350.000000   5350.000000   5350.000000  
1          NaN  12500.000000  12500.000000  
2  8482.758621   8482.758621   8482.758621  
3          NaN           NaN           NaN  
4          NaN           NaN  45285.714286  

Upvotes: 3

Related Questions