Mike Lukos

Reputation: 61

Extracting rows up to a particular row - Pandas Dataframe

I have a function which imports the data for one government bond at one date from a CSV file containing multiple government bonds with different ranges of maturities:

def importdata (fileloc, date, name):
    'Imports data from a given location, date and name'
    data = pd.read_csv(fileloc) # file location 
    date = date 
    result = data[ (data['date']) == date] # getting the date of the bond
    data =  result.loc[:, result.columns.str.startswith(name)] # getting the curve wanted at the date
    data = data.T # Transposing the data
    data = data.reset_index()
    data.columns = ['maturity','spot rate'] # renaming columns
    data['maturity'] = data.maturity.str.rsplit(n=1).str[-1]
    
    return data

Example of data:

   maturity  spot rate
0        1Y      0.081
1       18M      0.164
2        2Y      0.230
3        3Y      0.361
4        4Y      0.479
5        5Y      0.577
6        6Y      0.660
7        7Y      0.732
8        8Y      0.796
9        9Y      0.851
10      10Y      0.900
11      12Y      0.967
12      15Y      1.026
13      20Y      1.044
14      25Y      1.042
15      30Y      1.020

I have added a line of code that extracts the rows of the dataframe up to a maximum maturity, which I pass as an input to the function:

data.iloc[:data.loc[data.maturity.str.contains(max_maturity,na=False)].index[0]]

So now the function looks like this:

def importdata (fileloc, date, name, max_maturity):
    'Imports data from a given location, date and name'
    data = pd.read_csv(fileloc) # file location 
    date = date 
    result = data[ (data['date']) == date] # getting the date of the curve
    data =  result.loc[:, result.columns.str.startswith(name)] # getting the curve wanted at the date
    data = data.T # Transposing the data
    data = data.reset_index()
    data.columns = ['maturity','spot rate'] # renaming columns
    data['maturity'] = data.maturity.str.rsplit(n=1).str[-1]
    
    data = data.iloc[:data.loc[data.maturity.str.contains(max_maturity,na=False)].index[0]]
    
    return data

The only problem is that with this additional line of code I can no longer import the full data. Is there a way to alter the code so that I can still import everything, while also being able to import only up to a specific maturity when I want?

Upvotes: 0

Views: 273

Answers (1)

Keine_Eule

Reputation: 147

You could set the default of max_maturity to None and add an if statement, so the truncation only runs when a maximum maturity is actually passed:

import pandas as pd

def importdata(fileloc, date, name, max_maturity=None):
    'Imports data from a given location, date and name'
    data = pd.read_csv(fileloc) # file location 
    # date = date   this does nothing
    result = data[ (data['date']) == date] # getting the date of the curve
    data =  result.loc[:, result.columns.str.startswith(name)] # getting the curve wanted at the date
    data = data.T # Transposing the data
    data = data.reset_index()
    data.columns = ['maturity','spot rate'] # renaming columns
    data['maturity'] = data.maturity.str.rsplit(n=1).str[-1]
    
    if max_maturity:
        data = data.iloc[:data.loc[data.maturity.str.contains(max_maturity,na=False)].index[0]]
    
    return data
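
A few side notes on the slicing line itself: .iloc[:n] stops before position n, so the row holding max_maturity is not included; str.contains does substring matching (so '2Y' also matches '12Y', although index[0] picks the first hit); and if nothing matches, index[0] raises an IndexError. If that matters for you, here is a minimal sketch of the truncation step pulled out into its own helper. It assumes the maturity labels are exact strings like the ones in your example; truncate_at_maturity and the inclusive flag are names I made up for illustration, they are not part of your function or of pandas.

```python
import pandas as pd


def truncate_at_maturity(data, max_maturity=None, inclusive=False):
    """Hypothetical helper: keep the rows up to the first row whose
    'maturity' equals max_maturity. With max_maturity=None the frame
    is returned unchanged."""
    if max_maturity is None:
        return data

    # Exact comparison instead of str.contains, so '2Y' cannot match '12Y'.
    positions = (data["maturity"] == max_maturity).to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise ValueError(f"maturity {max_maturity!r} not found")

    # iloc slicing excludes the stop position; add 1 to keep the matching row.
    stop = positions[0] + (1 if inclusive else 0)
    return data.iloc[:stop]


# Small made-up frame in the same shape as the example data.
curve = pd.DataFrame({
    "maturity": ["1Y", "18M", "2Y", "3Y", "4Y", "5Y"],
    "spot rate": [0.081, 0.164, 0.230, 0.361, 0.479, 0.577],
})

full = truncate_at_maturity(curve)                        # all rows
short = truncate_at_maturity(curve, "3Y")                 # rows before 3Y
short_inclusive = truncate_at_maturity(curve, "3Y", inclusive=True)  # up to and including 3Y
```

Inside importdata you would then replace the if block with something like data = truncate_at_maturity(data, max_maturity) before the return.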

Upvotes: 1
