Reputation: 61

Getting subset of dataframe in Pandas

I have a dataframe data:

    maturity    spot rate
0   1Y  0.182
1   2Y  0.20199999999999999
2   3Y  0.284
3   4Y  0.426
4   5Y  0.585
5   6Y  0.745
6   7Y  0.892
7   8Y  1.021
8   9Y  1.13
9   10Y 1.224
10  12Y 1.375
11  15Y 1.5219999999999998
12  20Y 1.653
13  25Y 1.7109999999999999
14  30Y 1.739

I have a line of code which allows me to extract up until a certain maturity (max maturity is an input I give):

data = data.iloc[:data.loc[data.maturity.str.contains(max_maturity,na=False)].index[0]]

However the problem is that say if I want to get up to the 20Y and put max_maturity as 20Y, it will only get up to 15Y. Is there a way where it will extract all the rows up to and including the 20Y row?

Upvotes: 2

Answers (3)

jezrael

Reputation: 863751

One idea is compare numbers only, so possible use <=:

max_maturity = '20Y'
#if need extract 20
max_maturity = int(''.join(filter(str.isdigit, max_maturity)))

max_maturity = 20
#remove Y
df = df[df['maturity'].str.replace('Y','').astype(int) <= max_maturity]
#get numbers only
#df = df[df['maturity'].str.extract('(\d+)', expand=False).astype(int) <= max_maturity]

print (df)
   maturity  spot rate
0        1Y      0.182
1        2Y      0.202
2        3Y      0.284
3        4Y      0.426
4        5Y      0.585
5        6Y      0.745
6        7Y      0.892
7        8Y      1.021
8        9Y      1.130
9       10Y      1.224
10      12Y      1.375
11      15Y      1.522
12      20Y      1.653

Your solution working with shifted mask by Series.shift:

idx = data.index[data.maturity.str.contains(max_maturity,na=False).shift(fill_value=False)]

data = data.iloc[: idx[0]]

Upvotes: 2

biofa.esp

Reputation: 76

If the elements in the column maturity are of type str, you could compare them as strings since they all have "Y"at the end. I would filter the dataframe as follows:

data = data[data.maturity<='max_maturity']

Upvotes: 1

Shubham Sharma

Reputation: 71707

You can compare the maturity column with max_maturity to create a boolean mask, then reverse this mask and take cummulative maximum to create a resulting boolean mask which can be then used to filter the required rows:

max_maturity = '20Y'
df.loc[df['maturity'].eq(max_maturity)[::-1].cummax()]

   maturity  spot rate
0        1Y      0.182
1        2Y      0.202
2        3Y      0.284
3        4Y      0.426
4        5Y      0.585
5        6Y      0.745
6        7Y      0.892
7        8Y      1.021
8        9Y      1.130
9       10Y      1.224
10      12Y      1.375
11      15Y      1.522
12      20Y      1.653

Upvotes: 2

Getting subset of dataframe in Pandas

Answers (3)

Related Questions