Reputation: 585
I have the DataFrame shown below:
import pandas as pd
d = {'one': [1., 2., 3., 4., 5., 6], 'two': [4., 3., 2., 1., -1, -2]}
df = pd.DataFrame(d, index=['201305', '201305', '201307', '201307', '201307', '201308'])
Given the string '201307', I want to get the last index value that is less than it, which here is '201305'.
How do I write the code?
Upvotes: 1
Views: 1280
Reputation: 29721
Use Index.drop_duplicates to drop repeated entries, keeping only the first occurrence of each label, together with Index.get_loc to get the integer position of the given label. Subtract 1 from that position to obtain its distinct previous label.
>>> idx = df.index.drop_duplicates()
>>> val = idx[idx.get_loc('201307') - 1] # <------ Insert query here
>>> val
'201305'
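As a self-contained sketch of the steps above, they can be wrapped in a small helper. The name `previous_label` is hypothetical, and the sketch assumes the index is already sorted, because get_loc returns positions in order of appearance rather than value order. It also guards the first label, where `idx[pos - 1]` would otherwise silently wrap around to the last label.

```python
import pandas as pd

d = {'one': [1., 2., 3., 4., 5., 6], 'two': [4., 3., 2., 1., -1, -2]}
df = pd.DataFrame(d, index=['201305', '201305', '201307', '201307', '201307', '201308'])


def previous_label(index, label):
    """Return the distinct label just before `label`, or None if there is none.

    Hypothetical helper; assumes `index` is sorted ascending.
    """
    # Keep the first occurrence of each label; order of appearance is preserved
    idx = index.drop_duplicates()
    # Integer position of the label in the de-duplicated index (KeyError if absent)
    pos = idx.get_loc(label)
    if pos == 0:
        # idx[-1] would wrap around to the last label, which is wrong here
        return None
    return idx[pos - 1]


print(previous_label(df.index, '201307'))  # 201305
```

Note that this version still raises KeyError for a label that is not in the index.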
If you want to get the last row before the given index string value:
>>> df.loc[val].iloc[-1]
one 2.0
two 3.0
Name: 201305, dtype: float64
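Use the argument method='bfill' (or 'backfill') to deal with labels that are not present: such a label is matched to the next larger index value. The method argument of Index.get_loc was deprecated in pandas 1.4 and removed in 2.0; on newer versions use Index.get_indexer instead.
>>> val = idx[idx.get_loc('201306', method='bfill') - 1]  # pandas < 2.0; here '201307' is matched
>>> val = idx[idx.get_indexer(['201306'], method='bfill')[0] - 1]  # pandas >= 2.0 equivalent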
>>> val
'201305'
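A sketch that handles absent labels on current pandas via get_indexer, assuming the index is sorted ascending (method='bfill' requires a monotonic index). The helper name `previous_label_bfill` is hypothetical. When no label is greater than or equal to the query, get_indexer returns -1; this sketch treats that case as "every label is smaller" and returns the last one.

```python
import pandas as pd

d = {'one': [1., 2., 3., 4., 5., 6], 'two': [4., 3., 2., 1., -1, -2]}
df = pd.DataFrame(d, index=['201305', '201305', '201307', '201307', '201307', '201308'])


def previous_label_bfill(index, label):
    """Return the last distinct label strictly before `label`, even if `label` is absent.

    Hypothetical helper; assumes `index` is sorted ascending.
    """
    idx = index.drop_duplicates()
    # bfill maps an absent label to the position of the next larger label;
    # an exact match maps to its own position; -1 means no label is >= `label`
    pos = idx.get_indexer([label], method='bfill')[0]
    if pos == -1:
        # every existing label is smaller, so the answer is the last one
        return idx[-1]
    if pos == 0:
        # nothing precedes the first label
        return None
    return idx[pos - 1]


print(previous_label_bfill(df.index, '201306'))  # 201305
```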
Upvotes: 1
Reputation: 8693
First, don't keep numbers stored as strings: numeric comparisons are much faster than string comparisons. Second, this is then an easy problem to solve. Simply sort on the index and filter:
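given_value = '201307'
df.index = df.index.astype(int)  # convert string labels such as '201307' to integers
df.sort_index(inplace=True)
df[df.index < int(given_value)].iloc[-1, :]  # last row whose label is below given_value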
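A self-contained variant of this answer, wrapped in a hypothetical `last_row_before` helper. Two choices here belong to this sketch rather than the answer: it converts a copy of the frame instead of modifying `df` in place, and it sorts with `kind='mergesort'` (a stable sort) so rows that share a label keep their original order. It returns None when no label is smaller than the given value.

```python
import pandas as pd

d = {'one': [1., 2., 3., 4., 5., 6], 'two': [4., 3., 2., 1., -1, -2]}
df = pd.DataFrame(d, index=['201305', '201305', '201307', '201307', '201307', '201308'])


def last_row_before(frame, given_value):
    """Return the last row whose numeric label is below `given_value`, or None."""
    numeric = frame.copy()  # leave the caller's string index untouched
    numeric.index = numeric.index.astype(int)
    # Stable sort: rows with equal labels stay in their original order
    numeric = numeric.sort_index(kind='mergesort')
    before = numeric[numeric.index < int(given_value)]
    if before.empty:
        return None
    return before.iloc[-1]


row = last_row_before(df, '201307')
print(row.name)      # 201305 (now an integer label)
print(row.tolist())  # [2.0, 3.0]
```

The row's `name` holds the label, so it also answers the original question of which index value precedes the given one.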
Upvotes: 1