Reputation: 17144
I have a pandas Series like the one shown below. How do I select only the rows whose index label is longer than 3 characters?
s = pd.Series([1,2,3,4,5], index=['a','bb','ccc','dddd','eeeee'])
Required output:
dddd 4
eeeee 5
My attempt:
s[len(s.index.name)>3]
Upvotes: 0
Views: 483
Reputation: 92854
I'll add one more approach to the collection, based on the pandas.Series.filter routine:
In [216]: s.filter(regex='.{4,}')
Out[216]:
dddd 4
eeeee 5
dtype: int64
'.{4,}' is a regex pattern that matches only index labels containing at least 4 characters. A simplified version may look like '.' * 4 or '....'.
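As a quick check of the patterns mentioned above, here is a minimal sketch, assuming pandas is imported as pd and reusing the Series from the question. The variable names are made up for illustration. Since filter searches each label for a match, any pattern that needs four characters should keep the same rows:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'bb', 'ccc', 'dddd', 'eeeee'])

# filter(regex=...) keeps the labels in which the pattern is found
by_quantifier = s.filter(regex='.{4,}')
by_repeat = s.filter(regex='.' * 4)   # '.' * 4 builds the string '....'
by_literal = s.filter(regex='....')

print(by_quantifier)
print(by_quantifier.equals(by_repeat) and by_quantifier.equals(by_literal))
```

Note that '.' does not match a newline by default, so labels containing newlines would behave differently.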
Here are the execution time measurements:
In [217]: %timeit s[s.index.str.len()>3]
254 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [218]: %timeit s[[len(i)>3 for i in s.index]]
84.5 µs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [219]: %timeit s[s.index.str.get(3).notnull()]
258 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [220]: %timeit s.filter(regex='.{4,}')
170 µs ± 480 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
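To reproduce the comparison outside IPython, something like the sketch below should work with the standard timeit module. The dictionary names and the repeat count are my own choices, and the numbers will vary by machine and pandas version. On a 5-element Series the timings mostly reflect fixed call overhead, so the ranking may change on larger data.

```python
import timeit
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'bb', 'ccc', 'dddd', 'eeeee'])

# The four approaches timed above, wrapped as zero-argument callables
candidates = {
    'str.len': lambda: s[s.index.str.len() > 3],
    'list comprehension': lambda: s[[len(i) > 3 for i in s.index]],
    'str.get': lambda: s[s.index.str.get(3).notnull()],
    'filter regex': lambda: s.filter(regex='.{4,}'),
}

# Run each approach once to confirm they all select the same rows
results = {name: fn() for name, fn in candidates.items()}

# Average seconds per call over a small number of runs
runs = 200
timings = {name: timeit.timeit(fn, number=runs) / runs
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e6:.1f} µs per loop')
```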
Upvotes: 4
Reputation: 59274
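Using str.get (labels shorter than 4 characters have no character at position 3, so get returns NaN for them):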
s[s.index.str.get(3).notnull()]
dddd 4
eeeee 5
dtype: int64
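To see why this works, it may help to look at the intermediate values. A minimal sketch, assuming the Series from the question; fourth_char and mask are illustrative names:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'bb', 'ccc', 'dddd', 'eeeee'])

# Character at position 3 of each label; missing where the label is too short
fourth_char = s.index.str.get(3)

# True only where a fourth character exists
mask = fourth_char.notnull()

print(list(fourth_char))
print(list(mask))
print(s[mask])
```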
Upvotes: 4
Reputation: 153460
Use a list comprehension:
s[[len(i)>3 for i in s.index]]
Output:
dddd 4
eeeee 5
dtype: int64
Upvotes: 3