Reputation: 422
Might be a bit of a basic question, but, say I have a dataframe that looks like:
import pandas as pd

string_lst = ["bar0001", "bar0002", "bar0003", "bar0003", "bar0004", "bar0004", "bar0005", "bar0006"]
a = pd.DataFrame({'foo': string_lst,
                  'test': [0, 1, 2, 3, 4, 5, 6, 7]})
How do I subset the dataframe so that I get all the "bars" numbered 3 through 6?
I am guessing something along the lines of:
a['foo'== regex 3:6]?
My first thought was to select on the last n digits of each string in string_lst, but the real dataframe will have numbers with varying digit counts, such as bar2005 or bar20005, so I'm not sure how to proceed.
Many thanks!
Upvotes: 0
Views: 49
Reputation: 153500
IIUC,
a[a['foo'].str.contains(r'^bar0*[3-6]$', regex=True)]
Output:
       foo  test
2  bar0003     2
3  bar0003     3
4  bar0004     4
5  bar0004     5
6  bar0005     6
7  bar0006     7
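To illustrate why anchoring matters for a regex filter like this, here is a minimal sketch. It assumes the labels are always "bar" followed by digits, and the extra row "bar0035" is made up purely for the demonstration. An unanchored `str.contains` keeps that row because "bar003" appears inside it, while `str.fullmatch` (available in pandas 1.1 and later) requires the whole string to match.

```python
import pandas as pd

# Sample data from the question plus one hypothetical extra row ("bar0035").
a = pd.DataFrame({
    "foo": ["bar0001", "bar0002", "bar0003", "bar0003",
            "bar0004", "bar0004", "bar0005", "bar0006", "bar0035"],
    "test": range(9),
})

# Unanchored search: "bar0035" is kept because "bar003" matches inside it.
loose = a[a["foo"].str.contains(r"bar0+[3-6]", regex=True)]

# Whole-string match: "bar", any number of zeros, then a single digit 3-6.
strict = a[a["foo"].str.fullmatch(r"bar0*[3-6]")]

print(loose["foo"].tolist())
print(strict["foo"].tolist())
```

A character class such as `[3-6]` only covers single-digit ranges; for a range like 3 to 12, converting the digits to a number is usually simpler than stretching the regex.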
Upvotes: 1
Reputation: 26676
Which do you need?
1. Select index labels 3 to 6?
a.loc[3:6,:]
       foo  test
3  bar0003     3
4  bar0004     4
5  bar0004     5
6  bar0005     6
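or
2. Select bars numbered 3 to 6?
# capture the full trailing number, not just the last digit
a['s'] = a['foo'].str.extract(r'(\d+)$', expand=False).astype(int)
# keep rows with 3 <= s <= 6, then drop the helper column
a[a.s.ge(3) & a.s.le(6)].drop(columns='s')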
       foo  test
2  bar0003     2
3  bar0003     3
4  bar0004     4
5  bar0004     5
6  bar0005     6
7  bar0006     7
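For the second option, how much of the number the regex captures makes a difference once labels go past a single digit. The sketch below compares capturing one trailing digit with capturing the whole trailing number. The labels are illustrative only: "bar0013" is made up, and the last two come from the question.

```python
import pandas as pd

s = pd.Series(["bar0003", "bar0013", "bar2005", "bar20005"])

# Only the final digit: "bar0013" and "bar0003" both become 3.
last_digit = s.str.extract(r"(\d)$", expand=False).astype(int)

# The whole trailing run of digits: leading zeros vanish on conversion.
full_number = s.str.extract(r"(\d+)$", expand=False).astype(int)

print(last_digit.tolist())   # [3, 3, 5, 5]
print(full_number.tolist())  # [3, 13, 2005, 20005]

# between() is inclusive on both ends by default.
print(s[full_number.between(3, 6)].tolist())  # ['bar0003']
```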
Upvotes: 1
Reputation: 19957
If your dataset has the same pattern (bar followed by numbers), you can do something like below. This will handle cases like 'bar004', 'bar00004' etc.
a.loc[a.foo.str.extract(r'(\d+)')[0].astype(float).between(3, 6)]
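The same idea can be wrapped in a small helper. This is only a sketch: `select_numbered` is a hypothetical name, not a pandas function, and it assumes the first run of digits in each label is the number of interest. Using `pd.to_numeric(..., errors="coerce")` means labels without digits become NaN and simply drop out of the result instead of raising.

```python
import pandas as pd

def select_numbered(df, column, low, high):
    """Hypothetical helper: keep rows whose first digit run lies in [low, high]."""
    digits = df[column].str.extract(r"(\d+)", expand=False)
    numbers = pd.to_numeric(digits, errors="coerce")  # NaN where no digits found
    return df[numbers.between(low, high)]

# Illustrative data with mixed digit widths and one label without digits.
df = pd.DataFrame({
    "foo": ["bar0001", "bar003", "bar00004", "bar0006",
            "bar2005", "bar20005", "nodigits"],
    "test": range(7),
})

print(select_numbered(df, "foo", 3, 6)["foo"].tolist())
print(select_numbered(df, "foo", 2000, 30000)["foo"].tolist())
```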
Upvotes: 2
Reputation: 21
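Your regex string can be "^bar[0-9]+$". This will allow bar1, bar01 and bar000000000001, but not "bar 1" or bar001a. Without the anchors, or with * instead of +, a substring search would also accept bar001a and a bare "bar".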
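A quick way to check which labels follow the "bar plus digits" format is a whole-string match. The sketch below uses the example strings from this answer plus a bare "bar" (added here for illustration) and assumes pandas 1.1 or later for `Series.str.fullmatch`.

```python
import pandas as pd

labels = pd.Series(["bar1", "bar01", "bar000000000001", "bar 1", "bar001a", "bar"])

# fullmatch requires the entire string to be "bar" followed by one or more digits.
is_valid = labels.str.fullmatch(r"bar[0-9]+")

print(labels[is_valid].tolist())   # ['bar1', 'bar01', 'bar000000000001']
print(labels[~is_valid].tolist())  # ['bar 1', 'bar001a', 'bar']
```

Running a check like this before extracting numbers makes it easier to spot malformed labels rather than silently filtering them out.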
Upvotes: 1