Reputation: 291
This is my data, which contains numbers and strings:
import pandas as pd
df2 = pd.DataFrame({'A': ['1,008$','4,000$','6,000$','10,00$','8,00$','45€','45€']})
df2 = pd.DataFrame(df2, columns = ['A'])
vv = df2[df2['A'].str.match('$')]
I want output like this:
0 1,008$
1 4,000$
2 6,000$
3 10,00$
4 8,00$
But I am getting this output instead:
Out[144]:
Empty DataFrame
Columns: [A]
Index: []
Can anyone please help me?
Upvotes: 1
Views: 37
Reputation: 294258
A somewhat verbose way, using NumPy's defchararray module. I always like to give this module some attention.
import numpy as np
# Using @cᴏʟᴅsᴘᴇᴇᴅ's suggestion
# Same function as below but shorter namespace path
df2[np.char.find(df2.A.values.astype(str), '$') >= 0]
Old Answer
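# numpy.core is private as of NumPy 2.0 (accessing it emits a DeprecationWarning); prefer np.char.find above
from numpy.core.defchararray import find
df2[find(df2.A.values.astype(str), '$') >= 0]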
A
0 1,008$
1 4,000$
2 6,000$
3 10,00$
4 8,00$
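A self-contained version of the same approach, reusing the question's data, with the intermediate positions made visible. np.char.find returns the index of the first occurrence of the substring in each element, or -1 when it is absent, which is why the mask is ">= 0".

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': ['1,008$', '4,000$', '6,000$', '10,00$', '8,00$', '45€', '45€']})

# Convert the object column to a fixed-width unicode array so np.char can operate on it.
arr = df2.A.values.astype(str)

# Index of the first '$' in each element, or -1 if the element has no '$'.
pos = np.char.find(arr, '$')
print(pos.tolist())   # [5, 5, 5, 5, 4, -1, -1]

# A non-negative position means '$' occurs somewhere in the string.
print(df2[pos >= 0])
```

Note that this is a plain substring search, so it keeps rows with a '$' anywhere, not only at the end.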
Upvotes: 2
Reputation: 402483
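str.match anchors the match at the beginning of the string, but your $ appears only at the end. In addition, an unescaped $ is a regex anchor meaning "end of string", not a literal dollar sign; to match the character itself, escape it as \$.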
The fix requires either modifying your pattern or using a different function.
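The two problems can be reproduced with Python's re module directly. This is a minimal sketch assuming Series.str.match behaves like re.match applied to each element, and Series.str.contains like re.search.

```python
import re

s = '1,008$'

# Unescaped '$' is an anchor: re.match only tries position 0,
# where "end of string" cannot match a non-empty string.
print(re.match('$', s))                    # None

# Escaped '\$' is a literal dollar sign, but re.match still
# requires it at position 0, so this fails too.
print(re.match(r'\$', s))                  # None

# re.search scans the whole string, so the literal '$' is found.
print(re.search(r'\$', s) is not None)     # True

# With re.match, a leading '.*' lets the literal '$' sit at the end.
print(re.match(r'.*\$$', s) is not None)   # True
```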
Option 1
str.match with a modified pattern (so that \$ is matched at the end):
df2[df2.A.str.match(r'.*\$$')]
A
0 1,008$
1 4,000$
2 6,000$
3 10,00$
4 8,00$
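On pandas 1.1 or later, Series.str.fullmatch requires the whole string to match, so the trailing $ anchor can be dropped. This is a sketch that swaps in fullmatch for the str.match call above; it is not part of the original answer.

```python
import pandas as pd

df2 = pd.DataFrame({'A': ['1,008$', '4,000$', '6,000$', '10,00$', '8,00$', '45€', '45€']})

# fullmatch is anchored at both ends, so r'.*\$' means "any text ending in a literal $".
mask = df2.A.str.fullmatch(r'.*\$')
print(df2[mask])
```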
To be more specific about what is matched, you can allow only digits and commas before the $:
df2[df2.A.str.match(r'[\d,]+\$$')]
A
0 1,008$
1 4,000$
2 6,000$
3 10,00$
4 8,00$
Note that this does not validate the number format: any string made up only of digits and commas and terminated by $ is matched, so malformed entries such as ,,$ are accepted as well.
Option 2
str.contains, which searches the whole string instead of only its start:
df2[df2.A.str.contains(r'\$$')]
A
0 1,008$
1 4,000$
2 6,000$
3 10,00$
4 8,00$
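If regular expressions are not needed at all, Series.str.endswith does a literal suffix check, so the $ needs no escaping. This is an alternative swapped in for illustration, not one of the options above.

```python
import pandas as pd

df2 = pd.DataFrame({'A': ['1,008$', '4,000$', '6,000$', '10,00$', '8,00$', '45€', '45€']})

# endswith treats its argument as a plain string, so '$' is literal.
mask = df2.A.str.endswith('$')
print(df2[mask])

# str.contains with regex=False is also literal, but it would match a '$'
# anywhere in the string; for this data the result is the same.
print(df2.A.str.contains('$', regex=False).tolist() == mask.tolist())   # True
```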
Upvotes: 2