vihaa_vrutti

Reputation: 291

I am not getting matching string

This is my data, which contains numbers and strings.

import pandas as pd

df2 = pd.DataFrame({'A': ['1,008$','4,000$','6,000$','10,00$','8,00$','45€','45€']})
df2 = pd.DataFrame(df2, columns=['A'])
vv = df2[df2['A'].str.match('$')]

I want an output like this.

0  1,008$
1  4,000$
2  6,000$
3  10,00$
4   8,00$

But I am getting this output:

Out[144]: 
Empty DataFrame
Columns: [A]
Index: []

Can anyone please help me?

Upvotes: 1

Views: 37

Answers (2)

piRSquared

Reputation: 294258

A somewhat verbose approach using NumPy's defchararray module, which I think deserves more attention.

# Using @cᴏʟᴅsᴘᴇᴇᴅ's suggestion
# Same function as below but shorter namespace path

df2[np.char.find(df2.A.values.astype(str), '$') >= 0]

Old Answer

from numpy.core.defchararray import find

df2[find(df2.A.values.astype(str), '$') >= 0]

        A
0  1,008$
1  4,000$
2  6,000$
3  10,00$
4   8,00$

Upvotes: 2

cs95

Reputation: 402483

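str.match starts matching from the beginning of the string. However, your $ will be found only at the end. In addition, an unescaped $ in a regex is the end-of-string anchor rather than a literal dollar sign, so it must be escaped as \$.

The fix requires either modifying your pattern or changing the function.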
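To see why the original call returns nothing, here is a minimal sketch using the standard `re` module, which str.match and str.contains are built on. The three sample values are chosen for illustration only (the empty string is not in your column).

```python
import re

values = ['1,008$', '45€', '']

# Unescaped `$` is an end-of-string anchor, and re.match only tries position 0,
# so it succeeds only for the empty string.
unescaped = [bool(re.match('$', v)) for v in values]

# Escaped `\$` is a literal dollar sign, but re.match still starts at position 0,
# where none of these values has a `$`.
escaped_match = [bool(re.match(r'\$', v)) for v in values]

# re.search scans the whole string, so the literal `$` is found wherever it occurs.
escaped_search = [bool(re.search(r'\$', v)) for v in values]

print(unescaped)       # [False, False, True]
print(escaped_match)   # [False, False, False]
print(escaped_search)  # [True, False, False]
```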

Option 1
str.match with a modified pattern (so \$ is matched at the end) -

df2[df2.A.str.match(r'.*\$$')]

        A
0  1,008$
1  4,000$
2  6,000$
3  10,00$
4   8,00$

If you want to be specific about what is matched, you can match only on digits and commas -

df2[df2.A.str.match(r'[\d,]+\$$')]

        A
0  1,008$
1  4,000$
2  6,000$
3  10,00$
4   8,00$

Note that this does not validate the number format: any run of digits and commas terminated by $ is matched, so a malformed entry such as 1,,2$ would also pass.
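A quick sketch of that caveat, again with the standard `re` module. The candidate strings are hypothetical and do not come from your column.

```python
import re

# Same pattern as above: digits and commas, then a literal `$` at the end.
pattern = re.compile(r'[\d,]+\$$')

# Hypothetical entries, not from the original data.
candidates = ['1,008$', '1,,2$', ',$', 'abc$', '12$x']

# pattern.match anchors at the start, like str.match.
accepted = [s for s in candidates if pattern.match(s)]

print(accepted)  # ['1,008$', '1,,2$', ',$']
```

Malformed values such as `1,,2$` and `,$` are accepted, so a stricter pattern would be needed if the column can contain them.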


Option 2
str.contains, which searches the whole string rather than anchoring at the start -

df2[df2.A.str.contains(r'\$$')]

        A
0  1,008$
1  4,000$
2  6,000$
3  10,00$
4   8,00$
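If no regex is needed at all, a literal test avoids the escaping issue entirely. This is an alternative to the two options above, not part of them; a minimal sketch, assuming pandas is imported as `pd` and the column holds only strings:

```python
import pandas as pd

df2 = pd.DataFrame({'A': ['1,008$', '4,000$', '6,000$', '10,00$', '8,00$', '45€', '45€']})

# Literal suffix test; no regex is involved, so `$` needs no escaping.
by_suffix = df2[df2['A'].str.endswith('$')]

# Literal substring test anywhere in the string, with regex disabled.
by_substring = df2[df2['A'].str.contains('$', regex=False)]

print(by_suffix)
```

Both selections return the five rows ending in $ for this data; they would differ only if a $ appeared somewhere other than the end.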

Upvotes: 2
