Dervin Thunk

Reputation: 20150

How do I return the matched portion of a string in a Pandas Series?

I have something like the following code:

df[df["A"].str.contains(r"\d+")]

This selects every row of the Series in which the pattern matches somewhere in the string. However, as expected, it returns the whole string for rows such as:

1,"ab: 123"

I would like the function to return only the matched portion of the string ("123"), rather than the whole string. Is that possible?

Upvotes: 0

Views: 42

Answers (1)

Anand S Kumar

Reputation: 91007

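How about using Series.str.extract with a capture group? Pass expand=False to get a Series back (recent pandas versions return a one-column DataFrame by default). Example:

df[df["A"].str.contains(r"\d+")]["A"].str.extract(r"(\d+)", expand=False)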

Example/Demo -

In [41]: df = pd.DataFrame([['123'],['ab 123'],['xyz']],columns = ['A'])

In [42]: df
Out[42]:
        A
0     123
1  ab 123
2     xyz

In [43]: df[df["A"].str.contains(r"\d+")]
Out[43]:
        A
0     123
1  ab 123

In [47]: df[df["A"].str.contains(r"\d+")]['A'].str.extract(r"(\d+)", expand=False)
Out[47]:
0    123
1    123
Name: A, dtype: object

In [48]: df['A'].str.extract(r"(\d+)", expand=False)
Out[48]:
0    123
1    123
2    NaN
Name: A, dtype: object
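
As the last call shows, the str.contains filter is not strictly needed: str.extract already yields NaN for rows that do not match, so those rows can be dropped afterwards. A minimal self-contained sketch of that variant, reusing the demo frame from above (the variable names matched and only_matches are just illustrative):

```python
import pandas as pd

# Same sample data as in the demo above
df = pd.DataFrame({"A": ["123", "ab 123", "xyz"]})

# Raw string avoids invalid-escape warnings for \d;
# expand=False returns a Series for a single capture group
matched = df["A"].str.extract(r"(\d+)", expand=False)

# Rows where the pattern did not match are NaN; drop them
only_matches = matched.dropna()

print(only_matches.tolist())
```

Note that str.extract returns only the first match per row; whether that is enough depends on your data.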

Upvotes: 2
