Reputation: 20150
I have something like the following code:
df[df["A"].str.contains(r"\d+")]
This selects every row whose value in column A contains a run of digits somewhere. However, as expected, it returns the whole value for rows of the style:
1,"ab: 123"
I would like the function to return only the matched portion of the string ("123"), rather than the whole string. Is that possible?
Upvotes: 0
Views: 42
Reputation: 91007
How about using Series.str.extract? Example:
df[df["A"].str.contains(r"\d+")]["A"].str.extract(r"(\d+)", expand=False)
Example/Demo:
In [41]: df = pd.DataFrame([['123'],['ab 123'],['xyz']],columns = ['A'])
In [42]: df
Out[42]:
        A
0     123
1  ab 123
2     xyz
In [43]: df[df["A"].str.contains(r"\d+")]
Out[43]:
        A
0     123
1  ab 123
In [47]: df[df["A"].str.contains(r"\d+")]['A'].str.extract(r"(\d+)", expand=False)
Out[47]:
0    123
1    123
Name: A, dtype: object
In [48]: df['A'].str.extract(r"(\d+)", expand=False)
Out[48]:
0    123
1    123
2    NaN
Name: A, dtype: object
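As a side note, the contains filter may not be needed at all: since extract yields NaN for rows without a match, dropping those rows should give the same result in one pass. A minimal sketch, assuming a reasonably recent pandas where the `expand` parameter is available (the variable name `digits` is just for illustration):

```python
import pandas as pd

# Same sample data as in the demo.
df = pd.DataFrame([["123"], ["ab 123"], ["xyz"]], columns=["A"])

# expand=False returns a Series when the pattern has a single capture group.
# Rows with no match come back as NaN, so dropna() removes them,
# which replaces the separate str.contains filter.
digits = df["A"].str.extract(r"(\d+)", expand=False).dropna()

print(digits.tolist())  # ['123', '123']
```

Note that extract only returns the first match per row; if a value can contain several digit runs, this sketch would only pick up the first one.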
Upvotes: 2