Reputation: 6270
Although there are several similar questions, I am still not able to solve my issue.
I have a pandas column from a poker game and want to analyze the pot size, so I need to extract the number (with . as the decimal separator) that follows the $. The column looks like this:
Action
Player (8, 5) won the $5.40 main pot with a Straight
...
Player (A, 2) won the $21.00 main pot with a flush
...
When I run: df['number'] = df['action'].str.extract('([0-9][,.]*[0-9]*)')
it doesn't give me the expected outcome. The outcome should be:
number
5.40
...
21.00
Upvotes: 1
Views: 39
Reputation: 627020
You can use
>>> import pandas as pd
>>> df = pd.DataFrame({'action':['Player (8, 5) won the $5.40 main pot with a Straight','Player (A, 2) won the $21.00 main pot with a flush']})
>>> df['action'].str.extract(r'\$(\d+(?:[,.]\d+)*)', expand=False)
0     5.40
1    21.00
Name: action, dtype: object
The \$(\d+(?:[,.]\d+)*) pattern matches a literal $ symbol, and then captures into Group 1 one or more digits followed by zero or more sequences of a , or . and then one or more digits.
See the regex demo.
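To see why the pattern from the question goes wrong, and how to turn the extracted text into numbers, here is a minimal sketch. It assumes the column is named action as in the snippet above, and that a , in an amount is only ever a thousands separator (that part is my assumption, not something stated in the question). The variable names unanchored and amount_text are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "action": [
        "Player (8, 5) won the $5.40 main pot with a Straight",
        "Player (A, 2) won the $21.00 main pot with a flush",
    ]
})

# The question's pattern is not tied to "$", so it captures the first digit
# it finds in each string, which here comes from the player's cards.
unanchored = df["action"].str.extract(r"([0-9][,.]*[0-9]*)", expand=False)

# Requiring a literal "$" first skips the cards and captures the amount.
amount_text = df["action"].str.extract(r"\$(\d+(?:[,.]\d+)*)", expand=False)

# Assumption: "," is only a thousands separator, so drop it before
# converting the captured text to a numeric dtype.
df["number"] = pd.to_numeric(amount_text.str.replace(",", "", regex=False))

print(unanchored.tolist())    # ['8,', '2']
print(df["number"].tolist())  # [5.4, 21.0]
```

Note that str.extract returns strings, so the conversion step is only needed if you want to do arithmetic on the pot sizes. Rows without a $ amount come back as NaN.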
Upvotes: 2