PyNoob
PyNoob

Reputation: 223

How do I correctly remove all text from column in Pandas?

I have a dataframe as:

df:

+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|     | country                 | league                       | home_odds   |   draw_odds |   away_odds |   home_score | away_score   | home_team                | away_team                    | datetime            |
+=====+=========================+==============================+=============+=============+=============+==============+==============+==========================+==============================+=====================+
|  63 | Chile                   | Primera Division             | 2.80        |        3.05 |        2.63 |            3 | 1            | Melipilla                | O'Higgins                    | 2021-06-07 00:30:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  64 | North & Central America | CONCACAF Nations League      | 2.95        |        3.07 |        2.49 |            3 | 2 ET         | USA                      | Mexico                       | 2021-06-07 01:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  66 | World                   | World Cup 2022               | 1.04        |       13.43 |       28.04 |            0 | 1            | Kyrgyzstan               | Mongolia                     | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  65 | World                   | Friendly International       | 1.52        |        3.91 |        7.01 |            1 | 1            | Serbia                   | Jamaica                      | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+

I want the columns home_score and away_score to be just integers and I am trying regex as:

df[['home_score', 'away_score']] = re.sub('\D', '', '.*')

however all the columns are coming in blank.

How do I correctly do it?

Upvotes: 0

Views: 170

Answers (2)

Anurag Dabas
Anurag Dabas

Reputation: 24304

You can try via extract() and astype() method:

df['away_score']=df['away_score'].str.extract('^(\d+)').astype(int)
df['home_score']=df['home_score'].str.extract('^(\d+)').astype(int)

OR

df['away_score']=df['away_score'].str.extract('([0-9]+)').astype(int)
df['home_score']=df['home_score'].str.extract('([0-9]+)').astype(int)

output:

+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|     | country                 | league                       | home_odds   |   draw_odds |   away_odds |   home_score | away_score   | home_team                | away_team                    | datetime            |
+=====+=========================+==============================+=============+=============+=============+==============+==============+==========================+==============================+=====================+
|  63 | Chile                   | Primera Division             | 2.80        |        3.05 |        2.63 |            3 | 1            | Melipilla                | O'Higgins                    | 2021-06-07 00:30:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  64 | North & Central America | CONCACAF Nations League      | 2.95        |        3.07 |        2.49 |            3 | 2            | USA                      | Mexico                       | 2021-06-07 01:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  66 | World                   | World Cup 2022               | 1.04        |       13.43 |       28.04 |            0 | 1            | Kyrgyzstan               | Mongolia                     | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
|  65 | World                   | Friendly International       | 1.52        |        3.91 |        7.01 |            1 | 1            | Serbia                   | Jamaica                      | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+

Upvotes: 1

Acccumulation
Acccumulation

Reputation: 3591

You can do df[['home_score', 'away_score']] = df[['home_score', 'away_score']].applymap(lambda x: int(float(x)))

Upvotes: 0

Related Questions