Reputation: 223
I have a dataframe as:
df:
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| | country | league | home_odds | draw_odds | away_odds | home_score | away_score | home_team | away_team | datetime |
+=====+=========================+==============================+=============+=============+=============+==============+==============+==========================+==============================+=====================+
| 63 | Chile | Primera Division | 2.80 | 3.05 | 2.63 | 3 | 1 | Melipilla | O'Higgins | 2021-06-07 00:30:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| 64 | North & Central America | CONCACAF Nations League | 2.95 | 3.07 | 2.49 | 3 | 2 ET | USA | Mexico | 2021-06-07 01:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| 66 | World | World Cup 2022 | 1.04 | 13.43 | 28.04 | 0 | 1 | Kyrgyzstan | Mongolia | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| 65 | World | Friendly International | 1.52 | 3.91 | 7.01 | 1 | 1 | Serbia | Jamaica | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
I want the columns home_score
and away_score
to be just integers and I am trying regex as:
df[['home_score', 'away_score']] = re.sub('\D', '', '.*')
however all the columns are coming in blank.
How do I correctly do it?
Upvotes: 0
Views: 170
Reputation: 24304
You can try via extract()
and astype()
method:
df['away_score']=df['away_score'].str.extract('^(\d+)').astype(int)
df['home_score']=df['home_score'].str.extract('^(\d+)').astype(int)
OR
df['away_score']=df['away_score'].str.extract('([0-9]+)').astype(int)
df['home_score']=df['home_score'].str.extract('([0-9]+)').astype(int)
output:
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| | country | league | home_odds | draw_odds | away_odds | home_score | away_score | home_team | away_team | datetime |
+=====+=========================+==============================+=============+=============+=============+==============+==============+==========================+==============================+=====================+
| 63 | Chile | Primera Division | 2.80 | 3.05 | 2.63 | 3 | 1 | Melipilla | O'Higgins | 2021-06-07 00:30:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| 64 | North & Central America | CONCACAF Nations League | 2.95 | 3.07 | 2.49 | 3 | 2 | USA | Mexico | 2021-06-07 01:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| 66 | World | World Cup 2022 | 1.04 | 13.43 | 28.04 | 0 | 1 | Kyrgyzstan | Mongolia | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
| 65 | World | Friendly International | 1.52 | 3.91 | 7.01 | 1 | 1 | Serbia | Jamaica | 2021-06-07 07:00:00 |
+-----+-------------------------+------------------------------+-------------+-------------+-------------+--------------+--------------+--------------------------+------------------------------+---------------------+
Upvotes: 1
Reputation: 3591
You can do df[['home_score', 'away_score']] = df[['home_score', 'away_score']].applymap(lambda x: int(float(x)))
Upvotes: 0