scarecrow

Reputation: 94

Can't get .replace() to function correctly

Hi whoever is reading this! I am attempting to use

df['ride_duration'] = df['ride_duration'].replace(r'^\s+$', "00:00:00", regex=True)

to replace the whitespace/blank values shown in the screenshot attached below with 00:00:00. For some reason this is not working, even though it is what I found from looking around on Stack Overflow. I also tried:

df['ride_duration'] = df['ride_duration'].replace("", "00:00:00")

Unfortunately, neither call changes the column at all. Could someone please point me in the right direction as to why this is not working?

Sample code:

import numpy as np
import pandas as pd

fmtymd = '%Y/%m/%d'
df = pd.read_csv(csvfilelocation, sep=',')
df['scheduled departure time'] = pd.to_datetime(df['scheduled departure time'], format=fmtymd)
df['epoch_arrival'] = pd.to_datetime(df['epoch_arrival'], format=fmtymd)
df['latetime'] = (df['epoch_arrival'] - df['scheduled departure time']).where(df['OTP'] == False)
df['latetime'] = df['latetime'].replace("", "00:00:00")
df['latetime'] = pd.to_timedelta(df['latetime'])
df['latetime'] = (df['latetime'] / np.timedelta64(1, 'm')).astype(int)
df.to_csv(csvfilelocation, index=False, float_format='%.0f')

CSV Sample: https://ufile.io/qtkxb

Upvotes: 1

Views: 82

Answers (2)

mechanical_meat

Reputation: 169264

It may behoove you to read a regexp tutorial.

This regexp only matches values that consist entirely of one or more whitespace characters, and that's it. It will not match an empty string or a missing value.

df['ride_duration'].replace(r'^\s+$',"00:00:00", regex=True)
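To see what that pattern does and does not match, here is a small sketch using Python's re module directly. The sample strings are made up for illustration and are not taken from your CSV.

```python
import re

# Same pattern as in the question: the whole string must be whitespace,
# and there must be at least one whitespace character.
pattern = re.compile(r'^\s+$')

# Hypothetical sample values, not from the real data.
samples = ["   ", "\t", "", "00:05:00", " 00:05:00 "]

# Map each sample to whether the pattern matches it.
matches = {s: bool(pattern.search(s)) for s in samples}

for value, matched in matches.items():
    print(repr(value), matched)
```

Only the first two samples match. The empty string does not, because \s+ needs at least one character, and a missing value (NaN or NaT) is not a string at all, so no regexp will ever see it.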

Since you have now commented that this is a timedelta column, the values need to be converted to strings before a string replacement can do anything. Please see the updated solution below.

df['ride_duration'].astype(str).replace(r"\s+","",regex=True)

And then to get that back into a timedelta you can do:

pd.to_timedelta(df['ride_duration'].astype(str).replace(r"\s+","",regex=True))
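If the column really is timedelta dtype, the blanks are probably NaT rather than strings, so another option is to skip the string round trip and fill them with a zero Timedelta. This is only a sketch under that assumption: the durations Series below is made-up data standing in for df['ride_duration'], and I have not run it against your file.

```python
import pandas as pd

# Hypothetical stand-in for df['ride_duration']: a timedelta column with one blank.
durations = pd.Series(
    [pd.Timedelta(minutes=5), pd.NaT, pd.Timedelta(minutes=12, seconds=30)],
    dtype="timedelta64[ns]",
)

# Missing values are NaT, not "", so fill them instead of replacing strings.
filled = durations.fillna(pd.Timedelta(0))

# Convert to whole minutes; astype(int) truncates the fractional part.
minutes = (filled / pd.Timedelta(minutes=1)).astype(int)

print(minutes.tolist())
```

Filling before the division matters here, because astype(int) cannot convert a column that still contains missing values.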

Upvotes: 2

piratefache

Reputation: 1368

I made an example for you. I have the following dataset:

Date, Name
, John
26.05.16, David
27.05.16, Rose
, Yolo
29.05.16, Marie
30.05.16, Mark

And this is my code in 3 steps:

import pandas as pd

# Date format
fmtymd = '%d.%m.%y'

# Load csv file
df = pd.read_csv('./sample-data.csv', sep=',')

print(df)

Result:

       Date    Name
0       NaN    John
1  26.05.16   David
2  27.05.16    Rose
3       NaN    Yolo
4  29.05.16   Marie
5  30.05.16    Mark

Now I want to parse the Date column as dates using that format:

df['Date'] =  pd.to_datetime(df['Date'], format = fmtymd)
print(df)

Result:

        Date    Name
0        NaT    John
1 2016-05-26   David
2 2016-05-27    Rose
3        NaT    Yolo
4 2016-05-29   Marie
5 2016-05-30    Mark

This is an important step. As you can see, the empty fields were already read as NaN and are now converted to "NaT" (Not-a-Time). This is the reason why you can't replace "" with "00:00:00": there are no empty strings in the column to match.
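A quick way to check this on your own data is to look for empty strings and for missing values separately. The sketch below uses a two-row, in-memory CSV that I made up to mirror the sample dataset; swap in your own file and column name.

```python
import io

import pandas as pd

# Hypothetical two-row CSV with one empty Date field.
csv_text = "Date,Name\n,John\n26.05.16,David\n"
df = pd.read_csv(io.StringIO(csv_text))

# read_csv turns the empty field into NaN, so no cell equals "".
has_empty_string = bool((df["Date"] == "").any())

# After parsing, the missing value shows up as NaT.
dates = pd.to_datetime(df["Date"], format="%d.%m.%y")
missing_after = dates.isna().tolist()

print(has_empty_string, missing_after)
```

If has_empty_string is False while isna() reports missing rows, then .replace("", ...) has nothing to act on, and fillna() or an isnull check is the tool to reach for.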

Now I want to replace NaT with "00:00:00":

df['Date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '00:00:00' for d in df['Date']]

print(df)

Result:

         Date    Name
0    00:00:00    John
1  2016-05-26   David
2  2016-05-27    Rose
3    00:00:00    Yolo
4  2016-05-29   Marie
5  2016-05-30    Mark

This is just an example. I know that the Date column now mixes two formats (dates and "00:00:00"); you will need to choose a placeholder that matches the format of your own column.

Complete code:

import pandas as pd

fmtymd = '%d.%m.%y'
df = pd.read_csv('./sample-data.csv', sep=',')

print(df)

df['Date'] =  pd.to_datetime(df['Date'], format = fmtymd)
print(df)

df['Date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '00:00:00' for d in df['Date']]

print(df)
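The list comprehension can probably also be written with pandas' vectorized accessors. This is a sketch of that variant, not a drop-in replacement: the dates Series is made-up data in the same dd.mm.yy format as the sample file.

```python
import pandas as pd

# Hypothetical stand-in for df['Date'] with one missing entry.
raw = pd.Series(["26.05.16", None, "27.05.16"])
dates = pd.to_datetime(raw, format="%d.%m.%y")

# dt.strftime leaves missing values as NaN, which fillna then replaces.
formatted = dates.dt.strftime("%Y-%m-%d").fillna("00:00:00")

print(formatted.tolist())
```

The result is a column of strings, the same as with the list comprehension, so it is no longer a datetime column afterwards.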

Upvotes: 1
