Reputation: 94
Hi, whoever is reading this! I am trying to use df['ride_duration'] = df['ride_duration'].replace(r'^\s+$', "00:00:00", regex=True)
to replace the whitespace/blank values shown in the screenshot attached below. For some reason this is not working, even though it is what I found while looking around on Stack Overflow. I also tried: df['ride_duration'] = df['ride_duration'].replace("", "00:00:00")
Unfortunately, neither attempt changes the column at all. Could someone please point me in the right direction as to why this is not working?
Sample code:
import numpy as np
import pandas as pd
fmtymd = '%Y/%m/%d'
df = pd.read_csv(csvfilelocation, sep=',')
df['scheduled departure time'] = pd.to_datetime(df['scheduled departure time'], format = fmtymd)
df['epoch_arrival'] = pd.to_datetime(df['epoch_arrival'], format = fmtymd)
df['latetime'] = (df['epoch_arrival'] - df['scheduled departure time']).where(df['OTP'] == False)
df['latetime'] = df['latetime'].replace("","00:00:00")
df['latetime'] = pd.to_timedelta(df['latetime'])
df['latetime'] = (df['latetime'] / np.timedelta64(1, 'm')).astype(int)
df.to_csv(csvfilelocation, index=False, float_format='%.0f')
CSV Sample: https://ufile.io/qtkxb
Upvotes: 1
Views: 82
Reputation: 169264
It may behoove you to read a regexp tutorial.
This regexp matches only values that consist entirely of one or more whitespace characters, and that's it. It does not match empty strings or missing values.
df['ride_duration'].replace(r'^\s+$',"00:00:00", regex=True)
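To make the behaviour of that pattern concrete, here is a minimal sketch on a made-up column (the sample values and variable names are assumptions for illustration, not taken from the asker's CSV). It compares `^\s+$` with `^\s*$`, which also accepts the empty string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column: whitespace-only, empty, missing, and a real value.
s = pd.Series(["   ", "", np.nan, "00:05:00"], dtype=object)

# ^\s+$ needs at least one whitespace character, so only "   " is replaced.
only_blank = s.replace(r"^\s+$", "00:00:00", regex=True)

# ^\s*$ also matches the empty string; the missing value is still untouched.
blank_or_empty = s.replace(r"^\s*$", "00:00:00", regex=True)

print(only_blank.tolist())
print(blank_or_empty.tolist())
```

Neither pattern touches the missing value, so if the column holds NaN or NaT rather than strings, a string replace will appear to do nothing.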
Since you have now commented that this is a timedelta column we would need to use some string formatting for that. Please see updated solution below.
df['ride_duration'].astype(str).replace(r"\s+","",regex=True)
And then to get that back into a timedelta you can do:
pd.to_timedelta(df['ride_duration'].astype(str).replace(r"\s+","",regex=True))
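As an alternative that avoids the round trip through strings, here is a rough sketch that assumes the blanks in the question's latetime column are really NaT values left behind by `.where(...)`. The two-row frame is invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's data (column names follow the question).
df = pd.DataFrame({
    "scheduled departure time": pd.to_datetime(["2018-01-01 08:00", "2018-01-01 09:00"]),
    "epoch_arrival": pd.to_datetime(["2018-01-01 08:07", "2018-01-01 09:00"]),
    "OTP": [False, True],
})

# .where() keeps the difference for late rows and leaves NaT for on-time rows.
late = (df["epoch_arrival"] - df["scheduled departure time"]).where(~df["OTP"])

# Fill the missing durations with a zero Timedelta, then convert to whole minutes.
df["latetime"] = (late.fillna(pd.Timedelta(0)) / pd.Timedelta(minutes=1)).astype(int)
print(df["latetime"].tolist())
```

Filling with `pd.Timedelta(0)` keeps the column as a timedelta throughout, so no string formatting or re-parsing is needed before the division.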
Upvotes: 2
Reputation: 1368
I made an example for you. I have the following dataset:
Date, Name
, John
26.05.16, David
27.05.16, Rose
, Yolo
29.05.16, Marie
30.05.16, Mark
And this is my code in 3 steps:
import pandas as pd
# Date format
fmtymd = '%d.%m.%y'
# Load csv file
df = pd.read_csv('./sample-data.csv', sep=',')
print(df)
Result:
Date Name
0 NaN John
1 26.05.16 David
2 27.05.16 Rose
3 NaN Yolo
4 29.05.16 Marie
5 30.05.16 Mark
Now I want to set the date format to the column Date:
df['Date'] = pd.to_datetime(df['Date'], format = fmtymd)
print(df)
Result:
Date Name
0 NaT John
1 2016-05-26 David
2 2016-05-27 Rose
3 NaT Yolo
4 2016-05-29 Marie
5 2016-05-30 Mark
This is an important step. As you can see, the empty fields (which read_csv already loaded as NaN) are converted to "NaT" (Not-a-Time). This is the reason why you cannot replace "" with "00:00:00": the column contains no empty strings to match.
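A quick way to check this yourself is sketched below. It reads the first two rows of the sample data from an in-memory string (the `csv_text` variable is an assumption for illustration) and looks for empty strings versus missing values.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the first rows of the sample data above.
csv_text = "Date,Name\n,John\n26.05.16,David\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty field arrives as a missing value, not as the string "".
has_empty_string = bool((df["Date"] == "").any())
is_missing = df["Date"].isna().tolist()

# After parsing, the missing value becomes NaT, which isna() also detects.
parsed = pd.to_datetime(df["Date"], format="%d.%m.%y")
print(has_empty_string, is_missing, parsed.isna().tolist())
```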
Now I want to replace NaT with "00:00:00":
df['Date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '00:00:00' for d in df['Date']]
print(df)
Result:
Date Name
0 00:00:00 John
1 2016-05-26 David
2 2016-05-27 Rose
3 00:00:00 Yolo
4 2016-05-29 Marie
5 2016-05-30 Mark
This is just an example; I know the formats in the Date column now differ. You need to use the same date format throughout your new column.
Complete code:
import pandas as pd
fmtymd = '%d.%m.%y'
df = pd.read_csv('./sample-data.csv', sep=',')
print(df)
df['Date'] = pd.to_datetime(df['Date'], format = fmtymd)
print(df)
df['Date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '00:00:00' for d in df['Date']]
print(df)
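The list comprehension in the last step can also be written with vectorized calls. This is only a sketch of the same idea, again using a made-up in-memory CSV rather than the real file: `dt.strftime` leaves missing entries as missing, and `fillna` then supplies the placeholder.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the first rows of the sample data.
csv_text = "Date,Name\n,John\n26.05.16,David\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse the dates; the empty field becomes NaT.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%y")

# Format the valid dates, then fill the rows that were NaT with the placeholder.
df["Date"] = df["Date"].dt.strftime("%Y-%m-%d").fillna("00:00:00")
print(df["Date"].tolist())
```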
Upvotes: 1