Reputation: 33
I have a datetime column with occasional missing date information. It contains only hour information.
Entry Init Edit
14 1666832 10.10.2002 02:10 10.10.2002 02:10
15 1675384 13.10.2002 13:49 13.10.2002 13:49
16 1709573 25.10.2002 14:21 14:23
17 1740242 03.11.2002 21:08 03.11.2002 21:08
18 1751053 05.11.2002 10:46 05.11.2002 10:46
I want to add the date info to "Edit" column with the dates from "Init" field if "Edit" only displays only hour info. I wrote a function and applied it to related fields.
def edit(x,y):
if len(y) < 16:
y = f"{x.split(' ')[0]} {y}"
return y
df["EditDate_proper"] = df.apply(lambda x: edit(x['Init'], x['Edit']), axis=1)
It works alright, but I wonder if there is a more clever way of filling the misformatted values similar to "fillna" method. For example, "if this field doesn't contain %d.%m.%Y %H:%M type of formatting, take it from this field's %d.%M.%Y info"
Thanks in advance.
Upvotes: 1
Views: 112
Reputation: 862681
I wonder if there is a more clever way of filling the misformatted values similar to "fillna" method.
You can convert values to datetimes and timedeltas by to_datetime
and
to_timedelta
with errors='coerce'
so if no match get missing value, remove times by Series.dt.floor
, add to timedeltas and last replace missing values by original values converted to datetimes:
df['Init'] = pd.to_datetime(df['Init'], errors='coerce')
timedeltas = pd.to_timedelta(df['Edit'] + ':00', errors='coerce')
edit_dates = pd.to_datetime(df['Edit'], errors='coerce')
df["EditDate_proper"] = (timedeltas + df['Init'].dt.floor('d')).fillna(edit_dates)
print (df)
Entry Init Edit EditDate_proper
14 1666832 2002-10-10 02:10:00 10.10.2002 02:10 2002-10-10 02:10:00
15 1675384 2002-10-13 13:49:00 13.10.2002 13:49 2002-10-13 13:49:00
16 1709573 2002-10-25 14:21:00 14:23 2002-10-25 14:23:00
17 1740242 2002-03-11 21:08:00 03.11.2002 21:08 2002-03-11 21:08:00
18 1751053 2002-05-11 10:46:00 05.11.2002 10:46 2002-05-11 10:46:00
If want working with strings:
m = df['Edit'].str.len() < 16
df["EditDate_proper"] = df['Init'].mask(m, df['Init'].str.split().str[0] + ' ' + df['Edit'])
print (df)
Entry Init Edit EditDate_proper
14 1666832 10.10.2002 02:10 10.10.2002 02:10 10.10.2002 02:10
15 1675384 13.10.2002 13:49 13.10.2002 13:49 13.10.2002 13:49
16 1709573 25.10.2002 14:21 14:23 25.10.2002 14:23
17 1740242 03.11.2002 21:08 03.11.2002 21:08 03.11.2002 21:08
18 1751053 05.11.2002 10:46 05.11.2002 10:46 05.11.2002 10:46
Upvotes: 1