Pandas: Filling mixed datetime field from another column

Question

I have a datetime column with occasional missing date information. It contains only hour information.

        Entry   Init                Edit
    14  1666832 10.10.2002 02:10    10.10.2002 02:10
    15  1675384 13.10.2002 13:49    13.10.2002 13:49
    16  1709573 25.10.2002 14:21    14:23           
    17  1740242 03.11.2002 21:08    03.11.2002 21:08
    18  1751053 05.11.2002 10:46    05.11.2002 10:46

I want to add the date info to "Edit" column with the dates from "Init" field if "Edit" only displays only hour info. I wrote a function and applied it to related fields.

def edit(x,y):
    if len(y) < 16:
        y = f"{x.split(' ')[0]} {y}"
    return y

df["EditDate_proper"] = df.apply(lambda x: edit(x['Init'], x['Edit']), axis=1)

It works alright, but I wonder if there is a more clever way of filling the misformatted values similar to "fillna" method. For example, "if this field doesn't contain %d.%m.%Y %H:%M type of formatting, take it from this field's %d.%M.%Y info"

Thanks in advance.

jezrael · Accepted Answer

I wonder if there is a more clever way of filling the misformatted values similar to "fillna" method.

You can convert values to datetimes and timedeltas by to_datetime and to_timedelta with errors='coerce' so if no match get missing value, remove times by Series.dt.floor, add to timedeltas and last replace missing values by original values converted to datetimes:

df['Init'] = pd.to_datetime(df['Init'], errors='coerce')
timedeltas = pd.to_timedelta(df['Edit'] + ':00', errors='coerce')
edit_dates = pd.to_datetime(df['Edit'], errors='coerce')

df["EditDate_proper"]  = (timedeltas + df['Init'].dt.floor('d')).fillna(edit_dates)
print (df)
      Entry                Init              Edit     EditDate_proper
14  1666832 2002-10-10 02:10:00  10.10.2002 02:10 2002-10-10 02:10:00
15  1675384 2002-10-13 13:49:00  13.10.2002 13:49 2002-10-13 13:49:00
16  1709573 2002-10-25 14:21:00             14:23 2002-10-25 14:23:00
17  1740242 2002-03-11 21:08:00  03.11.2002 21:08 2002-03-11 21:08:00
18  1751053 2002-05-11 10:46:00  05.11.2002 10:46 2002-05-11 10:46:00

If want working with strings:

m = df['Edit'].str.len() < 16
df["EditDate_proper"] = df['Init'].mask(m, df['Init'].str.split().str[0] + ' ' + df['Edit'])
print (df)
      Entry              Init              Edit   EditDate_proper
14  1666832  10.10.2002 02:10  10.10.2002 02:10  10.10.2002 02:10
15  1675384  13.10.2002 13:49  13.10.2002 13:49  13.10.2002 13:49
16  1709573  25.10.2002 14:21             14:23  25.10.2002 14:23
17  1740242  03.11.2002 21:08  03.11.2002 21:08  03.11.2002 21:08
18  1751053  05.11.2002 10:46  05.11.2002 10:46  05.11.2002 10:46

Pandas: Filling mixed datetime field from another column

Answers (1)

Related Questions