TigerClaw
TigerClaw

Reputation: 175

Python - Match and change datatime

Good Morning, I am trying to read a list and recognize strings with datetime-format. Here is an example of my list (with datetime or version number):

Now I want to change 2014-12-01 to 01.12.2014. It works fine with this code:

dfnew[1].iloc[i]=datetime.datetime.strptime(dfnew[1].iloc[i], "%Y-%m-%d").strftime("%d.%m.%Y")

But if I use this code here:

for i, x in dfnew.iterrows():
    print (dfnew[1].iloc[i])
    if ( (dfnew[1].iloc[i]) == (datetime.datetime.strptime(dfnew[1].iloc[i], "%YY-%mm-%d")) ):
        dfnew[1].iloc[i]=datetime.datetime.strptime(dfnew[1].iloc[i], "%Y-%m-%d").strftime("%d.%m.%Y")

I get the error:

Traceback (most recent call last):
  File "CVE_List.py", line 74, in <module>
    if ( (dfnew[1].iloc[i]) == (datetime.datetime.strptime(dfnew[1].i
  File "C:\Program Files (x86)\Python36-32\lib\_strptime.py", line 56
    tt, fraction = _strptime(data_string, format)
  File "C:\Program Files (x86)\Python36-32\lib\_strptime.py", line 36
    (data_string, format))
ValueError: time data '1.0.1' does not match format '%Y-%m-%d'

But normally the row with format x.y.z have to be ignored.

Upvotes: 1

Views: 66

Answers (1)

jezrael
jezrael

Reputation: 862731

Dont use iterrows if not necessary, because very slow. Better is use pandas vectorized function.

Here the best is use to_datetime with parameter errors='coerce' for convert not parseable datetimes to NaT.

If need convert to custom string format add dt.strftime:

print (df)
         0           1
0   String       1.0.1
1  String2  2017-10-03
2  String3  2014-12-01
3  String4  01.11.2017

df[1] = pd.to_datetime(df[1], errors='coerce')
print (df)
         0          1
0   String        NaT
1  String2 2017-10-03
2  String3 2014-12-01
3  String4 2017-01-11

EDIT:

General solution need distingush all possible formats of dates with years, dates without years and then concat them to df1 DataFrame. Last for each row remove NaNs and add versions by combine_first:

#remove trailing whitespaces 
c = df[1].str.strip()

dates_formats_with_years = ['%Y-%m-%d','%d.%m.%Y']
d1 = [pd.to_datetime(c, errors='coerce', format=x) for x in dates_formats_with_years]
s1 = [s.dt.strftime("%#d.%#m")
          .add('.')
          .add(s.dt.strftime("%y").str.lstrip('0'))
          .str.rstrip('.')
          .replace('NaT.NaT', np.nan) for s in d1]

dates_formats_no_years = ['%d.%b']
d2 = [pd.to_datetime(c, errors='coerce', format=x) for x in dates_formats_no_years]
s2 = [s.dt.strftime("%#d.%#m").replace('NaT', np.nan) for s in d2]

df1 = pd.concat(s1 + s2, axis=1, keys=range(len(s1) + len(s2)))
print (df1)
         0   1   2
0      NaN NaN NaN
1      NaN NaN NaN
2   3.5.29 NaN NaN
3   2.8.17 NaN NaN
4      NaN NaN NaN
5      5.6 NaN NaN
6    3.4.7 NaN NaN
7      NaN NaN NaN
8   7.6.17 NaN NaN
9    4.1.1 NaN NaN
10   1.8.3 NaN NaN
11     NaN NaN NaN
12     NaN NaN NaN
13     NaN NaN NaN
14   1.2.8 NaN NaN

df[1] = df1.apply(lambda x: x.sort_values().iat[0], axis=1).combine_first(c)
print (df)
                               0                  1
0                      alsa-intf  UMC-9240P_D_v0.30
1                       alsa-lib             1.0.29
2                    base-passwd             3.5.29
3                     sshfs-fuse             2.8.17
4   start-scripts-firmware-links                1.0
5                              s                5.6
6                      syslog-ng              3.4.7
7                    system-conf  UMC-9240P_D_v0.30
8                   tcp-wrappers             7.6.17
9                        tcpdump              4.1.1
10                     tinyproxy              1.8.3
11                    tinyscheme              1.0.0
12                useradd-peiker                1.0
13                    util-linux             2.26.2
14                          zlib              1.2.8

Upvotes: 1

Related Questions