Reputation: 175
Good Morning, I am trying to read a list and recognize strings with datetime-format. Here is an example of my list (with datetime or version number):
Now I want to change 2014-12-01 to 01.12.2014. It works fine with this code:
dfnew[1].iloc[i]=datetime.datetime.strptime(dfnew[1].iloc[i], "%Y-%m-%d").strftime("%d.%m.%Y")
But if I use this code here:
for i, x in dfnew.iterrows():
print (dfnew[1].iloc[i])
if ( (dfnew[1].iloc[i]) == (datetime.datetime.strptime(dfnew[1].iloc[i], "%YY-%mm-%d")) ):
dfnew[1].iloc[i]=datetime.datetime.strptime(dfnew[1].iloc[i], "%Y-%m-%d").strftime("%d.%m.%Y")
I get the error:
Traceback (most recent call last):
File "CVE_List.py", line 74, in <module>
if ( (dfnew[1].iloc[i]) == (datetime.datetime.strptime(dfnew[1].i
File "C:\Program Files (x86)\Python36-32\lib\_strptime.py", line 56
tt, fraction = _strptime(data_string, format)
File "C:\Program Files (x86)\Python36-32\lib\_strptime.py", line 36
(data_string, format))
ValueError: time data '1.0.1' does not match format '%Y-%m-%d'
But normally the row with format x.y.z have to be ignored.
Upvotes: 1
Views: 66
Reputation: 862731
Dont use iterrows
if not necessary, because very slow. Better is use pandas vectorized function.
Here the best is use to_datetime
with parameter errors='coerce'
for convert not parseable datetimes to NaT
.
If need convert to custom string format add dt.strftime
:
print (df)
0 1
0 String 1.0.1
1 String2 2017-10-03
2 String3 2014-12-01
3 String4 01.11.2017
df[1] = pd.to_datetime(df[1], errors='coerce')
print (df)
0 1
0 String NaT
1 String2 2017-10-03
2 String3 2014-12-01
3 String4 2017-01-11
EDIT:
General solution need distingush all possible formats of dates with year
s, dates without years
and then concat
them to df1
DataFrame. Last for each row remove NaN
s and add versions
by combine_first
:
#remove trailing whitespaces
c = df[1].str.strip()
dates_formats_with_years = ['%Y-%m-%d','%d.%m.%Y']
d1 = [pd.to_datetime(c, errors='coerce', format=x) for x in dates_formats_with_years]
s1 = [s.dt.strftime("%#d.%#m")
.add('.')
.add(s.dt.strftime("%y").str.lstrip('0'))
.str.rstrip('.')
.replace('NaT.NaT', np.nan) for s in d1]
dates_formats_no_years = ['%d.%b']
d2 = [pd.to_datetime(c, errors='coerce', format=x) for x in dates_formats_no_years]
s2 = [s.dt.strftime("%#d.%#m").replace('NaT', np.nan) for s in d2]
df1 = pd.concat(s1 + s2, axis=1, keys=range(len(s1) + len(s2)))
print (df1)
0 1 2
0 NaN NaN NaN
1 NaN NaN NaN
2 3.5.29 NaN NaN
3 2.8.17 NaN NaN
4 NaN NaN NaN
5 5.6 NaN NaN
6 3.4.7 NaN NaN
7 NaN NaN NaN
8 7.6.17 NaN NaN
9 4.1.1 NaN NaN
10 1.8.3 NaN NaN
11 NaN NaN NaN
12 NaN NaN NaN
13 NaN NaN NaN
14 1.2.8 NaN NaN
df[1] = df1.apply(lambda x: x.sort_values().iat[0], axis=1).combine_first(c)
print (df)
0 1
0 alsa-intf UMC-9240P_D_v0.30
1 alsa-lib 1.0.29
2 base-passwd 3.5.29
3 sshfs-fuse 2.8.17
4 start-scripts-firmware-links 1.0
5 s 5.6
6 syslog-ng 3.4.7
7 system-conf UMC-9240P_D_v0.30
8 tcp-wrappers 7.6.17
9 tcpdump 4.1.1
10 tinyproxy 1.8.3
11 tinyscheme 1.0.0
12 useradd-peiker 1.0
13 util-linux 2.26.2
14 zlib 1.2.8
Upvotes: 1