Reputation: 958
For the string "4/3/09", using
df['dates'] = df['dates'].str.replace(r'([/ ]\d\d)\b', r'19\g<0>')
#or
df['dates'] = df['dates'].str.replace(r'([/ ]\d\d)$', r'19\g<0>')
I am getting 4/319/09, but I should get 4/3/1909.
My data:
date_set = ['04/20/2009', '04/20/09', '4/20/09', '4/3/09',
'Mar-20-2009', 'Mar 20, 2009', 'March 20, 2009', 'Mar. 20, 2009',
'Mar 20 2009','20 Mar 2009', '20 March 2009', '20 Mar. 2009',
'20 March, 2009','Mar 20th, 2009', 'Mar 21st, 2009', 'Mar 22nd, 2009',
'Feb 2009', 'Sep 2009', 'Oct 2010',
'6/2008', '12/2009',
'2009', '2010']
If there is a 2-digit year, I need to add 1900. For example, if the year is 09, it should be replaced with 1909.
Upvotes: 2
Views: 33
Reputation: 627101
The ([/ ]\d\d)\b pattern matches / or a space and then 2 digits up to a word boundary, and str.replace replaces the match (here, /09) with 19 + the whole match, resulting in 4/3 + 19/09 => 4/319/09.
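To see the failure in isolation, here is a minimal sketch using the standard re module on a single string. It assumes that pandas applies the same kind of regex substitution to each element when the pattern is treated as a regex; the variable names are only for illustration.

```python
import re

s = "4/3/09"

# \g<0> stands for the whole match, which is "/09" (slash included),
# so the literal "19" is inserted in front of the slash.
broken = re.sub(r'([/ ]\d\d)\b', r'19\g<0>', s)
print(broken)  # 4/319/09
```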
You need to use
df['dates'] = df['dates'].str.replace(r'([/ ])(\d\d)\b', r'\g<1>19\2', regex=True)  # regex=True is required in pandas 2.0+, where the default is a literal replace
See the regex demo
Here,
([/ ]) - Capturing group 1: a / or a space
(\d\d) - Capturing group 2: two digits
\b - a word boundary
The replacement is r'\g<1>19\2', i.e. the Group 1 value, then 19, then the Group 2 value. \g<1> is an unambiguous backreference to Group 1, needed here because the next char in the replacement pattern is a digit (see "python re.sub group: number after \number"). \2 is a regular numeric backreference, which is fine since nothing follows it.
See the re.sub Python documentation.
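The difference between \g<1> and a plain \1 followed by digits can be checked with a short sketch like the one below. It again uses the re module directly instead of pandas, and the variable names are hypothetical.

```python
import re

s = "4/3/09"
pattern = r'([/ ])(\d\d)\b'

# \g<1> ends the group reference explicitly, so "19" is a literal.
fixed = re.sub(pattern, r'\g<1>19\2', s)
print(fixed)  # 4/3/1909

# Writing \1 directly before the digits is not read as "group 1, then 19";
# the digits are absorbed into the group number and re rejects the template.
try:
    re.sub(pattern, r'\119\2', s)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True
print(ambiguous_failed)  # True
```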
EDIT
After you added more data, it seems you need to only match the two digits at the end of the string.
Use either of
df['dates'] = df['dates'].str.replace(r'([/ ])(\d\d)$', r'\g<1>19\2', regex=True)
df['dates'] = df['dates'].str.replace(r'(?<=[/ ])(?=\d\d$)', '19', regex=True)
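The second line removes the problem with backreferences since it uses lookarounds: it matches the empty position between the / or space and the final two digits, so 19 is simply inserted there.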
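As a rough check of why the end-of-string anchor matters, the sketch below runs the patterns over a few values taken from the question's date_set. It uses re.sub per string as a stand-in for what str.replace is assumed to do per element; the helper names are hypothetical.

```python
import re

samples = ['04/20/2009', '04/20/09', '4/3/09', 'Mar 20, 2009', '6/2008', '2009']

def with_groups(s):
    # Two capturing groups, anchored at the end of the string.
    return re.sub(r'([/ ])(\d\d)$', r'\g<1>19\2', s)

def with_lookarounds(s):
    # Zero-width match: only the insertion point is matched.
    return re.sub(r'(?<=[/ ])(?=\d\d$)', '19', s)

by_groups = [with_groups(s) for s in samples]
by_lookarounds = [with_lookarounds(s) for s in samples]
print(by_groups)
# ['04/20/2009', '04/20/1909', '4/3/1909', 'Mar 20, 2009', '6/2008', '2009']

# The earlier \b version also matches "/20" in the middle of a full date.
word_boundary = re.sub(r'([/ ])(\d\d)\b', r'\g<1>19\2', '04/20/2009')
print(word_boundary)  # 04/1920/2009
```

Both end-anchored variants leave 4-digit years and the month/year forms untouched, which the \b version does not.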
Upvotes: 1