Reputation: 1400
I have data that is in a character vector of the format:
[1] "2014-03-27 11:42:32" "2014-04-03 07:13:28" "0000-00-00 00:00:00" "2012-04-16 12:46:03"
[5] "0000-00-00 00:00:00" "0000-00-00 00:00:00" "2014-04-23 09:33:23" "2014-04-30 06:31:54"
[9] "2012-04-18 09:55:44" "2013-11-20 14:43:11"
What I want to do is use a single number for the year, i.e. substitute 4 for 2014, 3 for 2013, 2 for 2012, and 1 for 0000-00... Apart from that single digit representing the year, I would like to remove all other digits and characters.
I am aware that I can use a regex and gsub(pattern="2014", replacement="4", logVector)
or some variation to accomplish my task but I am not well versed in regex. Would anyone be able to provide assistance on the syntax?
Upvotes: 0
Views: 5121
Reputation: 886948
You can also use substr (see ?substr) to extract the 4th character directly, without a regex:
substr(data,4,4)
# [1] "4" "4" "0" "2" "0" "0" "4" "4" "2" "3"
or
library(stringr)
str_extract(data, '(?<=\\d{3})\\d') # perl() is deprecated in stringr >= 1.0; pass the pattern directly
#[1] "4" "4" "0" "2" "0" "0" "4" "4" "2" "3"
(?<=\\d{3}) # look behind for three digits
\\d # followed by the digit that needs to be extracted
Suppose you wanted the months:
str_extract(data, '(?<=\\d{4}-)\\d{2}') # look behind for 4 digits followed by `-`
#[1] "03" "04" "00" "04" "00" "00" "04" "04" "04" "11"
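If you would rather not load stringr, the same lookbehind pattern should also work in base R. This is only a sketch under the assumption that `data` is a character vector like the one in the question (three of its values are copied here); `regexpr()` and `regmatches()` are base R functions, and `perl = TRUE` is what enables the lookbehind.

```r
# Three sample values copied from the question.
data <- c("2014-03-27 11:42:32", "0000-00-00 00:00:00", "2013-11-20 14:43:11")

# regexpr() finds the first match in each string; perl = TRUE enables (?<=...).
m <- regexpr("(?<=\\d{4}-)\\d{2}", data, perl = TRUE)

# regmatches() pulls out the matched text (strings with no match are dropped).
months <- regmatches(data, m)
print(months)
# [1] "03" "00" "11"
```

Note that, unlike str_extract, this drops elements that do not match instead of returning NA for them.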
Upvotes: 1
Reputation: 41838
This is what you need:
sub("^\\d{3}(\\d).*", "\\1", subject, perl = TRUE)
We need to capture the last digit of the year, then substitute the whole string with that digit.
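As a quick check, here is that call run on a few values copied from the question. The vector name `subject` follows the call above; `last_digit` is just a name chosen for this illustration.

```r
# Sample values copied from the question.
subject <- c("2014-03-27 11:42:32", "0000-00-00 00:00:00",
             "2012-04-16 12:46:03", "2013-11-20 14:43:11")

# Replace each whole string with the captured fourth digit of the year.
last_digit <- sub("^\\d{3}(\\d).*", "\\1", subject, perl = TRUE)
print(last_digit)
# [1] "4" "0" "2" "3"
```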
Explanation
^       anchor asserting that we are at the beginning of the string
\d{3}   matches three digits
(\d)    matches the fourth digit and captures it to Group 1
.*      matches to the end of the string
\1      replaces the whole string with Group 1, which is the last digit of the year
Upvotes: 8
Reputation: 1247
This is the pattern you're looking for:
gsub("^2014.*", "4", data)
This one is a bit more general: it replaces any year from 2011 to 2019 with its last digit. The 0000 case needs a second pass, run on the result of the first:
years <- gsub("^201([1-9]).*", "\\1", data)
years <- gsub("^0000.*", "0", years)
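The question actually asked for 1 rather than 0 for the 0000 placeholder, and for a number rather than a string. A minimal sketch of that variant, assuming the same two patterns and using sample values copied from the question (`codes` is a name made up for this example):

```r
# Sample values copied from the question.
data <- c("2014-03-27 11:42:32", "0000-00-00 00:00:00",
          "2012-04-16 12:46:03", "2013-11-20 14:43:11")

# First pass: years 2011-2019 become their last digit.
codes <- gsub("^201([1-9]).*", "\\1", data)

# Second pass: the 0000 placeholder becomes "1", as requested in the question.
codes <- gsub("^0000.*", "1", codes)

# Convert the remaining single-character strings to integers.
codes <- as.integer(codes)
print(codes)
# [1] 4 1 2 3
```

One caveat: with this coding a real 2011 date would also map to 1, so it only stays unambiguous if the data has no 2011 entries.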
Upvotes: 2