Michael Queue

Reputation: 1400

Using gsub to find and replace with a regular expression

I have data that is in a character vector of the format:

[1] "2014-03-27 11:42:32" "2014-04-03 07:13:28" "0000-00-00 00:00:00" "2012-04-16 12:46:03"
[5] "0000-00-00 00:00:00" "0000-00-00 00:00:00" "2014-04-23 09:33:23" "2014-04-30 06:31:54"
[9] "2012-04-18 09:55:44" "2013-11-20 14:43:11"

What I want to do is represent each year with a single digit, i.e. substitute 4 for 2014, 3 for 2013, 2 for 2012, and 1 for 0000-00... Beyond the single digit representing the year, I would like to remove all other digits and characters.

I am aware that I can use a regex with gsub(pattern="2014", replacement="4", logVector), or some variation, to accomplish my task, but I am not well versed in regex. Would anyone be able to help with the syntax?

Upvotes: 0

Views: 5121

Answers (3)

akrun

Reputation: 886948

You can also use substr (see ?substr) to extract the 4th character:

substr(data,4,4)
# [1] "4" "4" "0" "2" "0" "0" "4" "4" "2" "3"

or

library(stringr)
# perl() is deprecated as of stringr 1.0.0; the pattern can be passed directly
str_extract(data, '(?<=\\d{3})\\d')
# [1] "4" "4" "0" "2" "0" "0" "4" "4" "2" "3"

Explanation

(?<=\\d{3}) # look behind for three digits
\\d # followed by the digit that needs to be extracted

Suppose you wanted the months:

str_extract(data, '(?<=\\d{4}-)\\d{2}') # look behind for 4 digits followed by `-`
# [1] "03" "04" "00" "04" "00" "00" "04" "04" "04" "11"
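
For readers who prefer not to load stringr, the same lookbehind can be sketched in base R. This is only an illustration: the three-element `data` vector below is a shortened stand-in for the asker's data, and `m` and `months` are names chosen here.

```r
# Shortened stand-in for the asker's vector (assumption for illustration)
data <- c("2014-03-27 11:42:32", "0000-00-00 00:00:00", "2013-11-20 14:43:11")

# regexpr() locates the first match in each element;
# perl = TRUE is needed because the pattern uses a lookbehind
m <- regexpr("(?<=\\d{4}-)\\d{2}", data, perl = TRUE)

# regmatches() pulls out the matched text (elements without a match are dropped)
months <- regmatches(data, m)
print(months)
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input when some strings are malformed.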

Upvotes: 1

zx81

Reputation: 41838

This is what you need:

sub("^\\d{3}(\\d).*", "\\1", subject, perl=TRUE)

We need to capture the last digit of the year, then substitute the whole string with that digit.

Explanation

  • The ^ anchor asserts that we are at the beginning of the string
  • \d{3} matches three digits
  • (\d) matches the fourth digit and captures it to Group 1
  • .* matches to the end of the string
  • \1 replaces the whole string with Group 1, which is the last digit of the year.
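
As a rough worked example, the pattern explained above can be applied to the first four strings from the question. The final remapping step is an assumption added here: the question asks for 1 rather than 0 for the 0000-00... placeholder, so those entries are overwritten afterwards. The names `subject` and `digit` are illustrative.

```r
# First four values from the question's sample data
subject <- c("2014-03-27 11:42:32", "2014-04-03 07:13:28",
             "0000-00-00 00:00:00", "2012-04-16 12:46:03")

# Replace each whole string with the captured fourth digit of the year
digit <- sub("^\\d{3}(\\d).*", "\\1", subject, perl = TRUE)

# Assumption: placeholder dates starting with "0000" should become "1",
# as requested in the question, instead of the captured "0"
digit[substr(subject, 1, 4) == "0000"] <- "1"

print(digit)
```

Checking the original prefix, rather than testing for a captured "0", keeps a real year such as 2010 from being remapped by accident.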

Upvotes: 8

Sean Murphy

Reputation: 1247

This is the pattern you're looking for:

gsub("^2014.*", "4", data) 

This one is a bit more general and will replace years from 2011 to 2019 with their last digit. You'll need the second line to deal with the 0000 case, and each result has to be assigned back so that the second call works on the output of the first.

data <- gsub("^201([1-9]).*", "\\1", data)
data <- gsub("^0000.*", "0", data)
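
To make the limits of this two-step approach concrete, here is a small sketch on a made-up vector. The 2010 entry is a hypothetical value, not from the question, included to show that strings outside the 2011 to 2019 range pass through unchanged. `step1` and `result` are illustrative names.

```r
# Illustrative input; the 2010 entry is hypothetical
data <- c("2014-03-27 11:42:32", "0000-00-00 00:00:00", "2010-01-01 00:00:00")

# Step 1: years 2011-2019 collapse to their last digit
step1 <- gsub("^201([1-9]).*", "\\1", data)

# Step 2: the 0000 placeholder collapses to "0"
result <- gsub("^0000.*", "0", step1)

print(result)
```

Because gsub() returns non-matching strings untouched, the 2010 timestamp survives both steps intact, which makes unexpected years easy to spot in the output.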

Upvotes: 2
