Reputation: 3278
I need to extract the month from a series of strings in the format:
Tue Jan 05 03:29:10 CET 2016
I have tried with:
#extracting the second capturing group
sub("([A-z]{3})\\s([A-z]{3})","\\2","Tue Jan 05 03:29:10 CET 2016")
#or just the first whitespace with the Month:
sub("\\s([A-z]{3})","\\2","Tue Jan 05 03:29:10 CET 2016")
My expected output in this case would be:
"Jan"
Upvotes: 1
Views: 1266
Reputation: 269556
Try this sub:
sub("... (...).*", "\\1", "Tue Jan 05 03:29:10 CET 2016")
## [1] "Jan"
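Since the question mentions a series of strings, here is a minimal sketch of the same call applied to a character vector (sub is vectorized over its input). The sample vector x is made up for illustration. Note that sub returns an element unchanged when the pattern does not match it.

```r
# Hypothetical sample data in the question's format
x <- c("Tue Jan 05 03:29:10 CET 2016",
       "Wed Feb 10 14:02:55 CET 2016")

# Skip three characters and a space, capture the next three, drop the rest
months <- sub("... (...).*", "\\1", x)
print(months)
## [1] "Jan" "Feb"

# A string too short to match is returned as-is
unmatched <- sub("... (...).*", "\\1", "n/a")
print(unmatched)
## [1] "n/a"
```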
Upvotes: 1
Reputation: 24188
Or we can use the month() function from lubridate, provided we convert our string to a Date object first.
library(lubridate)
month(as.Date("Tue Jan 05 03:29:10 CET 2016", "%a %b %d"), label = TRUE)
#[1] Jan
Or in base R as suggested by @HaddE.Nuff:
format(as.Date("Tue Jan 05 03:29:10 CET 2016", "%a %b %d"), "%b")
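One caveat with the date-parsing route: %a and %b are interpreted according to the current LC_TIME locale, so English abbreviations such as "Tue" and "Jan" may fail to parse under a non-English locale. The sketch below assumes it is acceptable to switch LC_TIME to the "C" locale temporarily; the variable names are hypothetical.

```r
# Assumption: temporarily changing LC_TIME is acceptable in this session
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))   # English day and month abbreviations

x <- "Tue Jan 05 03:29:10 CET 2016"
# The format has no year, so the current year is assumed; only the month is used
month_abbr <- format(as.Date(x, "%a %b %d"), "%b")
print(month_abbr)

invisible(Sys.setlocale("LC_TIME", old_locale))   # restore the previous setting
```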
Upvotes: 4
Reputation: 2539
A very intuitive suggestion: match the three-letter month abbreviation directly. It might not work for every case, but it is simple to come up with.
> aa <- regexpr("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec","Tue Jan 05 03:29:10 CET 2016")
> regmatches("Tue Jan 05 03:29:10 CET 2016",aa)
#[1] "Jan"
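For a whole vector, the same idea might look like the sketch below. It builds the alternation from base R's month.abb constant rather than typing it out, and uses made-up sample data. One thing to watch: regmatches drops elements with no match, so the result can be shorter than the input; the last lines show one way to keep positions aligned.

```r
# Hypothetical sample data; the second element contains no month
x <- c("Tue Jan 05 03:29:10 CET 2016",
       "no month here",
       "Wed Feb 10 14:02:55 CET 2016")

pattern <- paste(month.abb, collapse = "|")   # "Jan|Feb|...|Dec"
m <- regexpr(pattern, x)

found <- regmatches(x, m)   # non-matching elements are dropped
print(found)
## [1] "Jan" "Feb"

# Keep one result per input element, with NA where nothing matched
aligned <- rep(NA_character_, length(x))
aligned[m != -1L] <- found
print(aligned)
## [1] "Jan" NA    "Feb"
```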
Upvotes: 1
Reputation: 626802
You need to match the whole string and capture what you need into a capturing group to restore its value with a backreference later.
Use
> sub("^[[:alpha:]]{3}\\s+([[:alpha:]]{3})\\b.*", "\\1", "Tue Jan 05 03:29:10 CET 2016")
[1] "Jan"
The pattern means:
^ - match start of string
[[:alpha:]]{3} - match 3 letters
\\s+ - match 1+ whitespace
([[:alpha:]]{3})\\b - match and capture into Group 1 three letters as a whole word (\b is a word boundary marker)
.* - 0+ any characters (up to the end of the string)
See the regex demo
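To illustrate why the whole string has to be matched, here is a small comparison. sub only replaces the part of the input that the pattern matched, so a pattern covering just the first two words leaves the rest of the string in place. The first pattern is a variant of the question's attempt with [[:alpha:]] substituted for [A-z].

```r
x <- "Tue Jan 05 03:29:10 CET 2016"

# Matches only "Tue Jan", so only that part is replaced by Group 2
partial <- sub("([[:alpha:]]{3})\\s([[:alpha:]]{3})", "\\2", x)
print(partial)
## [1] "Jan 05 03:29:10 CET 2016"

# Matches the entire string, so the replacement is all that remains
full <- sub("^[[:alpha:]]{3}\\s+([[:alpha:]]{3})\\b.*", "\\1", x)
print(full)
## [1] "Jan"
```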
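Also, please note that [A-z] should be avoided: in ASCII, the range from A to z also covers the six characters between Z and a ([, \, ], ^, _ and `), so it matches more than letters.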
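A quick check of the [A-z] problem, as a sketch. perl = TRUE is used here so that the range is evaluated by code point; with the default engine the behaviour of ranges can depend on the locale.

```r
# "_" sits between "Z" and "a" in ASCII, so [A-z] accepts it
loose  <- grepl("^[A-z]+$", "a_b", perl = TRUE)
# Both of these accept letters only
strict <- grepl("^[A-Za-z]+$", "a_b", perl = TRUE)
posix  <- grepl("^[[:alpha:]]+$", "a_b", perl = TRUE)

print(c(loose, strict, posix))
## [1]  TRUE FALSE FALSE
```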
Upvotes: 1