erasmortg

Reputation: 3278

Extract the month from a string

I need to extract the month from a series of strings in the format:

Tue Jan 05 03:29:10 CET 2016

I have tried:

#extracting the second capturing group
sub("([A-z]{3})\\s([A-z]{3})","\\2","Tue Jan 05 03:29:10 CET 2016")

#or just the first whitespace with the Month:
sub("\\s([A-z]{3})","\\2","Tue Jan 05 03:29:10 CET 2016")

My expected output in this case would be:

"Jan"

Upvotes: 1

Views: 1266

Answers (4)

G. Grothendieck

Reputation: 269556

Try this sub:

sub("... (...).*", "\\1", "Tue Jan 05 03:29:10 CET 2016")
## [1] "Jan"
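Because `sub()` is vectorised, the same pattern should also work on a whole series of strings at once. This is a minimal sketch that assumes every string follows the same fixed-width `Day Mon DD HH:MM:SS TZ YYYY` layout as the example. The extra dates in the hypothetical `dates` vector are made up for illustration.

```r
# Hypothetical sample data in the same layout as the question's string
dates <- c("Tue Jan 05 03:29:10 CET 2016",
           "Mon Feb 15 11:02:44 CET 2016",
           "Sat Dec 31 23:59:59 CET 2016")

# "... " skips the three-character weekday and the following space,
# "(...)" captures the next three characters (the month),
# ".*" consumes the rest so that only the capture is left after replacement
months <- sub("... (...).*", "\\1", dates)
print(months)
```

Since `.` matches any character, this relies purely on position. A string with a different layout (for example, one with a leading space) would silently return the wrong three characters rather than fail.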

Upvotes: 1

mtoto

Reputation: 24188

Or we can use the month() function from lubridate, provided we first convert the string to a Date object.

library(lubridate)
month(as.Date("Tue Jan 05 03:29:10 CET 2016", "%a %b %d"), label = TRUE)
#[1] Jan

Or in base R as suggested by @HaddE.Nuff:

format(as.Date("Tue Jan 05 03:29:10 CET 2016", "%a %b %d"), "%b")
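Both `as.Date()` calls rely on `%a` and `%b`, which are parsed using the current `LC_TIME` locale. On a machine with a non-English locale they may therefore return `NA`. The sketch below temporarily switches to the `C` locale and parses the full string. It assumes every string carries the literal time-zone label `CET`, because `strptime()` treats `%Z` as output-only and cannot read it.

```r
x <- c("Tue Jan 05 03:29:10 CET 2016", "Mon Feb 15 11:02:44 CET 2016")

# Remember the current LC_TIME setting, then use "C" so that English
# weekday and month abbreviations are recognised regardless of the system locale
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# "CET" is matched literally; all strings are assumed to use it
d <- as.Date(x, format = "%a %b %d %H:%M:%S CET %Y")
month_abb <- format(d, "%b")              # abbreviated month name
month_num <- as.integer(format(d, "%m"))  # month as a number, 1-12

# Restore the original locale
invisible(Sys.setlocale("LC_TIME", old_locale))
month_abb
month_num
```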

Upvotes: 4

DJJ

Reputation: 2539

A very intuitive suggestion: this matches the first three-letter month abbreviation found in the string. It might not work in every case, but it is simple to come up with.

x <- "Tue Jan 05 03:29:10 CET 2016"
aa <- regexpr("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", x)
regmatches(x, aa)
#[1] "Jan"
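One way to make this less error-prone is to build the alternation from base R's built-in `month.abb` constant and wrap it in word boundaries (`\\b`). That way an abbreviation embedded in a longer word, such as `May` in `Mayfair`, is not matched. This is a minimal sketch assuming English month abbreviations.

```r
x <- c("Tue Jan 05 03:29:10 CET 2016", "Mon Feb 15 11:02:44 CET 2016")

# month.abb is c("Jan", "Feb", ..., "Dec"); join it into "Jan|Feb|...|Dec"
# and require a word boundary on both sides of the match
pattern <- paste0("\\b(", paste(month.abb, collapse = "|"), ")\\b")

m <- regexpr(pattern, x)     # position of the first match in each string (-1 if none)
months <- regmatches(x, m)   # matched substrings; non-matches are dropped
months
```

Because `regmatches()` drops strings without a match, the result can be shorter than `x`. If positions matter, check for `m == -1` first.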

Upvotes: 1

Wiktor Stribiżew

Reputation: 626802

You need to match the whole string and capture what you need into a capturing group to restore its value with a backreference later.

Use

> sub("^[[:alpha:]]{3}\\s+([[:alpha:]]{3})\\b.*", "\\1", "Tue Jan 05 03:29:10 CET 2016")
[1] "Jan"

The pattern means:

  • ^ - match start of string
  • [[:alpha:]]{3} - match 3 letters
  • \\s+ - match one or more whitespace characters
  • ([[:alpha:]]{3})\\b - match three letters as a whole word and capture them into Group 1 (\b is a word boundary)
  • .* - match zero or more characters of any kind (up to the end of the string)

See the regex demo
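To see why the whole string has to be matched, compare the question's first attempt (with `[A-z]` swapped for `[[:alpha:]]`) against the anchored pattern. `sub()` only replaces the span the regex matches, so a pattern that stops right after the month leaves the rest of the string in place.

```r
x <- "Tue Jan 05 03:29:10 CET 2016"

# Matches only "Tue Jan"; sub() replaces that span with the month,
# and the unmatched tail " 05 03:29:10 CET 2016" is kept as is
partial <- sub("([[:alpha:]]{3})\\s([[:alpha:]]{3})", "\\2", x)

# Anchored at ^ and ending in .*, this consumes the whole string,
# so only the captured month remains
whole <- sub("^[[:alpha:]]{3}\\s+([[:alpha:]]{3})\\b.*", "\\1", x)

partial
whole
```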

Also note that [A-z] should be avoided: it matches more than just letters. Use [A-Za-z] or [[:alpha:]] instead.
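In ASCII, the range `A-z` runs from code 65 to 122. That span also includes the six punctuation characters that sit between `Z` (90) and `a` (97). The quick check below uses `perl = TRUE` so the range is read by code point; with R's default regex engine, the interpretation of ranges can depend on the locale.

```r
# The characters with ASCII codes 91-96, which sit between "Z" and "a"
between <- c("[", "\\", "]", "^", "_", "`")

in_az    <- grepl("^[A-z]$", between, perl = TRUE)       # range A-z
in_alpha <- grepl("^[[:alpha:]]$", between, perl = TRUE) # letters only

in_az     # TRUE for all six
in_alpha  # FALSE for all six
```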

Upvotes: 1
