Nathalie
Nathalie

Reputation: 1238

Keep only the year from a data timestamp column

Having a dataframe like this:

data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"), stringAsFactor = FALSE)

How is it possible to keep only year in the timestamp column having in mind that all years are after 2000? An expected output:

data.frame(id = c(1,3), timestamp = c("2009", "2017"), stringAsFactor = FALSE)

Upvotes: 0

Views: 65

Answers (4)

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

Base R:

format(as.Date(df$timestamp, "%d-%m-%Y %H:%M:%S"), "%Y")
[1] "2009" "2017"

So in the dataframe:

df$year <- format(as.Date(df$timestamp, "%d-%m-%Y %H:%M:%S"), "%Y")
  id           timestamp year
1  1 20-10-2009 11:35:12 2009
2  3 01-01-2017 12:21:21 2017

Another option, if you're into or familiar with regex, is this:

sub(".*([0-9]{4}).*", "\\1", df$timestamp)
[1] "2009" "2017"

Upvotes: 4

Doctor G
Doctor G

Reputation: 163

I have a tidyverse solution to your problem:

library(tidyverse)
data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"), stringAsFactor = FALSE) %>% 
  mutate(timestamp = timestamp %>% 
  str_extract("\\d{4}"))

The function str_extract("\\d{4}") should always extract the first four digits of your target variable.

Upvotes: 1

Sri Sreshtan
Sri Sreshtan

Reputation: 545

See if this answers your question. The code and the output is as follows :-

library(lubridate)
library(tidyverse)
df <- data.frame(id = c(1,3,4), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21","01-01-1998 12:21:21"), stringAsFactor = FALSE)
df$timestamp <- dmy_hms(df$timestamp)
df1 <- df %>% 
    filter(year(timestamp) > 2000) %>% 
    mutate(new_year = year(timestamp))
df1

#id           timestamp stringAsFactor new_year
#1  1 2009-10-20 11:35:12          FALSE     2009
#2  3 2017-01-01 12:21:21          FALSE     2017

Upvotes: 2

mabreitling
mabreitling

Reputation: 606

If you're not afraid of external packages, one option would be to make use of the lubridate package:

df <- data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"))

df <- df %>%
  mutate(timestamp = lubridate::dmy_hms(timestamp)) %>%
  mutate(year = lubridate::year(timestamp))

Obviously, if you actually want to replace the timestampe column, you have to change the last mutate command. Result:

 id           timestamp year
1  1 2009-10-20 11:35:12 2009
2  3 2017-01-01 12:21:21 2017

Upvotes: 1

Related Questions