ted
ted

Reputation: 93

Better ways to combine Year and Month into Date object using mapply and lubridate

(I actually came up with a solution but that didn't satisfy my desire for simplicity and intuitiveness, therefore here I state my question and solution while waiting for a nice and neat solution.)

I have a data with one column being Year and the other being Month, while the month is in the format of string:

  Country   Month      Year Type    
  <fct>     <chr>     <dbl> <fct>   
1 Argentina June       1975 Currency
2 Argentina February   1981 Currency
3 Argentina July       1982 Currency

I am trying to combine the Month and Year column to a single column Date, which is in the format of date.

First Try

My first try was to use mapply, with the help of lubridate and a little function of my that transforms month from string to int.

months = c("January", "February", "March", "April", 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
month_num = c(1:12)
names(month_num) = months


crisis$Date = mapply(function(y, m){
  m = month_num[m]
  d = make_date(y,m) 
  return(d)
},crisis$Year, crisis$Month)

However this didn't turn out to be what I want:

  Country   Month      Year Type     Date      
  <fct>     <chr>     <dbl> <fct>    <list>    
1 Argentina June       1975 Currency <date [1]>
2 Argentina February   1981 Currency <date [1]>
3 Argentina July       1982 Currency <date [1]>
4 Argentina September  1986 Currency <date [1]>

, as the Date column is list format.

Some Googling

With some help from this post and some manipulation on unlisting it and turning it back to date object, I managed to get the result I want:

crisis$Date = as_date(unlist(mapply(function(y, m){
  m = month_num[m]
  d = make_date(y,m) 
  return(d)
},crisis$Year, crisis$Month, SIMPLIFY = FALSE)))

The result is

  Country   Month      Year Type     Date      
  <fct>     <chr>     <dbl> <fct>    <date>    
1 Argentina June       1975 Currency 1975-06-01
2 Argentina February   1981 Currency 1981-02-01
3 Argentina July       1982 Currency 1982-07-01
4 Argentina September  1986 Currency 1986-09-01

This is so far fine to deal with, but I believe there are better solutions.

Upvotes: 2

Views: 1667

Answers (3)

ted
ted

Reputation: 93

So after the help from @Gram and @det, I came up with my solution.

I am a new learner in R so I didn't realize some of the R-ish style of handling datas, therefore tried to make every thing done in one single line of code. Thanks to some tips from Gram's answer, I somehow learned to clear my code by adding auxilary columns instead (which is similar to excel).

Consider that there might be situations in the future where the correspondence might not simply be from 1:12 to months, and to make things more general for future utilization, I create a new data.frame just to store all the information about months:

month_ref = data.frame(num = 1:12, Month = c("January", "February", "March", "April", 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'))
  num    Month
1   1  January
2   2 February
3   3    March
4   4    April

Now the idea is to "combine" the two dataframes, matching the Month column to numerical numbers. This is exactly like the VLOOKUP function in excel, and with help from this post, I now have a dataframe with a column of numbers

crisis  = crisis %>% 
  inner_join(month_ref, by=c("Month")) 
  Country   Month      Year Type       num
  <fct>     <chr>     <dbl> <fct>    <int>
1 Argentina June       1975 Currency     6
2 Argentina February   1981 Currency     2
3 Argentina July       1982 Currency     7
4 Argentina September  1986 Currency     9

I can then handle my dataframe with a neat column of month in number, which is much more easier and readable than handling the parsing in a custom function in mutate().

crisis  = crisis %>% 
  inner_join(month_ref, by="Month") %>%
  mutate(
    Date = lubridate::ymd(paste(Year, num, "01", sep="-"))
  ) %>%
  select(-c(num, Month, Year))
  Country   Type     Date      
  <fct>     <fct>    <date>    
1 Argentina Currency 1975-06-01
2 Argentina Currency 1981-02-01
3 Argentina Currency 1982-07-01

Upvotes: 1

det
det

Reputation: 5232

df$Date <- lubridate::myd(paste(df$Month, df$Year, "1"))

Upvotes: 3

WilliamGram
WilliamGram

Reputation: 683

You can convert month to a number, and then from there to a date:

df %>% 
  mutate(
    Month = base::match(Month, base::month.name),
    Date = as.Date(paste(Year, '-', Month, '-01', sep=''))
  ) %>% 
  select(-c(Month, Year))

# A tibble: 3 x 3
#   Country   Type     Date      
#   <chr>     <chr>    <date>    
# 1 Argentina Currency 1975-06-01
# 2 Argentina Currency 1981-02-01
# 3 Argentina Currency 1982-07-01

Does this help?

I provided the dataframe below:

library(tibble)

df <- tibble(
  Country = 'Argentina',
  Month = c('June', 'February', 'July'),
  Year = c(1975, 1981, 1982),
  Type = 'Currency'
)

Upvotes: 2

Related Questions