Reputation: 43
Apologies if this is a repeat question; I searched and could not find the specific answer I am looking for.
I have a data frame where one column is a 16-digit code, and there are a number of other columns. Here is a simplified example:
code = c("1109619910224003", "1157919910102001", "1539820070315001", "1563120190907002")
year = c(1991, 1991, 2007, 2019)
month = c(02, 01, 03, 09)
dat = as.data.frame(cbind(code,year,month))
dat
> dat
              code year month
1 1109619910224003 1991     2
2 1157919910102001 1991     1
3 1539820070315001 2007     3
4 1563120190907002 2019     9
As you can see, the code contains year, month, and day information. I already have columns for year and month in my data frame, but I also need to create a day column, which would be 24, 02, 15, and 07 in this example. The date is always in the format yyyymmdd and begins at the 6th digit of the code, so I essentially need to extract the 12th and 13th digits from each code to create my day column.
I then need to create another column for day of year from the date information, so I end up with the following:
day = c(24, 02, 15, 07)
dayofyear = c(55, 2, 74, 250)
dat2 = as.data.frame(cbind(code,year,month,day,dayofyear))
dat2
> dat2
              code year month day dayofyear
1 1109619910224003 1991     2  24        55
2 1157919910102001 1991     1   2         2
3 1539820070315001 2007     3  15        74
4 1563120190907002 2019     9   7       250
Any suggestions? Thanks!
Upvotes: 0
Views: 503
Reputation: 21
You can leverage the Date data type in R to accomplish all of these tasks. First we parse out the date portion of the code (characters 6 to 13) and convert it to Date format using readr::parse_date(). Once the date is converted, we can simply access all of the values you want rather than calculating them ourselves.
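library(tidyverse)

out <- dat %>%
  mutate(
    # characters 6 to 13 of the code hold the date as yyyymmdd
    date = readr::parse_date(substr(code, 6, 13), format = "%Y%m%d"),
    day = format(date, "%d"),
    # month and year already exist in dat, so they are overwritten in place
    month = format(date, "%m"),
    year = format(date, "%Y"),
    # %j is the day of the year (001 to 366)
    day.of.year = format(date, "%j")
  )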
(I'm using tidyverse syntax here because I find it quicker for these types of problems)
Once we create these columns, we can look at the updated data.frame out:
              code year month       date day day.of.year
1 1109619910224003 1991    02 1991-02-24  24         055
2 1157919910102001 1991    01 1991-01-02  02         002
3 1539820070315001 2007    03 2007-03-15  15         074
4 1563120190907002 2019    09 2019-09-07  07         250
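If you would rather not load the tidyverse, the same idea can be sketched in base R. This is only a minimal sketch: it rebuilds the example codes from the question and uses as.Date() in place of readr::parse_date(), which is my substitution rather than part of the approach above.

```r
# Same example codes as in the question, kept as character strings
dat <- data.frame(
  code = c("1109619910224003", "1157919910102001",
           "1539820070315001", "1563120190907002"),
  stringsAsFactors = FALSE
)

# Characters 6 to 13 hold the date as yyyymmdd
dat$date <- as.Date(substr(dat$code, 6, 13), format = "%Y%m%d")

# format() returns character strings, so leading zeros are kept
dat$day <- format(dat$date, "%d")
dat$day.of.year <- format(dat$date, "%j")
dat
```

As with the pipeline above, day and day.of.year come back as character here.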
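Edit: note that all of the new columns are character. We can tell this without using str() because of the leading zeros in the new columns. To get rid of them, convert just the derived columns to integer, with something like out <- out %>% mutate(across(c(year, month, day, day.of.year), as.integer)), or append that mutate() call to the end of the existing pipeline. Avoid mutate_all(as.integer) here: it would also convert code, whose 16-digit values do not fit in R's integer type and become NA, and date, which would turn into a count of days since 1970-01-01.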
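A small base R sketch of the integer conversion, for illustration only. It rebuilds the example from the question rather than reusing out, and it converts only the date-derived columns. The last line shows what happens if the 16-digit code itself is passed through as.integer(), which is why a blanket conversion of every column is risky.

```r
code <- c("1109619910224003", "1157919910102001",
          "1539820070315001", "1563120190907002")
date <- as.Date(substr(code, 6, 13), format = "%Y%m%d")

# Convert each date-derived piece to integer; code and date are left alone
out <- data.frame(
  code = code,
  date = date,
  year = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  day = as.integer(format(date, "%d")),
  day.of.year = as.integer(format(date, "%j")),
  stringsAsFactors = FALSE
)

# A 16-digit code is outside R's 32-bit integer range, so it becomes NA
code_as_int <- suppressWarnings(as.integer(code))
```

Keeping code as character also preserves any leading zeros it might have in other rows.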
Upvotes: 1