Fosulli
Fosulli

Reputation: 13

Splitting numeric column data by given number of characters

I am trying to split one column into three columns so I can give a date format. Currently the data set looks like this

YYYYMMDD         Number
20020101         0.21
20020102         0.34
20020103         1.22

I want it to look like this

Year    Month  Day  Number
2002    01     01   0.21
2002    01     02   0.34
2002    01     03   1.22

I have the following code written and it works in the sense that i can split up the column, but in doing so I create new data frames and I am unsure how to then add back in the data.frame to the original data.set

Is there a nicer way to do it? or how to do I get new2 + new to combine with data?

res <- strsplit(data$YYYYMMDD, "(?<=.{4})" , perl = TRUE)
new<-do.call(rbind, res)
summary(new)
colnames(new)<-c("Year", "MMDD")
new<-as.data.frame(new)
new$MMDD<-as.character(new$MMDD)
res <- strsplit(new$MMDD, "(?<=.{2})" , perl = TRUE)
new2<-do.call(rbind, res)
summary(new2)
colnames(new2)<-c("Month", "Dom")
new2<-as.data.frame(new2)

Upvotes: 0

Views: 315

Answers (4)

yeedle
yeedle

Reputation: 5008

You can do it with lubridate like so:


library(tidyverse)
library(lubridate)

data %>% 
  mutate(
    YYYYMMDD = as.Date(as.character(YYYYMMDD), format = "%Y%m%d"),
    year = year(YYYYMMDD),
    month = month(YYYYMMDD),
    day = mday(YYYYMMDD)
    ) 
#>     YYYYMMDD Number year month day
#> 1 2002-01-01   0.21 2002     1   1
#> 2 2002-01-02   0.34 2002     1   2
#> 3 2002-01-03   1.22 2002     1   3

Upvotes: 0

MBnnn
MBnnn

Reputation: 308

You can try this (with your variable YYYYMMDD as character):

year = substr(data$YYYYMMDD,1,4)
month = substr(data$YYYYMMDD,5,6)
day = substr(data$YYYYMMDD,7,8)

new_data = as.data.frame(cbind(year,month,day,data$Number))
colnames(new_data)[4] = "Number"

Upvotes: 1

mt1022
mt1022

Reputation: 17289

With substring:

x <- mapply(substring, c(1, 5, 7), c(4, 6, 8),
            MoreArgs = list(text = df$YYYYMMDD), SIMPLIFY = F)
names(x) <- c('Year', 'Month', 'Day')
cbind(as.data.frame(x), df[-1])
#   Year Month Day Number
# 1 2002    01  01   0.21
# 2 2002    01  02   0.34
# 3 2002    01  03   1.22

Upvotes: 2

akrun
akrun

Reputation: 886998

We can do this easily with separate

library(tidyr)
separate(df1, YYYYMMDD, into = c('Year', 'Month', 'Day'), sep=c(4, 6))
#   Year Month Day Number
#1 2002    01  01   0.21
#2 2002    01  02   0.34
#3 2002    01  03   1.22

Upvotes: 1

Related Questions