user2895444
user2895444

Reputation: 69

Convert list of effective dates to start and end dates

I have a dataframe in R that looks something like this

EmployeeID Job Title    Effective Date
1          Sales1       1/1/2000
2          Sales1       3/1/2009
1          Sales2       5/2/2010
2          Management   6/1/2011

That I would like to re-format to look like this:

Employee    Job Title   Start Date  End Date
1           Sales1      1/1/2000    5/2/2010
1           Sales2      5/2/2010    
2           Sales1      3/1/2009    6/1/2011
2           Management  6/1/2011    

My real data has many more employees and titles, but it is consistent that more than one person can have the same title. I'm fine with answers using base r or any other packages.

Upvotes: 0

Views: 191

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389155

We can convert the date column into actual date, arrange data by EmployeeID and Start_date, group_by EmployeeID and create a new column taking lead value from Start_date.

library(dplyr)

df %>%
  rename(Start_date = Effective_Date) %>%
  mutate(Start_date = as.Date(Start_date, "%d/%m/%Y")) %>%
  arrange(EmployeeID, Start_date) %>%
  group_by(EmployeeID) %>%
  mutate(End_date = lead(Start_date))

#  EmployeeID Job_Title  Start_date End_date  
#       <int> <fct>      <date>     <date>    
#1          1 Sales1     2000-01-01 2010-02-05
#2          1 Sales2     2010-02-05 NA        
#3          2 Sales1     2009-01-03 2011-01-06
#4          2 Management 2011-01-06 NA       

data

df <- structure(list(EmployeeID = c(1L, 2L, 1L, 2L), Job_Title = structure(c(2L, 
2L, 3L, 1L), .Label = c("Management", "Sales1", "Sales2"), class = "factor"), 
Effective_Date = structure(1:4, .Label = c("1/1/2000", "3/1/2009", 
"5/2/2010", "6/1/2011"), class = "factor")), class = "data.frame", 
row.names = c(NA, -4L)) 

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 102309

Here is a base R solution

dfout <- do.call(rbind,
        c(make.row.names = F,
          lapply(split(df,df$EmployeeID), function(v) cbind(v, End.Date = c(v[-1,3],NA)))))

such that

> dfout
  EmployeeID  Job.Title Effective.Date End.Date
1          1     Sales1       1/1/2000 5/2/2010
2          1     Sales2       5/2/2010     <NA>
3          2     Sales1       3/1/2009 6/1/2011
4          2 Management       6/1/2011     <NA>

DATA

df <- structure(list(EmployeeID = c(1L, 2L, 1L, 2L), Job.Title = c("Sales1", 
"Sales1", "Sales2", "Management"), Effective.Date = c("1/1/2000", 
"3/1/2009", "5/2/2010", "6/1/2011")), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 0

sm925
sm925

Reputation: 2678

Using lead from dplyr:

library(dplyr)
 df %>%
   group_by(EmployeeID) %>%
   mutate(EndDate = ifelse(row_number() == 1, lead(as.character(EffectiveDate), 1), NA)) %>%
   arrange(EmployeeID)

# A tibble: 4 x 4
 # Groups:   EmployeeID [2]
 #   EmployeeID JobTitle   EffectiveDate EndDate 
 #<int> <fct>      <fct>         <chr>   
 #1          1 Sales1     1/1/2000      5/2/2010
 #2          1 Sales2     5/2/2010      NA      
 #3          2 Sales1     3/1/2009      6/1/2011
 #4          2 Management 6/1/2011      NA   

data

 df <- structure(list(EmployeeID = c(1L, 2L, 1L, 2L), JobTitle = structure(c(2L,2L, 3L, 1L), .Label = c("Management", "Sales1", "Sales2"), class = "factor"), EffectiveDate = structure(1:4, .Label = c("1/1/2000", "3/1/2009","5/2/2010", "6/1/2011"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 0

Related Questions