Reputation: 69
I have a dataframe in R that looks something like this
EmployeeID Job Title Effective Date
1 Sales1 1/1/2000
2 Sales1 3/1/2009
1 Sales2 5/2/2010
2 Management 6/1/2011
That I would like to re-format to look like this:
Employee Job Title Start Date End Date
1 Sales1 1/1/2000 5/2/2010
1 Sales2 5/2/2010
2 Sales1 3/1/2009 6/1/2011
2 Management 6/1/2011
My real data has many more employees and titles, but it is consistent that more than one person can have the same title. I'm fine with answers using base r or any other packages.
Upvotes: 0
Views: 191
Reputation: 389155
We can convert the date column into actual date, arrange
data by EmployeeID
and Start_date
, group_by
EmployeeID
and create a new column taking lead
value from Start_date
.
library(dplyr)
df %>%
rename(Start_date = Effective_Date) %>%
mutate(Start_date = as.Date(Start_date, "%d/%m/%Y")) %>%
arrange(EmployeeID, Start_date) %>%
group_by(EmployeeID) %>%
mutate(End_date = lead(Start_date))
# EmployeeID Job_Title Start_date End_date
# <int> <fct> <date> <date>
#1 1 Sales1 2000-01-01 2010-02-05
#2 1 Sales2 2010-02-05 NA
#3 2 Sales1 2009-01-03 2011-01-06
#4 2 Management 2011-01-06 NA
data
df <- structure(list(EmployeeID = c(1L, 2L, 1L, 2L), Job_Title = structure(c(2L,
2L, 3L, 1L), .Label = c("Management", "Sales1", "Sales2"), class = "factor"),
Effective_Date = structure(1:4, .Label = c("1/1/2000", "3/1/2009",
"5/2/2010", "6/1/2011"), class = "factor")), class = "data.frame",
row.names = c(NA, -4L))
Upvotes: 1
Reputation: 102309
Here is a base R solution
dfout <- do.call(rbind,
c(make.row.names = F,
lapply(split(df,df$EmployeeID), function(v) cbind(v, End.Date = c(v[-1,3],NA)))))
such that
> dfout
EmployeeID Job.Title Effective.Date End.Date
1 1 Sales1 1/1/2000 5/2/2010
2 1 Sales2 5/2/2010 <NA>
3 2 Sales1 3/1/2009 6/1/2011
4 2 Management 6/1/2011 <NA>
DATA
df <- structure(list(EmployeeID = c(1L, 2L, 1L, 2L), Job.Title = c("Sales1",
"Sales1", "Sales2", "Management"), Effective.Date = c("1/1/2000",
"3/1/2009", "5/2/2010", "6/1/2011")), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 0
Reputation: 2678
Using lead
from dplyr
:
library(dplyr)
df %>%
group_by(EmployeeID) %>%
mutate(EndDate = ifelse(row_number() == 1, lead(as.character(EffectiveDate), 1), NA)) %>%
arrange(EmployeeID)
# A tibble: 4 x 4
# Groups: EmployeeID [2]
# EmployeeID JobTitle EffectiveDate EndDate
#<int> <fct> <fct> <chr>
#1 1 Sales1 1/1/2000 5/2/2010
#2 1 Sales2 5/2/2010 NA
#3 2 Sales1 3/1/2009 6/1/2011
#4 2 Management 6/1/2011 NA
df <- structure(list(EmployeeID = c(1L, 2L, 1L, 2L), JobTitle = structure(c(2L,2L, 3L, 1L), .Label = c("Management", "Sales1", "Sales2"), class = "factor"), EffectiveDate = structure(1:4, .Label = c("1/1/2000", "3/1/2009","5/2/2010", "6/1/2011"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 0