Reputation: 391
I have panel data that records the employment status of individuals across different years. Many of them change jobs over the time span of my data. I want to capture these transitions and conduct a second-order Markov chain analysis (using the previous two transitions to predict the likelihood of the next one). To do so, I need to transform my data to (1) merge the previous two transitions into a string sequence and (2) put that sequence and the next transition side by side. For example:
Year  Person  Employment_Status
1990  Bob     High School Teacher
1991  Bob     High School Teacher
1992  Bob     Freelancer
1993  Bob     High School Teacher
1994  Bob     Researcher
1990  Peter   Singer
1991  Peter   Singer
1990  John    Singer
1991  John    Dancer
1990  James   Actor
1991  James   Actor
1992  James   Producer
1993  James   Producer
1994  James   Investor
The ideal output should look like the table below. Note that the records of Peter and John are not included because they don't have enough previous transitions to be used: Peter has no transition, and John has only two statuses in total.
From                            To
High School Teacher-Freelancer  High School Teacher
Freelancer-High School Teacher  Researcher
Actor-Producer                  Investor
Upvotes: 0
Views: 48
Reputation: 96
There are probably more efficient ways to do this, but the following should work, and it has the benefit that you can follow each step to check that the output is correct.
library(dplyr)
library(fuzzyjoin)

# Remove consecutive duplicates of person & employment_status
clean_df <- df %>%
  arrange(
    Person, Year
  ) %>%
  mutate(
    Combo = paste(Person, Employment_Status, sep = '_')
  ) %>%
  filter(
    !dplyr::lag(Combo, n = 1, default = "1") == Combo
  ) %>%
  select(
    -Combo
  )

# Find the next employment transition
first_transform <- clean_df %>%
  fuzzyjoin::fuzzy_left_join(
    y = clean_df %>%
      select(
        Next_Year = Year
        , Person
        , Next_Employment_Status = Employment_Status
      )
    , by = c(
      "Year" = "Next_Year"
      , "Person"
      , "Employment_Status" = "Next_Employment_Status"
    )
    , match_fun = list(
      `<`
      , `==`
      , `!=`
    )
  ) %>%
  filter(
    !is.na(Next_Year)
  ) %>%
  group_by(
    Person.x, Year
  ) %>%
  rename(
    Person = Person.x
  ) %>%
  select(
    -Person.y
  ) %>%
  dplyr::slice_head(n = 1)

# Concatenate first employment transition and look for the next
first_transform %>%
  mutate(
    From = paste(Employment_Status, Next_Employment_Status, sep = "-")
  ) %>%
  left_join(
    y = first_transform %>%
      select(
        Person, Year
        , To = Next_Employment_Status
      )
    , by = c(
      "Person" = "Person"
      , "Next_Year" = "Year"
    )
  ) %>%
  ungroup() %>%
  filter(
    !is.na(To)
  ) %>%
  select(
    From, To
  )
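
As one possible shorter route, the same From/To pairs can probably be built with grouped lag() calls instead of a fuzzy join. This is only a sketch under a few assumptions: df has the columns Year, Person and Employment_Status shown in the question, there are no missing statuses, and the name transitions is just an illustrative choice. The sample data is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Year = c(1990:1994, 1990:1991, 1990:1991, 1990:1994),
  Person = c(rep("Bob", 5), rep("Peter", 2), rep("John", 2), rep("James", 5)),
  Employment_Status = c(
    "High School Teacher", "High School Teacher", "Freelancer",
    "High School Teacher", "Researcher",
    "Singer", "Singer",
    "Singer", "Dancer",
    "Actor", "Actor", "Producer", "Producer", "Investor"
  ),
  stringsAsFactors = FALSE
)

# 'transitions' is a hypothetical name for the result
transitions <- df %>%
  arrange(Person, Year) %>%
  group_by(Person) %>%
  # Keep only the first row of each run of identical statuses
  filter(
    is.na(lag(Employment_Status)) | Employment_Status != lag(Employment_Status)
  ) %>%
  # Pair each status with the two statuses that preceded it
  mutate(
    From = paste(lag(Employment_Status, 2), lag(Employment_Status, 1), sep = "-"),
    To = Employment_Status
  ) %>%
  # Drop rows that do not have two preceding statuses
  filter(row_number() > 2) %>%
  ungroup() %>%
  select(From, To)

print(transitions)
```

If the goal is the second-order transition probabilities themselves, row-normalising the counts of this table is one way to get an empirical estimate, for example with prop.table(table(transitions$From, transitions$To), margin = 1).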
Upvotes: 1