RandomThinker

Reputation: 391

Constructing a Data Structure on Transitions for Second-order Markov Chain

I have panel data that records the employment status of individuals across different years. Many of them change jobs over the time span of my data. I want to capture these transitions and conduct a second-order Markov chain analysis (using the previous two statuses to predict the likelihood of the next one). To do so, I need to reshape my data so that (1) the previous two statuses are merged into a single string sequence and (2) the sequence and the next status sit side by side. For example:

Year Person Employment_Status
1990 Bob    High School Teacher 
1991 Bob    High School Teacher 
1992 Bob    Freelancer
1993 Bob    High School Teacher
1994 Bob    Researcher  
1990 Peter  Singer
1991 Peter  Singer
1990 John   Singer
1991 John   Dancer
1990 James  Actor
1991 James  Actor
1992 James  Producer
1993 James  Producer
1994 James  Investor
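
A reproducible version of this sample, as a sketch: the object name `df` is an assumption, and the statuses are entered without the trailing spaces that appear in the table.

```r
# Sample data from the table above; `df` is an assumed object name
df <- data.frame(
  Year = c(1990:1994, 1990:1991, 1990:1991, 1990:1994),
  Person = rep(c("Bob", "Peter", "John", "James"), times = c(5, 2, 2, 5)),
  Employment_Status = c(
    "High School Teacher", "High School Teacher", "Freelancer",
    "High School Teacher", "Researcher",                    # Bob
    "Singer", "Singer",                                     # Peter
    "Singer", "Dancer",                                     # John
    "Actor", "Actor", "Producer", "Producer", "Investor"    # James
  ),
  stringsAsFactors = FALSE
)
```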

The ideal output should look like the table below. Note that the records of Peter and John are not included because they do not have enough history to use: Peter never changes status, and John has only two distinct statuses in total, so neither has two previous statuses followed by a next one.

From                              To
High School Teacher-Freelancer    High School Teacher
Freelancer-High School Teacher    Researcher 
Actor-Producer                    Investor
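
For context, a table in this shape could feed the second-order estimate roughly as follows. This is only a sketch: the names `pairs`, `transition_probs`, and `Prob` are made up for illustration, and it simply takes the share of each `To` within its `From` group as the estimate of P(To | From). With the three rows above, every probability is trivially 1.

```r
library(dplyr)

# From/To pairs copied from the desired output above
pairs <- data.frame(
  From = c(
    "High School Teacher-Freelancer",
    "Freelancer-High School Teacher",
    "Actor-Producer"
  ),
  To = c("High School Teacher", "Researcher", "Investor"),
  stringsAsFactors = FALSE
)

# Count each (From, To) pair, then divide by the total count for that From
transition_probs <- pairs %>%
  count(From, To, name = "n") %>%
  group_by(From) %>%
  mutate(Prob = n / sum(n)) %>%
  ungroup()
```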

Upvotes: 0

Views: 48

Answers (1)

crcastillo

Reputation: 96

There are probably more efficient ways to do this, but the following should work, and each step can be inspected on its own to confirm the output is correct.

library(dplyr)
library(fuzzyjoin)

# Remove consecutive duplicates of person & employment status
# (`df` is the data frame from the question)
clean_df <- df %>%
  arrange(
    Person, Year
  ) %>%
  mutate(
    Combo = paste(Person, Employment_Status, sep = '_')
  ) %>%
  filter(
    !dplyr::lag(Combo, n = 1, default = "1") == Combo
  ) %>%
  select(
    -Combo
  )

# Find the next employment transition
first_transform <- clean_df %>%
  fuzzyjoin::fuzzy_left_join(
    y = clean_df %>%
      select(
        Next_Year = Year
        , Person
        , Next_Employment_Status = Employment_Status
      )
    , by = c(
      "Year" = "Next_Year"
      , "Person"
      , "Employment_Status" = "Next_Employment_Status"
    )
    , match_fun = list(
      `<`
      , `==`
      , `!=`
    )
  ) %>%
  filter(
    !is.na(Next_Year)
  ) %>%
  group_by(
    Person.x, Year
  ) %>%
  rename(
    Person = Person.x
  ) %>%
  select(
    -Person.y
  ) %>%
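  # Keep the earliest later status per (Person, Year); sorting first avoids
  # relying on the row order returned by the join
  arrange(Next_Year, .by_group = TRUE) %>%
  dplyr::slice_head(n = 1)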

# Concatenate first employment transition and look for the next
first_transform %>%
  mutate(
    From = paste(Employment_Status, Next_Employment_Status, sep = "-")
  ) %>%
  left_join(
    y = first_transform %>%
      select(
        Person, Year
        , To = Next_Employment_Status
      )
    , by = c(
      "Person" = "Person"
      , "Next_Year" = "Year"
      )
  ) %>%
  ungroup() %>%
  filter(
    !is.na(To)
  ) %>%
  select(
    From, To
  )
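
As a possible shorter alternative, the same From/To table can be sketched with `dplyr::lag()` and `dplyr::lead()` within each person, without `fuzzyjoin`. This assumes one row per person per year and treats the next recorded row as the next status; the names `df` and `transitions` are placeholders, and the data is re-entered here only so the block runs on its own.

```r
library(dplyr)

# Sample data from the question (`df` is an assumed name)
df <- data.frame(
  Year = c(1990:1994, 1990:1991, 1990:1991, 1990:1994),
  Person = rep(c("Bob", "Peter", "John", "James"), times = c(5, 2, 2, 5)),
  Employment_Status = c(
    "High School Teacher", "High School Teacher", "Freelancer",
    "High School Teacher", "Researcher",
    "Singer", "Singer",
    "Singer", "Dancer",
    "Actor", "Actor", "Producer", "Producer", "Investor"
  ),
  stringsAsFactors = FALSE
)

transitions <- df %>%
  arrange(Person, Year) %>%
  group_by(Person) %>%
  # Drop rows that repeat the previous status of the same person
  filter(
    is.na(lag(Employment_Status)) |
      Employment_Status != lag(Employment_Status)
  ) %>%
  # Pair each status with the next one (From) and the one after that (To)
  mutate(
    From = paste(Employment_Status, lead(Employment_Status), sep = "-"),
    To = lead(Employment_Status, n = 2)
  ) %>%
  ungroup() %>%
  # Rows without a status two steps ahead cannot form a From/To pair
  filter(!is.na(To)) %>%
  select(From, To)
```

On the sample data this should return the three rows listed in the question: two for Bob and one for James.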

Upvotes: 1
