Convert data frame to format for directed graph

Question

I have job history data in the form:

data.frame(id = 1:3, history = c('java dev, engineer', 'software dev, python dev', 'backend dev, programmer, consultant'))

going from most recent job to previously held jobs. I want to put it into a form where I can do directed graph analysis with 'from' and 'to' columns, for example, the first person went from engineer to java dev:

data.frame(from = c('engineer', 'python dev', 'consultant', 'programmer'), to = c('java dev', 'software dev', 'programmer', 'backend dev'))

I tried splitting the jobs on the commas into separate columns and then pivoting longer into 'id', 'job number' and 'job title' columns, but couldn't get further than that.

Ronak Shah · Accepted Answer

You can split the data on commas and reverse each person's job list so that the oldest job comes first. Then use lead() to create the to column and drop the rows where it is NA (the most recent job has no successor). I suggest keeping the id column so you can tell which person each transition belongs to.

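library(dplyr)

df %>%
  tidyr::separate_rows(history, sep = ',\\s*') %>%  # one row per job
  group_by(id) %>%
  mutate(history = rev(history),   # oldest job first within each person
         from = history, 
         to = lead(history)) %>%   # the job held next; NA for the latest job
  na.omit() %>%
  select(id, from, to)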

#     id from       to          
#  <int> <chr>      <chr>       
#1     1 engineer   java dev    
#2     2 python dev software dev
#3     3 consultant programmer  
#4     3 programmer backend dev
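
If you would rather avoid the dplyr/tidyr dependency, the same idea (split, reverse, pair each job with the one that followed it) can be sketched in base R. This is only an illustrative alternative, not part of the approach above; the names jobs and edges are made up for the example, and df is rebuilt from the question's data.

```r
# Example data from the question (most recent job listed first).
df <- data.frame(
  id = 1:3,
  history = c('java dev, engineer',
              'software dev, python dev',
              'backend dev, programmer, consultant'),
  stringsAsFactors = FALSE
)

# Split each history on commas and reverse it so the oldest job comes first.
jobs <- lapply(strsplit(df$history, ',\\s*'), rev)

# Pair each job with the one that followed it.
# A history with a single job produces no rows.
edges <- do.call(rbind, Map(function(id, j) {
  from <- head(j, -1)  # every job except the most recent
  to   <- tail(j, -1)  # every job except the oldest
  data.frame(id = rep(id, length(from)), from = from, to = to,
             stringsAsFactors = FALSE)
}, df$id, jobs))

print(edges)
```

Dropping the id column from edges leaves the plain from/to edge list asked for in the question.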
