tauft

Reputation: 575

Convert data frame to format for directed graph

I have job history data in the form:

df <- data.frame(id = 1:3, history = c('java dev, engineer', 'software dev, python dev', 'backend dev, programmer, consultant'))

Each history lists jobs from the most recent to the oldest. I want to reshape it into 'from' and 'to' columns so I can do directed graph analysis. For example, the first person went from engineer to java dev:

data.frame(from = c('engineer', 'python dev', 'consultant', 'programmer'), to = c('java dev', 'software dev', 'programmer', 'backend dev'))

I tried splitting the jobs on the commas into separate columns and then pivoting longer into 'id', 'job number' and 'job title' columns, but couldn't get further than that.

Upvotes: 0

Views: 52

Answers (1)

Ronak Shah

Reputation: 388962

You can split the data on the commas and reverse each person's jobs so that the oldest job comes first. Then use lead to create the to column (the job that followed each one) and drop the rows where it is NA, which is the last job in each group. I would suggest keeping the id column so you can identify which person each row belongs to.

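library(dplyr)

df %>%
  # one row per job
  tidyr::separate_rows(history, sep = ',\\s*') %>%
  group_by(id) %>%
  # reverse within each id so the oldest job comes first,
  # then pair each job with the one that followed it
  mutate(history = rev(history),
         from = history,
         to = lead(history)) %>%
  # the most recent job of each id has no following job
  na.omit() %>%
  select(id, from, to)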

#     id from       to          
#  <int> <chr>      <chr>       
#1     1 engineer   java dev    
#2     2 python dev software dev
#3     3 consultant programmer  
#4     3 programmer backend dev 
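
If you would rather avoid the extra packages, the same split, reverse, and pair steps can be sketched in base R. This is only a rough equivalent, assuming df is the data frame from the question; jobs and edges are names I made up for illustration. It should give the same four rows as above.

```r
# Sample data from the question (most recent job listed first).
df <- data.frame(
  id = 1:3,
  history = c("java dev, engineer",
              "software dev, python dev",
              "backend dev, programmer, consultant")
)

# Split each history on commas and reverse it so the oldest job comes first.
jobs <- lapply(strsplit(as.character(df$history), ",\\s*"), rev)

# Pair each job with the one that followed it.
# A history with a single job has no transitions, so it contributes no rows.
edges <- do.call(rbind, Map(function(id, j) {
  n <- length(j)
  if (n < 2) return(NULL)
  data.frame(id = id, from = j[-n], to = j[-1], stringsAsFactors = FALSE)
}, df$id, jobs))

edges
```

Dropping the last element for from and the first element for to does the same job as lead() followed by na.omit().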

Upvotes: 1
