Reputation: 575
I have job history data in the form:
df <- data.frame(id = 1:3, history = c('java dev, engineer', 'software dev, python dev', 'backend dev, programmer, consultant'))
Each history is listed from the most recent job back to the earliest. I want to reshape it for directed graph analysis, with 'from' and 'to' columns. For example, the first person went from engineer to java dev:
data.frame(from = c('engineer', 'python dev', 'consultant', 'programmer'), to = c('java dev', 'software dev', 'programmer', 'backend dev'))
I tried splitting the jobs on the commas into separate columns and then pivoting longer into 'id', 'job number' and 'job title' columns, but couldn't get further than that.
Upvotes: 0
Views: 52
Reputation: 388962
You can split the data on commas and reverse each person's list so that the oldest job comes first. Then use lead() to create the 'to' column and drop the rows where it is NA.
I would suggest keeping the 'id' column so you can tell which rows belong to which person.
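library(dplyr)

# df is the data frame from the question (columns: id, history)
df %>%
  # one row per job; the regex also drops the space after each comma
  tidyr::separate_rows(history, sep = ',\\s*') %>%
  group_by(id) %>%
  # reverse within each person so the oldest job comes first
  mutate(history = rev(history),
         from = history,
         to = lead(history)) %>%   # the job held next
  # each person's latest job has no successor, so its row has to = NA
  na.omit() %>%
  select(id, from, to)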
#      id from       to
#   <int> <chr>      <chr>
# 1     1 engineer   java dev
# 2     2 python dev software dev
# 3     3 consultant programmer
# 4     3 programmer backend dev
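If you would rather avoid the extra packages, the same split, reverse and pair-up steps can be sketched in base R. This is only a rough alternative, not part of the dplyr approach above: the names jobs and edges are made up for illustration, and it assumes every history is a comma-separated string listed most recent first.

```r
# Example data from the question (jobs listed most recent first)
df <- data.frame(
  id = 1:3,
  history = c('java dev, engineer',
              'software dev, python dev',
              'backend dev, programmer, consultant'),
  stringsAsFactors = FALSE
)

# Split each history on commas, then reverse so the oldest job comes first
jobs <- lapply(strsplit(df$history, ',\\s*'), rev)

# Pair each job with the one that followed it.
# A person with a single job contributes no rows.
edges <- do.call(rbind, Map(function(id, j) {
  from <- head(j, -1)  # every job except the latest
  to   <- tail(j, -1)  # every job except the oldest
  data.frame(id = rep(id, length(from)), from = from, to = to,
             stringsAsFactors = FALSE)
}, df$id, jobs))

print(edges)
```

For the sample data this gives the same four 'from'/'to' pairs as the expected output in the question, with 'id' kept alongside.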
Upvotes: 1