How to add information to a cell based on information in other cells in the column using R?

Question

I have a very messy data frame that looks like this:

df <- data.frame(Job = c("casual", "part time", "full time", "Level A total" , "casual","full time","Level B total"), institute1 = c(1,2,2,5,0,1,1))

Here the rows above "Level B total" belong to Level B until, moving up the rows, you reach "Level A total"; from that row upward, the rows belong to Level A. The data is more than 500 lines long, so cleaning it manually is an option but an unpleasant one. I can't think of how to code this so that R knows which level each row refers to.

Ronak Shah · Accepted Answer

We can create a new column, Level, that keeps only the "Level ... total" values from Job and is NA everywhere else. Then fill each NA with the next non-NA value below it. Finally, clean up the Level column by swapping "total" for the text from Job on the non-total rows.

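library(dplyr)

df %>%
  # Keep Job only on the "Level ... total" rows; every other row becomes NA
  mutate(Level = replace(Job, !grepl('Level', Job), NA)) %>%
  # Fill each NA upward from the nearest total row below it
  tidyr::fill(Level, .direction = 'up') %>%
  # Total rows keep their label; other rows become "Level X " + Job
  mutate(Level = ifelse(grepl('total', Job), 
                        Job, paste0(sub('total', '', Level), Job)))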

#            Job institute1             Level
#1        casual          1    Level A casual
#2     part time          2 Level A part time
#3     full time          2 Level A full time
#4 Level A total          5     Level A total
#5        casual          0    Level B casual
#6     full time          1 Level B full time
#7 Level B total          1     Level B total
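
If you only need a plain level label on each row (for example "Level A") rather than the combined text, the same idea can be sketched in base R without dplyr or tidyr. This is a minimal sketch and not part of the answer above; it assumes that every block ends with a row of the form "Level X total" and that no data rows follow the last total row (any such rows would get NA). The names is_total, block and labels are made up for illustration.

```r
df <- data.frame(
  Job = c("casual", "part time", "full time", "Level A total",
          "casual", "full time", "Level B total"),
  institute1 = c(1, 2, 2, 5, 0, 1, 1)
)

# TRUE on the rows that close a block ("Level ... total")
is_total <- grepl("^Level .* total$", df$Job)

# Block number of each row: count the total rows strictly above it, plus one.
# Subtracting is_total keeps a total row in the block it closes.
block <- cumsum(is_total) - is_total + 1

# One label per block, with the trailing " total" removed
labels <- sub(" total$", "", df$Job[is_total])

# Look up the label of each row's block (NA if a row has no total below it)
df$Level <- labels[block]
```

From there, subset(df, !is_total) would drop the total rows if you only want the detail rows.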
