Adam
Adam

Reputation: 107

How to add information to a cell based on information in other cells in the column using R?

I have a very messy dataframe that looks like

df <- data.frame(Job = c("casual", "part time", "full time", "Level A total" , "casual","full time","Level B total"), institute1 = c(1,2,2,5,0,1,1))

Where the rows above "Level B total" refer to level B, until going up the rows you get to "level A total" where it now refers to level A. The data is >500 lines long so manually cleaning it is an option but an unpleasant one, but I cant think of how to code it so I can add information so R knows what Level each cell is referring to.

Upvotes: 0

Views: 89

Answers (2)

hello_friend
hello_friend

Reputation: 5798

Base R solution:

transform(within(df[rev(seq_len(nrow(df))),], 
           {
             Level <- ifelse(grepl("Level\\s*[A-Z]", Job), 
                             gsub("\\s*total", "", Job), NA_character_)
            }
          ), Level = na.omit(Level)[cumsum(!(is.na(Level)))])[rev(seq_len(nrow(df))),]

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389325

We can create a new column Level and put all the "Level" values in it. fill the NA values with the non-NA value below it. Clean up the Level column by adding text from Job.

library(dplyr)

df %>%
  mutate(Level = replace(Job, !grepl('Level', Job), NA)) %>%
  tidyr::fill(Level, .direction = 'up')  %>%
  mutate(Level = ifelse(grepl('total', Job), 
                        Job, paste0(sub('total', '', Level), Job)))

#            Job institute1             Level
#1        casual          1    Level A casual
#2     part time          2 Level A part time
#3     full time          2 Level A full time
#4 Level A total          5     Level A total
#5        casual          0    Level B casual
#6     full time          1 Level B full time
#7 Level B total          1     Level B total

Upvotes: 2

Related Questions