fastlanes
fastlanes

Reputation: 333

How to split each value using delimiter in each column for a specific row in a dataframe

I have a data frame where there are specific rows that are in error meaning two rows have been combined into one indicated by whitespaces. I have the indices of those rows where the error(s) are and am looking to build a function using the indices to simply create either two new lists that I can then insert into the data frame (as 2 rows) and replace the incorrect row.

I have the following dataframe in R (row 3 is an error):

   Location  V1  V2  V3  V4
1       nyc   2  20   9   2
2       bos   4   5   3   8
3 atl m dal 1 4 3 4 1 3 5 2
4       mia   3   1   8   4

structure(list(Location = c("nyc", "bos", "atl m dal", "mia"), 
    V1 = c("2", "4", "1 4", "3"), V2 = c("20", "5", "3 4", "1"
    ), V3 = c("9", "3", "1 3", "8"), V4 = c("2", "8", "5 2", 
    "4")), row.names = c(NA, 4L), class = "data.frame")

Step 1: The desired output would be two lists/vectors (for each error row) as such:

atl m
1
3
1
5

dal 
4
4
3
2

Step 2: Once I have those lists, I should be able to insert/add them to the existing dataframe as rows (and delete the error rows) as such:

  Location V1 V2 V3 V4
1      nyc  2 20  9  2
2      bos  4  5  3  8
3    atl m  1  3  1  5
4      dal  4  4  3  2
5      mia  3  1  8  4

Looking to get help with step 1 mostly. All columns (V1...V4) except the first have exactly one whitespace which can be used as the delimiter to split each value making it easy but the first column location isn't as simple. I will need to subset the string manually for that column based on another list I can match it to.

Upvotes: 0

Views: 250

Answers (2)

andrew_reece
andrew_reece

Reputation: 21264

It's a little crude, but you can just copy the problem row, and take the first half of the " " split as the first row ("atl m") and then second half as the "dal" row:

tibble(df) %>%
  mutate(across(.fns = ~str_replace(., " \\w+$", ""))) %>%
  add_row(
    df %>% 
      slice(3) %>%
      mutate(across(.fns = ~str_replace(., "^.* (\\w+)$", "\\1")))
  )

# A tibble: 5 x 5
  Location V1    V2    V3    V4   
  <chr>    <chr> <chr> <chr> <chr>
1 nyc      2     20    9     2    
2 bos      4     5     3     8    
3 atl m    1     3     1     5    
4 mia      3     1     8     4    
5 dal      4     4     3     2    

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

We can identify the rows that have error which have whitespace (\\s) in them. Create a separate dataframe of such rows. Getting the location is not straight-forward since we have lot of whitespace in them. The logic which I have used is the last word goes into the new row while everything before it stays in the same row.

Finally, we combine the two dataframes to get one complete dataframe.

library(dplyr)
inds <- grep('\\s', df$V1)

tmp <- df[inds, ]

tmp %>%
  tidyr::separate_rows(everything(), sep = '(\\s)(?!.*\\s)') %>%
  bind_rows(df[-inds, ]) %>%
  type.convert(as.is = TRUE)

#  Location    V1    V2    V3    V4
#  <chr>    <int> <int> <int> <int>
#1 atl m        1     3     1     5
#2 dal          4     4     3     2
#3 nyc          2    20     9     2
#4 bos          4     5     3     8
#5 mia          3     1     8     4

Upvotes: 1

Related Questions