Robin B
Robin B

Reputation: 45

Finding adjacent values in a list

I have a data frame similar to the one below consisting of Unit and List columns. The goal is to find the matching value in [List] from [Unit], as well as the adjacent values -1 and +1 index of it. The desired output is shown in [Adjacent]. Eg in row 1, find 1 from list (1 2 3) and return (1 2).

Unit List Adjacent
1 1 2 3 1 2
10 10 10
20 10 20 30 10 20 30
2 1 2 1 2
3 1 2 3 4 2 3 4
4A 2A 3A 4A 5A 3A 4A 5A

I tried to replicate the line of thinking proposed in similar questions (below) to no avail. Unfortunately, I'm still quite new to r and has yet been able to think of a solution to this.

Would appreciate some suggestions to this problem. Thanks!

Similar questions

How to find word index or position in a given string using r programming

Apply function on each cell in a column and add the result to a new column

Upvotes: 0

Views: 247

Answers (4)

AnilGoyal
AnilGoyal

Reputation: 26218

One more option using regex (borrowing example df by Ronak).

baseR approach

transform(df, 
          Adjacent = mapply(function(.x, .y) regmatches(.y, 
                                                        regexpr(paste0('[^ ]* *' , .x, ' *[^ ]*'), .y)), 
                            Unit, 
                            List))

   Unit        List Adjacent
1     1       1 2 3      1 2
10   10          10       10
20   20    10 20 30 10 20 30
2     2         1 2      1 2
3     3     1 2 3 4    2 3 4
4A   4A 2A 3A 4A 5A 3A 4A 5A

Tidyverse approach

df <- structure(list(Unit = c("1", "10", "20", "2", "3", "4A"), 
                     List = c("1 2 3", "10", "10 20 30", "1 2", "1 2 3 4", "2A 3A 4A 5A")), 
                row.names = c(NA, -6L), class = "data.frame")

library(tidyverse)

df %>%
  mutate(Adjacent = map2(Unit, List, ~regmatches(.y, regexpr(paste0('[^ ]* *' , .x, ' *[^ ]*'), .y))))
#>   Unit        List Adjacent
#> 1    1       1 2 3      1 2
#> 2   10          10       10
#> 3   20    10 20 30 10 20 30
#> 4    2         1 2      1 2
#> 5    3     1 2 3 4    2 3 4
#> 6   4A 2A 3A 4A 5A 3A 4A 5A

Upvotes: 3

akrun
akrun

Reputation: 887153

Using tidyverse

library(dplyr)
library(purrr)
library(stringr)
df %>% 
   mutate(rn = row_number()) %>% 
   separate_rows(List) %>% 
   group_by(rn) %>%
   slice(match(first(Unit), List) + c(-1, 0, 1)) %>%
   group_by(rn, Unit) %>%
   summarise(Adjacent = str_c(List, collapse=" "), .groups = 'drop') %>% 
   select(-rn) %>% 
   right_join(df)

-output

# A tibble: 6 x 3
  Unit  Adjacent List       
  <chr> <chr>    <chr>      
1 1     1 2      1 2 3      
2 10    10       10         
3 20    10 20 30 10 20 30   
4 2     1 2      1 2        
5 3     2 3 4    1 2 3 4    
6 4A    3A 4A 5A 2A 3A 4A 5A

data

df <- structure(list(Unit = c("1", "10", "20", "2", "3", "4A"), 
List = c("1 2 3", "10", "10 20 30", "1 2", "1 2 3 4", "2A 3A 4A 5A")), 
row.names = c(NA, -6L), class = "data.frame")

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101628

A base R option using strsplit + mapply (borrow df from @Ronak Shah)

transform(
  df,
  Adjacent = mapply(
    function(x, k) paste0(x[abs(seq_along(x) - which(x == k)) <= 1], collapse = " "),
    strsplit(List, "\\s+"),
    Unit
  )
)

gives

  Unit        List Adjacent
1    1       1 2 3      1 2
2   10          10       10
3   20    10 20 30 10 20 30
4    2         1 2      1 2
5    3     1 2 3 4    2 3 4
6   4A 2A 3A 4A 5A 3A 4A 5A

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388982

You can split the data on whitespace to create list column use match to get index of Unit and return values at +1 and -1 position.

library(dplyr)
library(purrr)

df %>%
  #you don't need strsplit if data is already a list
  mutate(List = strsplit(List, '\\s+'),       
         Adjacent = map2(Unit, List, ~{
           inds <- match(.x, .y)
           na.omit(.y[c(inds-1, inds, inds+1)])
         }))

#  Unit           List   Adjacent
#1    1        1, 2, 3       1, 2
#2   10             10         10
#3   20     10, 20, 30 10, 20, 30
#4    2           1, 2       1, 2
#5    3     1, 2, 3, 4    2, 3, 4
#6   4A 2A, 3A, 4A, 5A 3A, 4A, 5A

data

df <- structure(list(Unit = c("1", "10", "20", "2", "3", "4A"), 
List = c("1 2 3", "10", "10 20 30", "1 2", "1 2 3 4", "2A 3A 4A 5A")), 
row.names = c(NA, -6L), class = "data.frame")

Upvotes: 2

Related Questions