Reputation: 45
I have a data frame similar to the one below consisting of Unit and List columns. The goal is to find the matching value in [List] from [Unit], as well as the adjacent values -1 and +1 index of it. The desired output is shown in [Adjacent]. Eg in row 1, find 1 from list (1 2 3) and return (1 2).
Unit | List | Adjacent |
---|---|---|
1 | 1 2 3 | 1 2 |
10 | 10 | 10 |
20 | 10 20 30 | 10 20 30 |
2 | 1 2 | 1 2 |
3 | 1 2 3 4 | 2 3 4 |
4A | 2A 3A 4A 5A | 3A 4A 5A |
I tried to replicate the line of thinking proposed in similar questions (below) to no avail. Unfortunately, I'm still quite new to r and has yet been able to think of a solution to this.
Would appreciate some suggestions to this problem. Thanks!
Similar questions
How to find word index or position in a given string using r programming
Apply function on each cell in a column and add the result to a new column
Upvotes: 0
Views: 247
Reputation: 26218
One more option using regex
(borrowing example df by Ronak).
baseR approach
transform(df,
Adjacent = mapply(function(.x, .y) regmatches(.y,
regexpr(paste0('[^ ]* *' , .x, ' *[^ ]*'), .y)),
Unit,
List))
Unit List Adjacent
1 1 1 2 3 1 2
10 10 10 10
20 20 10 20 30 10 20 30
2 2 1 2 1 2
3 3 1 2 3 4 2 3 4
4A 4A 2A 3A 4A 5A 3A 4A 5A
Tidyverse approach
df <- structure(list(Unit = c("1", "10", "20", "2", "3", "4A"),
List = c("1 2 3", "10", "10 20 30", "1 2", "1 2 3 4", "2A 3A 4A 5A")),
row.names = c(NA, -6L), class = "data.frame")
library(tidyverse)
df %>%
mutate(Adjacent = map2(Unit, List, ~regmatches(.y, regexpr(paste0('[^ ]* *' , .x, ' *[^ ]*'), .y))))
#> Unit List Adjacent
#> 1 1 1 2 3 1 2
#> 2 10 10 10
#> 3 20 10 20 30 10 20 30
#> 4 2 1 2 1 2
#> 5 3 1 2 3 4 2 3 4
#> 6 4A 2A 3A 4A 5A 3A 4A 5A
Upvotes: 3
Reputation: 887153
Using tidyverse
library(dplyr)
library(purrr)
library(stringr)
df %>%
mutate(rn = row_number()) %>%
separate_rows(List) %>%
group_by(rn) %>%
slice(match(first(Unit), List) + c(-1, 0, 1)) %>%
group_by(rn, Unit) %>%
summarise(Adjacent = str_c(List, collapse=" "), .groups = 'drop') %>%
select(-rn) %>%
right_join(df)
-output
# A tibble: 6 x 3
Unit Adjacent List
<chr> <chr> <chr>
1 1 1 2 1 2 3
2 10 10 10
3 20 10 20 30 10 20 30
4 2 1 2 1 2
5 3 2 3 4 1 2 3 4
6 4A 3A 4A 5A 2A 3A 4A 5A
df <- structure(list(Unit = c("1", "10", "20", "2", "3", "4A"),
List = c("1 2 3", "10", "10 20 30", "1 2", "1 2 3 4", "2A 3A 4A 5A")),
row.names = c(NA, -6L), class = "data.frame")
Upvotes: 2
Reputation: 101628
A base R option using strsplit
+ mapply
(borrow df
from @Ronak Shah)
transform(
df,
Adjacent = mapply(
function(x, k) paste0(x[abs(seq_along(x) - which(x == k)) <= 1], collapse = " "),
strsplit(List, "\\s+"),
Unit
)
)
gives
Unit List Adjacent
1 1 1 2 3 1 2
2 10 10 10
3 20 10 20 30 10 20 30
4 2 1 2 1 2
5 3 1 2 3 4 2 3 4
6 4A 2A 3A 4A 5A 3A 4A 5A
Upvotes: 2
Reputation: 388982
You can split the data on whitespace to create list column use match
to get index of Unit
and return values at +1 and -1 position.
library(dplyr)
library(purrr)
df %>%
#you don't need strsplit if data is already a list
mutate(List = strsplit(List, '\\s+'),
Adjacent = map2(Unit, List, ~{
inds <- match(.x, .y)
na.omit(.y[c(inds-1, inds, inds+1)])
}))
# Unit List Adjacent
#1 1 1, 2, 3 1, 2
#2 10 10 10
#3 20 10, 20, 30 10, 20, 30
#4 2 1, 2 1, 2
#5 3 1, 2, 3, 4 2, 3, 4
#6 4A 2A, 3A, 4A, 5A 3A, 4A, 5A
data
df <- structure(list(Unit = c("1", "10", "20", "2", "3", "4A"),
List = c("1 2 3", "10", "10 20 30", "1 2", "1 2 3 4", "2A 3A 4A 5A")),
row.names = c(NA, -6L), class = "data.frame")
Upvotes: 2