Reputation: 49
I have a df(Below) and a vector as c("B", "F"), I want to first split the df as list based on ID and then if any of the values in column "Name" matches in the vector c("B", "F") then assign column "Final" with that particular value in the vector for all rows.
ID Name
1 A
1 B
2 C
1 D
2 E
2 F
3 C
what I have tried till now is below :
df_list <- dlply(df, "ID")
df_list_2 <- lapply(df_list, transform,
Final = case_when(
sum(str_count(grepl(Name, "B"))) >= 1 ~ "B",
sum(str_count(grepl(Name, "F"))) >= 1 ~ "F",
TRUE ~ "No"))
My Final result should be as below
List 1 :
ID Name Final
1 A B
1 B B
1 D B
List 2:
ID Name Final
2 C F
2 E F
2 F F
List 3 :
ID Name Final
3 C NO
This is just a sample data, I have to run this on million of records with the vector list containing about 20 string values
Upvotes: 1
Views: 1802
Reputation: 51582
Here is a base R idea based on your description,
lapply(split(df, df$ID), function(i) {
i1 <- i$Name[i$Name %in% v1];
data.frame(i, Final = replace(i1, length(i1) == 0, 'NO'))
})
which gives,
$`1`
ID Name Final
1 1 A B
2 1 B B
4 1 D B
$`2`
ID Name Final
3 2 C F
5 2 E F
6 2 F F
$`3`
ID Name Final
7 3 C NO
DATA:
dput(df)
structure(list(ID = c(1L, 1L, 2L, 1L, 2L, 2L, 3L), Name = c("A",
"B", "C", "D", "E", "F", "C")), row.names = c(NA, -7L), class = "data.frame")
dput(v1)
c("B", "F")
EDIT: If you have more than 1 final elements, then you can convert to strings, i.e.
lapply(split(df, df$ID), function(i) {i1 <- i$Name[i$Name %in% v1];
data.frame(i, Final = ifelse(length(unique(i1)) > 1,
toString(unique(i1)), ifelse(length(unique(i1)) == 0, 'NO', i1)))})
Upvotes: 2
Reputation: 7724
You can also use group_by
from dplyr
:
library(plyr) # Load plyr first if you use it with dplyr
library(dplyr)
match_vector <- c("B", "F")
df_new <- df %>%
group_by(ID) %>%
mutate(Final = if_else(any(Name %in% match_vector),
paste(match_vector[match_vector %in% Name], collapse = ";"),
"No"))
df_new
# A tibble: 7 x 3
# Groups: ID [3]
# ID Name Final
# <int> <chr> <chr>
# 1 1 A B
# 2 1 B B
# 3 2 C F
# 4 1 D B
# 5 2 E F
# 6 2 F F
# 7 3 C No
I used paste(match_vector[match_vector %in% Name], collapse = ";")
here, which would print all matches in the Final column, if there are several. If you don't want this use match_vector[match_vector %in% Name][1]
to get the first match.
If you need the list-structure you can then use split
(base R) or dlply
(plyr
):
dlply(df_new, "ID")
split(df_new, df_new$ID)
$`1`
# A tibble: 3 x 3
# Groups: ID [1]
ID Name Final
<int> <chr> <chr>
1 1 A B
2 1 B B
3 1 D B
$`2`
# A tibble: 3 x 3
# Groups: ID [1]
ID Name Final
<int> <chr> <chr>
1 2 C F
2 2 E F
3 2 F F
$`3`
# A tibble: 1 x 3
# Groups: ID [1]
ID Name Final
<int> <chr> <chr>
1 3 C No
Upvotes: 1