I have a dataset of observations on individuals who have clear group membership. Sometimes, though, the "individuals" column of an observation also contains individuals with no known group membership, mixed in with those who do belong. I would like to remove these individuals, but when I did, I ended up with "" in place of their names. Could anyone help me remove their names so that no trace of them is left in the string that makes up the individuals column?
The individuals belong to the following groups:
groupA = c("Noir", "Bleue", "Rouge")
groupB = c("Dion", "Saphir", "Chapman")
groupC = c("Murray", "Nile", "Mississippi")
My data looks like this:
group date time individuals
A 1/1/2016 9:00 "Noir", "Bleue", "Rouge"
B 1/1/2016 9:00 "Dion", "Saphir", "Chapman"
C 1/1/2016 9:00 "Murray", "Nile", "Mississippi"
These cases are fine because all the individuals belong to the group. Sometimes, however, individuals with no group membership are interspersed with those that do have clear membership. In the example below, three unknown individuals (InconnuA, InconnuB, Inconnu1) are mixed in:
group date time individuals
A 2/1/2016 9:00 "Noir", "Bleue", "InconnuA"
B 2/1/2016 9:00 "Dion", "Saphir", "InconnuB"
C 2/1/2016 9:00 "Murray", "Nile", "Inconnu1"
I would like to remove these individuals. The code below does remove them, but in the resulting dataset I get an unwanted "" where each unknown individual used to be.
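library(tidyverse)

# Replace each unknown name with an empty string
# (lambdas instead of passing extra arguments through across()'s ..., which is deprecated in dplyr >= 1.1.0)
IndividualsRemoved <- partycompfocal_GroupingID %>%
  mutate(across("individuals", ~ str_replace(.x, "InconnuA", "")),
         across("individuals", ~ str_replace(.x, "InconnuB", "")),
         across("individuals", ~ str_replace(.x, "Inconnu1", "")),
         across("individuals", ~ str_replace(.x, "Inconnu2", "")),
         across("individuals", ~ str_replace(.x, "Inconnu3", "")))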
So after the change, my data file looks like this:
group date time individuals
A 2/1/2016 9:00 "Noir", "Bleue", ""
B 2/1/2016 9:00 "Dion", "Saphir", ""
C 2/1/2016 9:00 "Murray", "Nile", ""
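For reference, the leftover "" is exactly what str_replace() is expected to produce: it substitutes only the characters matched by the pattern, and the double quotes around each name are literal characters in the string, not part of the pattern. A minimal reproduction on a single value, assuming (as the tables suggest) that each cell of the individuals column is one character string:

```r
library(stringr)

# One value from the individuals column; the double quotes are literal characters
x <- '"Noir", "Bleue", "InconnuA"'

# Only the letters InconnuA are matched and replaced, so the quotes around it
# and the preceding ", " separator stay in the string
y <- str_replace(x, "InconnuA", "")
writeLines(y)  # prints: "Noir", "Bleue", ""
```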
Could anyone help me remove the "" from the individuals column, so that it ends up looking like this?
group date time individuals
A 2/1/2016 9:00 "Noir", "Bleue"
B 2/1/2016 9:00 "Dion", "Saphir"
C 2/1/2016 9:00 "Murray", "Nile"
Many thanks
From what I gather from the question, you are trying to use stringr to remove names that are not part of predefined vectors from long strings of individuals, without leaving behind things like "". There are a couple of approaches you could take: split each string into a vector of names and filter each vector with map(), or use separate_longer_delim() to put each name on its own row and then filter to accomplish the same result. Below is the first option:
library(tidyverse)
library(lubridate) # for hms(); already attached by tidyverse >= 2.0

# Group membership vectors from the question
groupA <- c("Noir", "Bleue", "Rouge")
groupB <- c("Dion", "Saphir", "Chapman")
groupC <- c("Murray", "Nile", "Mississippi")

df <- tibble(
  group = c("A", "B", "C"),
  date = as.Date(c("1/1/2016", "1/1/2016", "1/1/2016"), format = "%d/%m/%Y"), # unclear if you are using day month year, or month day year
  time = hms(paste0(c("9:00", "9:00", "9:00"), ":00")),
  individuals = c('"Noir", "Bleue", "InconnuA"',
                  '"Dion", "Saphir", "InconnuB"',
                  '"Murray", "Nile", "InconnuC"')) # note that each row's value is a single string

group_df <- tibble(group = c("A", "B", "C"), individuals = list(groupA, groupB, groupC))

df |>
  # remove the first and last double quotes, then split on '", "' to get one character vector per row
  mutate(individuals = str_extract(individuals, "(?<=^\").+(?=\"$)") |> str_split("\", \"")) |>
  left_join(group_df, by = "group") |>
  # keep only the names that belong to the row's group
  mutate(individuals = map2(individuals.x, individuals.y, ~ .x[.x %in% .y])) |>
  select(-individuals.x, -individuals.y)
Output:
# A tibble: 3 × 4
group date time individuals
<chr> <date> <Period> <list>
1 A 2016-01-01 9H 0M 0S <chr [2]>
2 B 2016-01-01 9H 0M 0S <chr [2]>
3 C 2016-01-01 9H 0M 0S <chr [2]>
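The second option (one name per row, then filtering) could look roughly like the sketch below. It assumes tidyr >= 1.3.0, where separate_longer_delim() was introduced, and uses a simplified version of the example data (date and time omitted for brevity). In place of a plain filter() it uses semi_join(), a filtering join, so that each name is checked against its own group's members. The last step pastes the surviving names back into one quoted, comma-separated string per row, since the question asks for strings rather than a list-column.

```r
library(dplyr)
library(tidyr)
library(stringr)

groupA <- c("Noir", "Bleue", "Rouge")
groupB <- c("Dion", "Saphir", "Chapman")
groupC <- c("Murray", "Nile", "Mississippi")

df <- tibble(
  group = c("A", "B", "C"),
  individuals = c('"Noir", "Bleue", "InconnuA"',
                  '"Dion", "Saphir", "InconnuB"',
                  '"Murray", "Nile", "InconnuC"')
)

# Lookup table: one row per (group, known member) pair
members <- tibble(
  group = rep(c("A", "B", "C"),
              times = c(length(groupA), length(groupB), length(groupC))),
  individuals = c(groupA, groupB, groupC)
)

result <- df |>
  mutate(row_id = row_number()) |>                             # remember which row each name came from
  separate_longer_delim(individuals, delim = ", ") |>          # one quoted name per row
  mutate(individuals = str_remove_all(individuals, '"')) |>    # drop the literal double quotes
  semi_join(members, by = c("group", "individuals")) |>        # keep only known members of that row's group
  group_by(row_id, group) |>
  summarise(individuals = paste0('"', individuals, '"', collapse = ", "),
            .groups = "drop") |>                               # rebuild one string per original row
  select(-row_id)

result
```

One behavioural difference to be aware of: if every individual in a row is unknown, that row disappears from the result entirely, whereas the map() version keeps it with an empty vector.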