Reputation: 39
I have a dataset with a variable that lists individuals and can take many values. Each row records which individuals (Individual_ID) were observed on a given day.
The individuals look like this: Individual_ID("Adele", "Fitz", "Abba")... these belong to Group=A; Individual_ID("Noir", "Rouge", "Bleue")... these belong to Group=B.
In some instances, individuals from different groups get mixed, so we have something like Individual_ID("Adele", "Rouge", "Bleue")... which would represent a mixed group.
I would like to create a variable called GroupingID that can be GroupA, GroupB, or MixedGroup. I do not require all individuals of a group to be present; what matters is whether the individuals observed all come from the same group (a neat grouping) or not.
Any combination involving at least two individuals from different groups is enough to count as a mixed grouping.
Could someone explain how I could apply an AND/OR condition in mutate to create the variable GroupingID?
Here is what my data look like:
Date IndividualsObserved
1/1/2016 Abba,Adele
2/1/2016 Adele,Fitz
3/1/2016 Fitz,Rouge,Noir
4/1/2016 Fitz,Adele,Abba
5/1/2016 Rouge,Noir,Bleue
6/1/2016 Rouge,Abba,Fitz
(the different individuals appear separated by commas in each entry cell of the column IndividualsObserved)
So I would like a grouping category that tells whether the grouping is neat (only one group identity) or mixed (composed of individuals from different groups). It would look something like this (GroupingID):
Date IndividualsObserved GroupingID
1/1/2016 Abba,Adele GroupA
2/1/2016 Adele,Fitz GroupA
3/1/2016 Fitz,Rouge,Noir MixedGrouping
4/1/2016 Fitz,Adele,Abba GroupA
5/1/2016 Rouge,Noir,Bleue GroupB
6/1/2016 Rouge,Abba,Fitz MixedGrouping
7/1/2016 Noir,Bleue,Abba MixedGrouping
I tried this, but it did not work:
mutate(GroupingID = case_when(IndividualsObserved %in% c("Adele","Abba", "Fitz") ~ "GroupA",
IndividualsObserved %in% c("Noir","Bleue", "Rouge") ~ "GroupB",
TRUE ~ ToCheck))
I would appreciate any insights on how to approach this with the dplyr function mutate.
Many thanks Mark, r2evans, and hello_friend for your helpful suggestions. Indeed, all the approaches you propose work!
Now that I have applied this to my full dataset, I realise I have a few challenging cases. Perhaps you have some ideas about how to:
-treat specific individuals as "ambiguous", meaning they do not belong to any group, so they cannot be counted as Group A or B because they are outsiders visiting both. Is it possible to give these individuals a neutral status, so that their presence or absence does not change the overall group composition or the MixedGroup classification?
-create an additional column called GroupDetails that could be GroupA, GroupB, or GroupA+GroupB, according to the list of individuals provided
-finally, because the data has some 30000 entries, is there an R function that returns all the names appearing in IndividualsObserved (the complete list is more extensive than the example I provided)?
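For the last point, a minimal base R sketch of one way to list every name, assuming the data frame is called `df` and the names are separated by commas as in the example (the object names `all_individuals`, `unique_names`, and `name_counts` are only illustrative):

```r
# Toy data in the same shape as the example (assumed data frame name: df)
df <- data.frame(
  Date = c("1/1/2016", "2/1/2016", "3/1/2016"),
  IndividualsObserved = c("Abba,Adele", "Adele,Fitz", "Fitz,Rouge,Noir"),
  stringsAsFactors = FALSE
)

# Split every cell on commas, flatten into one vector, trim stray whitespace
all_individuals <- trimws(
  unlist(strsplit(df$IndividualsObserved, ",", fixed = TRUE))
)

# Distinct names, sorted alphabetically
unique_names <- sort(unique(all_individuals))
print(unique_names)

# How often each name occurs (can help to spot misspelled names)
name_counts <- table(all_individuals)
print(name_counts)
```

Checking `name_counts` for names that occur only once or twice may be a quick way to catch typos before assigning individuals to groups.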
Thanks a lot
Upvotes: 1
Views: 412
Reputation: 39
Thanks all for sharing your insights.
I realised that some individuals do not belong to any group, because we only assign group membership once an individual is established and has stopped migrating in and out of groups.
I would like to know how to treat some individuals as "migratory or undecided", so that they have a neutral status that does not affect the original classification into:
i) Group A or Group B, and
ii) MixedGroup.
I have extended the example data below (see the new entries for 7, 8, and 9 January 2016):
Date IndividualsObserved
1/1/2016 Abba,Adele
2/1/2016 Adele,Fitz
3/1/2016 Fitz,Rouge,Noir
4/1/2016 Fitz,Adele,Abba
5/1/2016 Rouge,Noir,Bleue
6/1/2016 Rouge,Abba,Fitz
7/1/2016 Rouge,Abba,Guacamole
8/1/2016 Fitz,Rouge,Saphir
9/1/2016 Abba,Adele,Dylan
The group membership would be defined as:
list("A" = c("Adele","Fitz","Abba"),
     "B" = c("Rouge","Noir","Bleue"),
     "Neutral" = c("Guacamole","Saphir","Dylan"))
So, the presence of "Neutral" individuals does not affect the classification of an observation as GroupA, GroupB, or MixedGrouping. Accordingly, the new examples should be classified as follows:
7/1/2016: Rouge,Abba,Guacamole
would be MixedGrouping (Rouge and Abba are from different groups; Guacamole is neutral)
8/1/2016: Fitz,Rouge,Saphir
would be MixedGrouping (Fitz and Rouge are from different groups; Saphir is neutral)
9/1/2016: Abba,Adele,Dylan
would be GroupA (Abba and Adele are from the same group; Dylan is neutral)
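One possible way to express this rule, as a rough base R sketch rather than a tested solution: only look at whether any Group A member and any Group B member are present, and ignore everyone else. The helper name `classify_grouping` and the fallback label "NeutralOnly" (for rows containing no A or B individual at all) are my own assumptions, not something defined above.

```r
# Group membership as described above
groups <- list(
  A = c("Adele", "Fitz", "Abba"),
  B = c("Rouge", "Noir", "Bleue"),
  Neutral = c("Guacamole", "Saphir", "Dylan")
)

# Hypothetical helper: classify one comma-separated string of names.
# Neutral (or unknown) names are simply never checked, so they cannot
# change the result.
classify_grouping <- function(x, groups) {
  ids <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  has_a <- any(ids %in% groups$A)
  has_b <- any(ids %in% groups$B)
  if (has_a && has_b) {
    "MixedGrouping"
  } else if (has_a) {
    "GroupA"
  } else if (has_b) {
    "GroupB"
  } else {
    "NeutralOnly"  # assumption: label for rows with no A or B member
  }
}

# The three new example rows
df <- data.frame(
  Date = c("7/1/2016", "8/1/2016", "9/1/2016"),
  IndividualsObserved = c("Rouge,Abba,Guacamole", "Fitz,Rouge,Saphir",
                          "Abba,Adele,Dylan"),
  stringsAsFactors = FALSE
)

df$GroupingID <- vapply(
  df$IndividualsObserved, classify_grouping, character(1),
  groups = groups, USE.NAMES = FALSE
)
print(df)
```

Because the function only tests for the presence of A and B members, adding further neutral names to `groups$Neutral` would not require any change to the logic.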
A further extension is to account for the sex ratio in "neat/homogeneous group compositions", that is, HomogeneousGrouping (GroupA or GroupB), versus MixedGrouping.
I have been trying to compute this in my data, which has more than 30000 entries, but I have not found a method yet. Suppose we have a data file with the sex information:
Individual GroupingID Sex
Adele GroupA F
Abba GroupA F
Fitz GroupA F
Rouge GroupB M
Noir GroupB M
Bleue GroupB F
Saphir Neutral F
Guacamole Neutral M
Dylan Neutral M
Which approach could help compute the sex ratio for each observation, counting all individuals regardless of GroupingID (including the neutral ones), and store it in new columns? The sex ratio would be the total number of females divided by the total number of males. Two columns would be ideal, as my ultimate interest is to compare sex ratios between grouping styles (HomogeneousGrouping vs MixedGrouping).
1st column: HomogeneousGroupingSexRatio (only one GroupingID: A or B)
2nd column: MixedGroupingSexRatio (more than one GroupingID: A+B)
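A rough base R sketch of how these two columns might be computed, under several assumptions of mine: the lookup table is called `sex_info`, the observations `obs`, the ratio is females divided by males counting every listed individual, and a row with no males yields `Inf` (R's result for dividing by zero), which you may prefer to recode as `NA`.

```r
# Lookup table with group and sex per individual (assumed name: sex_info)
sex_info <- data.frame(
  Individual = c("Adele", "Abba", "Fitz", "Rouge", "Noir", "Bleue",
                 "Saphir", "Guacamole", "Dylan"),
  GroupingID = c("GroupA", "GroupA", "GroupA", "GroupB", "GroupB", "GroupB",
                 "Neutral", "Neutral", "Neutral"),
  Sex = c("F", "F", "F", "M", "M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# A few observations from the example (assumed name: obs)
obs <- data.frame(
  Date = c("1/1/2016", "3/1/2016", "5/1/2016", "6/1/2016", "9/1/2016"),
  IndividualsObserved = c("Abba,Adele", "Fitz,Rouge,Noir", "Rouge,Noir,Bleue",
                          "Rouge,Abba,Fitz", "Abba,Adele,Dylan"),
  stringsAsFactors = FALSE
)

# Named vectors used as dictionaries: name -> group, name -> sex
group_of <- setNames(sex_info$GroupingID, sex_info$Individual)
sex_of <- setNames(sex_info$Sex, sex_info$Individual)

ids_list <- lapply(strsplit(obs$IndividualsObserved, ",", fixed = TRUE), trimws)

# Females / males over all individuals in the row, neutral ones included
obs$SexRatio <- vapply(ids_list, function(ids) {
  sexes <- sex_of[ids]
  sum(sexes == "F", na.rm = TRUE) / sum(sexes == "M", na.rm = TRUE)
}, numeric(1))

# Grouping style, ignoring neutral and unknown individuals
obs$GroupingID <- vapply(ids_list, function(ids) {
  g <- setdiff(unique(group_of[ids]), c("Neutral", NA))
  if (length(g) > 1) "MixedGrouping" else if (length(g) == 1) g else NA_character_
}, character(1))

# One column per grouping style; NA where the row is of the other style
obs$HomogeneousGroupingSexRatio <-
  ifelse(obs$GroupingID %in% c("GroupA", "GroupB"), obs$SexRatio, NA_real_)
obs$MixedGroupingSexRatio <-
  ifelse(obs$GroupingID %in% "MixedGrouping", obs$SexRatio, NA_real_)
print(obs)
```

Keeping a single `SexRatio` column next to `GroupingID` may turn out to be easier for the final comparison than two half-empty columns, but the two-column layout is shown since that is what was asked for.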
Thanks a lot for sharing your thoughts!
Upvotes: 0
Reputation: 5798
Base R Solution:
# Resolve the values to classify into distinct groups;
# map_from => character vector
map_from <- c("Adele", "Fitz", "Abba", "Rouge", "Noir", "Bleue")
# Resolve the groups for each value specified above:
# map_to => character vector
map_to <- c("A", "A", "A", "B", "B", "B")
# Resolve the values to map:
# value_map => named character vector
value_map <- setNames(map_to, map_from)
# Resolve the group: GroupingID => character vector
df$GroupingID <- vapply(
  # For each value in the IndividualsObserved vector:
  df$IndividualsObserved,
  function(x){
    # For each element in the list:
    ir <- lapply(
      # Split the string into a list:
      strsplit(x, ","),
      function(y){
        # Dictionary replace the values:
        # character vector => env
        value_map[y]
      }
    )
    # Unlist the list into a vector:
    # unlisted_ir => character vector:
    unlisted_ir <- unlist(ir)
    # Resolve the number of unique values:
    # n_unique => integer scalar
    n_unique <- length(unique(unlisted_ir))
    # If there is a single group:
    if(n_unique == 1){
      # use the first value: character vector => env
      unlisted_ir[1]
    }else{
      # use the default value: character vector => env
      "Mixed Group"
    }
  },
  # Explicitly define a character vector of length one
  # is returned:
  character(1),
  # Ensure the names of the character vector aren't used:
  USE.NAMES = FALSE
)
Input Data:
# Resolve the input data.frame:
# df => data.frame
df <- read.table(
text = "Date IndividualsObserved
1/1/2016 Abba,Adele
2/1/2016 Adele,Fitz
3/1/2016 Fitz,Rouge,Noir
4/1/2016 Fitz,Adele,Abba
5/1/2016 Rouge,Noir,Bleue
6/1/2016 Rouge,Abba,Fitz
7/1/2016 Noir,Bleue,Abba",
header = TRUE
)
Upvotes: 1
Reputation: 160687
Similar to Mark's answer, but after creating a list-column, we can look for all(.. %in% ..) membership to define the groups.
quux %>%
  mutate(IndividualsObserved = strsplit(IndividualsObserved, ",")) %>%
  rowwise() %>%
  mutate(
    GroupingID = case_when(
      all(IndividualsObserved %in% c("Adele","Abba", "Fitz")) ~ "GroupA",
      all(IndividualsObserved %in% c("Noir","Bleue", "Rouge")) ~ "GroupB",
      TRUE ~ "MixedGroup")
  ) %>%
  ungroup()
# # A tibble: 6 × 3
# Date IndividualsObserved GroupingID
# <chr> <list> <chr>
# 1 1/1/2016 <chr [2]> GroupA
# 2 2/1/2016 <chr [2]> GroupA
# 3 3/1/2016 <chr [3]> MixedGroup
# 4 4/1/2016 <chr [3]> GroupA
# 5 5/1/2016 <chr [3]> GroupB
# 6 6/1/2016 <chr [3]> MixedGroup
I'm generally not a fan of doing things rowwise(), but it works well enough here and is unlikely to be a performance problem unless your real data is fairly large.
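If rowwise() does become slow on larger data, one alternative (a sketch only, not benchmarked) is to compute the two all(.. %in% ..) checks once per row with vapply() and pass the resulting logical vectors to case_when(). The names `group_a`, `group_b`, `all_a`, and `all_b` are placeholders of mine.

```r
library(dplyr)

quux <- data.frame(
  Date = c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016", "6/1/2016"),
  IndividualsObserved = c("Abba,Adele", "Adele,Fitz", "Fitz,Rouge,Noir",
                          "Fitz,Adele,Abba", "Rouge,Noir,Bleue", "Rouge,Abba,Fitz"),
  stringsAsFactors = FALSE
)

group_a <- c("Adele", "Abba", "Fitz")
group_b <- c("Noir", "Bleue", "Rouge")

# One character vector of names per row
ids <- strsplit(quux$IndividualsObserved, ",", fixed = TRUE)

# One logical per row: are all observed individuals in the given group?
all_a <- vapply(ids, function(x) all(x %in% group_a), logical(1))
all_b <- vapply(ids, function(x) all(x %in% group_b), logical(1))

# case_when() is vectorised, so no rowwise() is needed here
out <- quux %>%
  mutate(
    GroupingID = case_when(
      all_a ~ "GroupA",
      all_b ~ "GroupB",
      TRUE ~ "MixedGroup"
    )
  )
print(out)
```

The logic is the same as in the rowwise() version; only the per-row work is moved out of the pipeline.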
Data
quux <- structure(list(Date = c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016", "6/1/2016"), IndividualsObserved = c("Abba,Adele", "Adele,Fitz", "Fitz,Rouge,Noir", "Fitz,Adele,Abba", "Rouge,Noir,Bleue", "Rouge,Abba,Fitz")), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 1
Reputation: 12558
Steps:
library(tidyverse)
groups <- list("A" = c("Adele", "Fitz", "Abba"),
               "B" = c("Rouge", "Noir", "Bleue"))
df |>
  mutate(IndividualsObserved = str_split(IndividualsObserved, ","),
         Group = map_chr(IndividualsObserved, \(x) {
           a <- any(x %in% groups$A)
           b <- any(x %in% groups$B)
           case_when(a & b ~ "MixedGrouping",
                     a ~ "GroupA",
                     b ~ "GroupB",
                     TRUE ~ "None")}))
Output:
Date IndividualsObserved Group
1 1/1/2016 Abba, Adele GroupA
2 2/1/2016 Adele, Fitz GroupA
3 3/1/2016 Fitz, Rouge, Noir MixedGrouping
4 4/1/2016 Fitz, Adele, Abba GroupA
5 5/1/2016 Rouge, Noir, Bleue GroupB
6 6/1/2016 Rouge, Abba, Fitz MixedGrouping
7 7/1/2016 Noir, Bleue, Abba MixedGrouping
There are many other ways you could do this. For example, you could make a dataframe of groups with their corresponding individuals, separate each individual in df into its own row, and do a join. In my opinion, though, the approach above is the most straightforward.
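For completeness, the join alternative mentioned above might look roughly like this. It is a sketch under my own assumptions: the lookup table name `membership` and the helper column `Individual` are made up for illustration, and every observed name is assumed to appear in the lookup table (names missing from it would need extra handling).

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Date = c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016",
           "5/1/2016", "6/1/2016", "7/1/2016"),
  IndividualsObserved = c("Abba,Adele", "Adele,Fitz", "Fitz,Rouge,Noir",
                          "Fitz,Adele,Abba", "Rouge,Noir,Bleue",
                          "Rouge,Abba,Fitz", "Noir,Bleue,Abba"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per individual with its group
membership <- data.frame(
  Individual = c("Adele", "Fitz", "Abba", "Rouge", "Noir", "Bleue"),
  Group = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

result <- df |>
  # Copy the column so the original comma-separated string is kept
  mutate(Individual = IndividualsObserved) |>
  # One row per observed individual
  separate_rows(Individual, sep = ",") |>
  # Attach each individual's group
  left_join(membership, by = "Individual") |>
  # Collapse back to one row per observation
  group_by(Date, IndividualsObserved) |>
  summarise(
    GroupingID = if (n_distinct(Group) > 1) "MixedGrouping"
                 else paste0("Group", first(Group)),
    .groups = "drop"
  )
print(result)
```

The long format produced after separate_rows() is also convenient if you later want to join further per-individual attributes from the same lookup table.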
Data:
df <- read.table(text=
"Date IndividualsObserved
1/1/2016 Abba,Adele
2/1/2016 Adele,Fitz
3/1/2016 Fitz,Rouge,Noir
4/1/2016 Fitz,Adele,Abba
5/1/2016 Rouge,Noir,Bleue
6/1/2016 Rouge,Abba,Fitz
7/1/2016 Noir,Bleue,Abba", header = T)
Upvotes: 4