MIGUEL
MIGUEL

Reputation: 39

Identify individual membership in string characters and create columns

I recently asked how to identify row entries according to rows that have matching characters within a string.

I have advanced with the dataset and would like to know how it'd be possible to identify a new type of matched condition for different rows.

Here is how the data looks like:

 Date         TimeObserved     IndividualsObserved      Group
 01/01/2018         09.00      Elliot, Azur, Noir       MixedGroup
 01/01/2018         09.00      Azur, Bleue, Rouge       GroupB
 02/01/2018         09.00      Chapman, PJ, Jackson     GroupA
 02/01/2018         09.00      Mississippi, Missouri    GroupC
 03/01/2018         10.00      Eben, Azur, Bleue, Noir  GroupB
 03/01/2018         10.00      Eben, Azur, Bleue        GroupB
 03/01/2018         10.00      Eben, Azur, PJ, Noir     MixedGroup
 03/01/2018         10.00      Chapman, Bowie, Shakira  GroupA
 03/01/2018         10.00      Murray, Mississippi, Mia GroupC
 04/01/2018         10.00      Mississippi, Mia         GroupC
 04/01/2018         10.00      Murray, Azur, Eben       MixedGroup
 04/01/2018         10.00      Shakira, Jackson, Bowie  GroupA
 04/01/2018         10.00      Bleue, Noir, Rouge       GroupB

I would like to know how could I obtain matches of entries that have the same date and time WHEN at least one entry is a "MixedGroup" (let's say of group A with group B) as would be the first row.

I have a list of individuals that belong to each group:

groupA = c(Chapman, PJ, Jackson, Elliot, Bowie, Shakira)
groupB = c(Azur, Bleue, Rouge, Noir, Eben)
groupC = c(Mississippi, Missouri, Murray, Mia)

Could it be possible to create columns that identify the group as Group1 and Group2, or even group3 when there are MixedGroups? Any individual of any of the three groups mixed, it does not matter whether it is one or more) with the matched group/s that is/are mixed (sometimes I don't have data for the two groups involved in a MixedGroup on the same date and hour).

Could there be a way to create a column that says "Match number" (1,2,3,....) that accounts for the number of MixedGroup situations and assigns this number not only to the mixed group but also to rows of the groups involved in the mixed group having the same date and hour?

The desired output table would look like this:

 Date         TimeObserved     IndividualsObserved      Group        Group1  Group2  MatchNumber
 01/01/2018         09.00      Elliot, Azur, Noir       MixedGroup      A      B       1
 01/01/2018         09.00      Azur, Bleue, Rouge       GroupB          B      NA      1
 02/01/2018         09.00      Chapman, PJ, Jackson     GroupA          B      NA      1  
 02/01/2018         09.00      Mississippi, Missouri    GroupC          C      NA      NA 
 03/01/2018         10.00      Rouge, Bleue, Noir       GroupB          B      NA      2  
 03/01/2018         10.00      Eben, Azur, PJ, Noir     MixedGroup      B      A       2 
 03/01/2018         10.00      Chapman, Bowie, Shakira  GroupA          A      NA      2  
 03/01/2018         10.00      Murray, Mississippi, Mia GroupC          C      NA      3     
 04/01/2018         10.00      Missouri, Azur, Eben     MixedGroup      C      B       3  
 04/01/2018         10.00      Shakira, Jackson, Bowie  GroupA          A      NA      NA
 04/01/2018         10.00      Bleue, Noir, Rouge       GroupB          B      NA      3

Thanks a lot if you have any insights,

Upvotes: 0

Views: 27

Answers (0)

Related Questions