Rwak

Reputation: 326

R: group rows of a data frame based on a pairwise TRUE/FALSE function

I need to create groups of rows from my data frame, using a custom function as the grouping criterion. That function compares a pair of rows and returns TRUE if those rows should be grouped together, FALSE otherwise.

In an example dataset like:

id   field        code1  code2
1    textField1   055    066
2    textField2   100    120
3    textField3   300    350
4    textField4   800    450
5    textField5   460    900
6    textField6   490    700

                         ...

The function checks certain rules between the fields of two rows (function(row1, row2)) and returns TRUE if those rows belong together, FALSE otherwise.

I need to apply that function to all possible pairs of rows in the data frame and generate a list (or other structure) of the IDs that were matched together.

One way to apply the function to each pair of consecutive rows is shown in this answer:

lapply(seq_len(nrow(df) - 1),
       function(i){
         customFunction( df[i,], df[i+1,] )
       })

But I cannot think of a way to group the rows that returned TRUE.

EDIT: Re-reading my question, it seems to need an example:

If we created a matrix with all the possible combinations, the result would be:

      [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,]  TRUE   FALSE  FALSE  FALSE  FALSE  FALSE
[2,]  FALSE  TRUE   TRUE   TRUE   FALSE  FALSE
[3,]  FALSE  TRUE   TRUE   FALSE  FALSE  FALSE
[4,]  FALSE  TRUE   FALSE  TRUE   FALSE  FALSE
[5,]  FALSE  FALSE  FALSE  FALSE  TRUE   TRUE
[6,]  FALSE  FALSE  FALSE  FALSE  TRUE   TRUE
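
A matrix of this shape could be built by calling the function on every pair of row indices. The following is only a minimal sketch: the real grouping rule is not given in the question, so the `customFunction` below (rows match when their `code1` values differ by at most 50) is a hypothetical stand-in and will not reproduce the matrix above. The codes are also stored as numbers here, which drops the leading zeros.

```r
# Example data from the question (codes kept as numbers for simplicity)
df <- data.frame(
  id    = 1:6,
  field = paste0("textField", 1:6),
  code1 = c(55, 100, 300, 800, 460, 490),
  code2 = c(66, 120, 350, 450, 900, 700)
)

# Hypothetical rule, for illustration only: two rows belong together
# when their code1 values differ by at most 50.
customFunction <- function(row1, row2) {
  abs(row1$code1 - row2$code1) <= 50
}

n <- nrow(df)

# Call the rule on every (i, j) pair of row indices.
# Vectorize() is needed because outer() passes whole index vectors at once.
mx <- outer(
  seq_len(n), seq_len(n),
  Vectorize(function(i, j) customFunction(df[i, ], df[j, ]))
)
```

This calls the function n * n times, so for a large data frame it may be worth evaluating only the pairs with i < j and mirroring the result, provided the rule is symmetric.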

The resulting groups would then be:

1
2,3,4
5,6

Upvotes: 2

Views: 1321

Answers (1)

Jthorpe

Reputation: 10167

Here's a function that does what you've specified:

mx <- matrix(c( TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,
FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,
FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,
FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE),6)


groupings <- function(mx){
    # assumes mx is symmetric with TRUE on the diagonal

    out <- list()
    # original row numbers of the rows still left in mx
    ids <- seq_len(nrow(mx))
    while(length(ids)){
        # get the rows that match the first remaining row
        g <- which(mx[,1])

        # expand the selection to any rows that match
        # a member of the current group
        expansion <- which(apply(mx[,g,drop=FALSE],1,any))
        while(length(expansion) > length(g)){
            g <- expansion

            # keep expanding until no new rows are added
            expansion <- which(apply(mx[,g,drop=FALSE],1,any))
        }

        # report the original row numbers, not positions in the reduced matrix
        out <- c(out,list(ids[g]))
        ids <- ids[-g]
        # drop=FALSE keeps mx a matrix even when a single row is left
        mx <- mx[-g,-g,drop=FALSE]
    }
    return(out)

}

groupings(mx)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2 3 4
#> 
#> [[3]]
#> [1] 5 6
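
As an alternative sketch (not the approach used in the function above): the groups are the connected components of the TRUE/FALSE matrix, and single-linkage clustering from base R's stats package can recover them. This assumes the matrix is symmetric. The cut height of 0.5 is an arbitrary choice, any value strictly between 0 and 1 should behave the same way.

```r
mx <- matrix(c( TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,
FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,
FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,
FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE),6)

# Treat TRUE as distance 0 and FALSE as distance 1
d <- as.dist(1 - mx)

# Single linkage joins rows that are connected through any chain of TRUE pairs
hc <- hclust(d, method = "single")

# Cutting between 0 and 1 separates rows that are only linked by FALSE pairs
membership <- cutree(hc, h = 0.5)

# One list element per group, holding the original row numbers
groups <- unname(split(seq_len(nrow(mx)), membership))
```

For the example matrix this should give the groups 1, then 2 3 4, then 5 6. Note that hclust() needs at least two rows, so a one-row input would have to be handled separately.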

Upvotes: 1
