JPashs
JPashs

Reputation: 13896

Generate all possible combinations for Columns(cross join or Cartesian product)

I have a Google SpreadSheets doc with three columns A, B and C.

I need to populate the Column C with all the possible combinations of the values in Columns A and B. Please take a look a the capture to see what I mean.

I found this to be done in Excel, here, but it doesn't work in google spreadsheets.

The formula should be useful even for more columns (e.g.: four instead of two)

Can I do this?

enter image description here

Upvotes: 39

Views: 77894

Answers (8)

player0
player0

Reputation: 1

year 2024 C.E.

in new AI world era, we can solve this with:

=INDEX(TOCOL(A2:A3&" "&TOROW(B2:B4)))

enter image description here

to account for future expansion we can do:

=INDEX(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1)))

enter image description here

for 3 columns:

=INDEX(TOCOL(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1))
                                 &" "&TOROW(C2:C, 1)))

enter image description here

4 columns:

=INDEX(TOCOL(TOCOL(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1))
                                       &" "&TOROW(C2:C, 1))
                                       &" "&TOROW(D2:D, 1)))

enter image description here

5 columns:

=INDEX(TOCOL(TOCOL(TOCOL(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1))
                                             &" "&TOROW(C2:C, 1))
                                             &" "&TOROW(D2:D, 1))
                                             &" "&TOROW(E2:E, 1)))

enter image description here

to split it into multiple columns (not that OP asked for this tho):

=INDEX(SPLIT(TOCOL(TOCOL(TOCOL(TOCOL(
 TOCOL(A2:A, 1) &"×"&
 TOROW(B2:B, 1))&"×"&
 TOROW(C2:C, 1))&"×"&
 TOROW(D2:D, 1))&"×"&
 TOROW(E2:E, 1)),"×"))

enter image description here




year 2021 C.E.

in post-pandemic new world we can solve this with:

=INDEX(FLATTEN(A2:A3&" "&TRANSPOSE(B2:B4)))

enter image description here

to account for future expansion we can do:

=INDEX(FLATTEN(FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(FILTER(B2:B; B2:B<>""))))

enter image description here

for 3 columns:

=INDEX(FLATTEN(FLATTEN(
 FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
 FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
 FILTER(C2:C; C2:C<>""))))

enter image description here

4 columns:

=INDEX(FLATTEN(FLATTEN(FLATTEN(
 FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
 FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
 FILTER(C2:C; C2:C<>"")))&" "&TRANSPOSE(
 FILTER(D2:D; D2:D<>""))))

enter image description here



for more see: https://stackoverflow.com/a/74160711/5632629



Upvotes: 58

z..
z..

Reputation: 13056

Now that we have LAMBDA and the Lambda Helper Functions, we can solve this problem nicely using REDUCE

=ARRAYFORMULA(
   LET(range,A2:C,
       SPLIT(
         REDUCE(,
           SEQUENCE(COLUMNS(range)),
           LAMBDA(acc,i,
             TOCOL(acc&"ζ"&TOROW(INDEX(range,,i),3)))),
         "ζ")))

enter image description here

Upvotes: 7

kishkin
kishkin

Reputation: 5325

Ok, here's another one from me. No recursion this time, quite fast. Should be made a named function with a single parameter range:

=LET(
    ref,
    BYCOL(range, LAMBDA(c, IFNA(FILTER(c, c <> "")))),
    nums,
    BYCOL(ref, LAMBDA(c, MAX(1, COUNTA(c)))),
    total,
    PRODUCT(nums),
    divs,
    SCAN(total, nums, LAMBDA(acc, cur, acc / cur)),
    ARRAYFORMULA(
        VLOOKUP(
            MOD(INT(SEQUENCE(total, 1,) / divs), nums),
            {
                SEQUENCE(ROWS(ref), 1,),
                ref
            },
            SEQUENCE(1, COLUMNS(ref), 2),)
    )
)

enter image description here

Upvotes: 2

rockinfreakshow
rockinfreakshow

Reputation: 30271

Here's one more approach to add to the collection:

=let(range,A:D,data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
     count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
     first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
     if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
           tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))))

enter image description here

No change in formula needed even when the data is in just 3 columns (and non-adjacent too) enter image description here

2 columns with still the same formula enter image description here

An extension to this formula can be used to create unique pair combinations (as described in this question here) while excluding\limiting the duplicates (1,1 | 2,2 | 3,3.... and 1,2 | 2,1 | 1,3 | 3,1....)

=let(range,index(query({A:A,A:A},"Where Col1 is not null")),data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
     count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
     first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
     combo_,if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
           tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))),
     Λ,byrow(combo_,lambda(Σ,if(index(Σ,,1)>=index(Σ,,2),,Σ))),
       filter(Λ,index(Λ,,1)<>""))

enter image description here

Upvotes: 3

doubleunary
doubleunary

Reputation: 19195

The question specifies cross join and n-ary Cartesian product. These concepts are different from string concatenation ("car" + "red" → "car red").

Most of the existing answers use text string manipulation and split() which is undesirable because they may cause side effects such as converting the text string 1 2 3 to the date 2 January 2003. Some answers use recursion through a named function that calls itself. That works, but it is often undesirable as well, because named functions need to be recreated or imported in each spreadsheet where they are to be used.

One way to implement n-ary Cartesian product in a plain vanilla Google Sheet formula without those undesirable traits is to use a recursive lambda function. The formula below takes a range of any number of columns and gives all ordered n-tuples of their non-blank values, column-wise:

=let( 
  table, A2:D, 
  blank, iferror(1/0), 
  first_, lambda(array, tocol(choosecols(array, 1), true)), 
  rest_, lambda(n, choosecols(table, sequence(1, columns(table) - n, n + 1))), 
  wrap_, lambda(array, wrapCount, wraprows(tocol(array, true), wrapCount)), 

  cartesian_, lambda(a, b, wrap_( 
    byrow(a, lambda(row, 
      reduce(blank, sequence(rows(b)), lambda(acc, i, 
        { acc, row, chooserows(b, i) } 
      ) ) 
    ) ), 
    columns(a) + columns(b) 
  ) ), 

  iterate_, lambda( 
    self, a, b, if(iserror(b), a, 
      self(self, cartesian_(a, first_(b)), rest_(columns(a) + 1)) 
    ) 
  ), 

  iterate_(iterate_, first_(table), rest_(1)) 
)

The same can also be done in an iterative fashion. The formula below will perform about two times faster than split() based solutions with larger datasets:

=let( 
  table, A2:D, 
  numCols, columns(table), 
  colIndices, sequence(1, numCols), 
  column_, lambda(colIndex, tocol(choosecols(table, colIndex), 1)), 
  numColRows, bycol(colIndices, lambda(i, rows(column_(i)))), 
  numColRows_, lambda(i, index(numColRows, 0, i)), 
  colIndicesToRight_, lambda(i, sequence(1, numCols - i, i + 1)), 
  numColRowsToRight_, lambda(i, bycol(colIndicesToRight_(i), numColRows_)), 
  numCombos_, lambda(i, iferror(product(numColRowsToRight_(i)), 1)), 
  repeatCells_, lambda(colIndex, tocol( 
    map(column_(colIndex), lambda(cell, 
      bycol(sequence(1, numCombos_(colIndex)), lambda(_, cell)) 
    ) )  
  ) ), 
  repeatGroup_, lambda(g, n, tocol(bycol(sequence(1, n), lambda(_, g)), 0, 1)), 
  fillColumn_, lambda(colIndex, let( 
    group, repeatCells_(colIndex), 
    repeatGroup_(group, product(numColRows) / rows(group))
  ) ), 
  if( 
    min(numColRows) = 0, "All columns must have at least one item.", 
    if( 
      numCols * product(numColRows) > 10^6, "Too many results.", 
      bycol(colIndices, fillColumn_) 
    ) 
  ) 
)

The identifiers in the above formulas use a naming convention where a lambda function_ name definition has a trailing underscore.

Both formulas will work with any number of columns and give results like these:

source data
a 1 X
b 2 Y
c
× Cartesian
a 1 X
a 1 Y
a 2 X
a 2 Y
b 1 X
b 1 Y
b 2 X
b 2 Y
c 1 X
c 1 Y
c 2 X
c 2 Y

Upvotes: 0

kishkin
kishkin

Reputation: 5325

Here's another solution.

A named function CARTESIAN_PRODUCT:

=IF(COLUMNS(range) = 1, IFNA(FILTER(range, range <> "")), LAMBDA(sub_product, last_col, REDUCE(, SEQUENCE(ROWS(sub_product)), LAMBDA(acc, cur, LAMBDA(new_range, IF(cur = 1, new_range, {acc; new_range}))({ARRAYFORMULA(IF(SEQUENCE(ROWS(last_col)), INDEX(sub_product, cur,))), last_col}))))(CARTESIAN_PRODUCT(ARRAY_CONSTRAIN(range, ROWS(range), COLUMNS(range) - 1)), LAMBDA(r, IFNA(FILTER(r, r <> "")))(INDEX(range,, COLUMNS(range)))))

(in a little bit more readable formatting)

The only argument is range which specifies the columns with the values. Empty cells will be ignored.

It uses recursion and does not use query smashing or string splitting. It works with any number of columns.

enter image description here

Upvotes: 2

zephyrus
zephyrus

Reputation: 1266

Just to expand on @nicolasZ's comment for those (like myself) not so familiar with sheets syntax:

If you want to create the combinations but split into unique columns you can follow a very similar patter to @player0's answer but wrapping everything in ARRAYFORMULA(SPLIT( ...)," ".

What this does is split the result in a single column by a space (" ") and then distribute this into the next N columns.

Explicitly, to combine 5 columns of data I used:

= ARRAYFORMULA(SPLIT( FLATTEN(FLATTEN(FLATTEN(FLATTEN(
 FILTER(B2:B, B2:B<>"")&" "&TRANSPOSE(
 FILTER(C2:C, C2:C<>"")))&" "&TRANSPOSE(
 FILTER(D2:D, D2:D<>"")))&" "&TRANSPOSE(
 FILTER(E2:E, E2:E<>"")))&" "&TRANSPOSE(
 FILTER(F2:F, F2:F<>"")))," "))

To add more rows you simply have to add another Flatten( to the beginning of the command and then insert &" "&TRANSPOSE( FILTER(F2:F, F2:F<>""))) at the end (but before the " ")).

Note that there must be enough empty column/rows to expand the formula or the command will fail to evaluate.

Upvotes: 1

Max Makhrov
Max Makhrov

Reputation: 18717

Update 201810

Original formula crashes for a big dataset. I described a way to make cross-join with any size of data here.


Try formula:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))
 &" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))

The result:

car red
train red
car yellow
train yellow
car blue
train blue

You may use it again to add another list:

enter image description here

The formula is in cells C2 and E2,

C2 is:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))) )

and E2 is:

=ArrayFormula(transpose(split(rept(concatenate(C2:C&char(9)),counta(D2:D)),char(9)))&" "&transpose(split(concatenate(rept(D2:D&char(9),counta(C2:C))),char(9))) )

Upvotes: 25

Related Questions