Reputation: 13896
I have a Google SpreadSheets doc with three columns A, B and C.
I need to populate the Column C with all the possible combinations of the values in Columns A and B. Please take a look a the capture to see what I mean.
I found this to be done in Excel, here, but it doesn't work in google spreadsheets.
The formula should be useful even for more columns (e.g.: four instead of two)
Can I do this?
Upvotes: 39
Views: 77894
Reputation: 1
in new AI world era, we can solve this with:
=INDEX(TOCOL(A2:A3&" "&TOROW(B2:B4)))
to account for future expansion we can do:
=INDEX(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1)))
for 3 columns:
=INDEX(TOCOL(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1))
&" "&TOROW(C2:C, 1)))
4 columns:
=INDEX(TOCOL(TOCOL(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1))
&" "&TOROW(C2:C, 1))
&" "&TOROW(D2:D, 1)))
5 columns:
=INDEX(TOCOL(TOCOL(TOCOL(TOCOL(TOCOL(A2:A, 1)&" "&TOROW(B2:B, 1))
&" "&TOROW(C2:C, 1))
&" "&TOROW(D2:D, 1))
&" "&TOROW(E2:E, 1)))
to split it into multiple columns (not that OP asked for this tho):
=INDEX(SPLIT(TOCOL(TOCOL(TOCOL(TOCOL(
TOCOL(A2:A, 1) &"×"&
TOROW(B2:B, 1))&"×"&
TOROW(C2:C, 1))&"×"&
TOROW(D2:D, 1))&"×"&
TOROW(E2:E, 1)),"×"))
in post-pandemic new world we can solve this with:
=INDEX(FLATTEN(A2:A3&" "&TRANSPOSE(B2:B4)))
to account for future expansion we can do:
=INDEX(FLATTEN(FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(FILTER(B2:B; B2:B<>""))))
for 3 columns:
=INDEX(FLATTEN(FLATTEN(
FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
FILTER(C2:C; C2:C<>""))))
4 columns:
=INDEX(FLATTEN(FLATTEN(FLATTEN(
FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
FILTER(C2:C; C2:C<>"")))&" "&TRANSPOSE(
FILTER(D2:D; D2:D<>""))))
Upvotes: 58
Reputation: 13056
Now that we have LAMBDA
and the Lambda Helper Functions, we can solve this problem nicely using REDUCE
=ARRAYFORMULA(
LET(range,A2:C,
SPLIT(
REDUCE(,
SEQUENCE(COLUMNS(range)),
LAMBDA(acc,i,
TOCOL(acc&"ζ"&TOROW(INDEX(range,,i),3)))),
"ζ")))
Upvotes: 7
Reputation: 5325
Ok, here's another one from me. No recursion this time, quite fast. Should be made a named function with a single parameter range
:
=LET(
ref,
BYCOL(range, LAMBDA(c, IFNA(FILTER(c, c <> "")))),
nums,
BYCOL(ref, LAMBDA(c, MAX(1, COUNTA(c)))),
total,
PRODUCT(nums),
divs,
SCAN(total, nums, LAMBDA(acc, cur, acc / cur)),
ARRAYFORMULA(
VLOOKUP(
MOD(INT(SEQUENCE(total, 1,) / divs), nums),
{
SEQUENCE(ROWS(ref), 1,),
ref
},
SEQUENCE(1, COLUMNS(ref), 2),)
)
)
Upvotes: 2
Reputation: 30271
Here's one more approach to add to the collection:
=let(range,A:D,data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))))
No change in formula needed even when the data is in just 3 columns (and non-adjacent too
)
2 columns
with still the same formula
An extension to this formula can be used to create unique pair combinations (as described in this question here) while excluding\limiting the duplicates (1,1 | 2,2 | 3,3.... and 1,2 | 2,1 | 1,3 | 3,1....
)
=let(range,index(query({A:A,A:A},"Where Col1 is not null")),data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
combo_,if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))),
Λ,byrow(combo_,lambda(Σ,if(index(Σ,,1)>=index(Σ,,2),,Σ))),
filter(Λ,index(Λ,,1)<>""))
Upvotes: 3
Reputation: 19195
The question specifies cross join and n-ary Cartesian product. These concepts are different from string concatenation ("car" + "red" → "car red").
Most of the existing answers use text string manipulation and split()
which is undesirable because they may cause side effects such as converting the text string 1 2 3
to the date 2 January 2003
. Some answers use recursion through a named function that calls itself. That works, but it is often undesirable as well, because named functions need to be recreated or imported in each spreadsheet where they are to be used.
One way to implement n-ary Cartesian product in a plain vanilla Google Sheet formula without those undesirable traits is to use a recursive lambda function. The formula below takes a range of any number of columns and gives all ordered n-tuples of their non-blank values, column-wise:
=let(
table, A2:D,
blank, iferror(1/0),
first_, lambda(array, tocol(choosecols(array, 1), true)),
rest_, lambda(n, choosecols(table, sequence(1, columns(table) - n, n + 1))),
wrap_, lambda(array, wrapCount, wraprows(tocol(array, true), wrapCount)),
cartesian_, lambda(a, b, wrap_(
byrow(a, lambda(row,
reduce(blank, sequence(rows(b)), lambda(acc, i,
{ acc, row, chooserows(b, i) }
) )
) ),
columns(a) + columns(b)
) ),
iterate_, lambda(
self, a, b, if(iserror(b), a,
self(self, cartesian_(a, first_(b)), rest_(columns(a) + 1))
)
),
iterate_(iterate_, first_(table), rest_(1))
)
The same can also be done in an iterative fashion. The formula below will perform about two times faster than split()
based solutions with larger datasets:
=let(
table, A2:D,
numCols, columns(table),
colIndices, sequence(1, numCols),
column_, lambda(colIndex, tocol(choosecols(table, colIndex), 1)),
numColRows, bycol(colIndices, lambda(i, rows(column_(i)))),
numColRows_, lambda(i, index(numColRows, 0, i)),
colIndicesToRight_, lambda(i, sequence(1, numCols - i, i + 1)),
numColRowsToRight_, lambda(i, bycol(colIndicesToRight_(i), numColRows_)),
numCombos_, lambda(i, iferror(product(numColRowsToRight_(i)), 1)),
repeatCells_, lambda(colIndex, tocol(
map(column_(colIndex), lambda(cell,
bycol(sequence(1, numCombos_(colIndex)), lambda(_, cell))
) )
) ),
repeatGroup_, lambda(g, n, tocol(bycol(sequence(1, n), lambda(_, g)), 0, 1)),
fillColumn_, lambda(colIndex, let(
group, repeatCells_(colIndex),
repeatGroup_(group, product(numColRows) / rows(group))
) ),
if(
min(numColRows) = 0, "All columns must have at least one item.",
if(
numCols * product(numColRows) > 10^6, "Too many results.",
bycol(colIndices, fillColumn_)
)
)
)
The identifiers in the above formulas use a naming convention where a lambda function_
name definition has a trailing underscore.
Both formulas will work with any number of columns and give results like these:
source data | ||
---|---|---|
a | 1 | X |
b | 2 | Y |
c |
× Cartesian | ||
---|---|---|
a | 1 | X |
a | 1 | Y |
a | 2 | X |
a | 2 | Y |
b | 1 | X |
b | 1 | Y |
b | 2 | X |
b | 2 | Y |
c | 1 | X |
c | 1 | Y |
c | 2 | X |
c | 2 | Y |
Upvotes: 0
Reputation: 5325
Here's another solution.
A named function CARTESIAN_PRODUCT
:
=IF(COLUMNS(range) = 1, IFNA(FILTER(range, range <> "")), LAMBDA(sub_product, last_col, REDUCE(, SEQUENCE(ROWS(sub_product)), LAMBDA(acc, cur, LAMBDA(new_range, IF(cur = 1, new_range, {acc; new_range}))({ARRAYFORMULA(IF(SEQUENCE(ROWS(last_col)), INDEX(sub_product, cur,))), last_col}))))(CARTESIAN_PRODUCT(ARRAY_CONSTRAIN(range, ROWS(range), COLUMNS(range) - 1)), LAMBDA(r, IFNA(FILTER(r, r <> "")))(INDEX(range,, COLUMNS(range)))))
(in a little bit more readable formatting)
The only argument is range
which specifies the columns with the values. Empty cells will be ignored.
It uses recursion and does not use query smashing or string splitting. It works with any number of columns.
Upvotes: 2
Reputation: 1266
Just to expand on @nicolasZ's comment for those (like myself) not so familiar with sheets syntax:
If you want to create the combinations but split into unique columns you can follow a very similar patter to @player0's answer but wrapping everything in ARRAYFORMULA(SPLIT( ...)," "
.
What this does is split the result in a single column by a space (" "
) and then distribute this into the next N
columns.
Explicitly, to combine 5 columns of data I used:
= ARRAYFORMULA(SPLIT( FLATTEN(FLATTEN(FLATTEN(FLATTEN(
FILTER(B2:B, B2:B<>"")&" "&TRANSPOSE(
FILTER(C2:C, C2:C<>"")))&" "&TRANSPOSE(
FILTER(D2:D, D2:D<>"")))&" "&TRANSPOSE(
FILTER(E2:E, E2:E<>"")))&" "&TRANSPOSE(
FILTER(F2:F, F2:F<>"")))," "))
To add more rows you simply have to add another Flatten(
to the beginning of the command and then insert &" "&TRANSPOSE( FILTER(F2:F, F2:F<>"")))
at the end (but before the " "))
.
Note that there must be enough empty column/rows to expand the formula or the command will fail to evaluate.
Upvotes: 1
Reputation: 18717
Update 201810
Original formula crashes for a big dataset. I described a way to make cross-join with any size of data here.
Try formula:
=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))
&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))
The result:
car red
train red
car yellow
train yellow
car blue
train blue
You may use it again to add another list:
The formula is in cells C2
and E2
,
C2
is:
=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))) )
and E2
is:
=ArrayFormula(transpose(split(rept(concatenate(C2:C&char(9)),counta(D2:D)),char(9)))&" "&transpose(split(concatenate(rept(D2:D&char(9),counta(C2:C))),char(9))) )
Upvotes: 25