Cartesian product of successive pairs in numpy

Question

Suppose each product has several versions that change over time, and I have a data set of observations over time containing the product id, version id and other data.

I am interested in the Cartesian product of the indices of successive versions, i.e. the Cartesian products of the indices of version_1 and version_2, version_2 and version_3, and version_3 and version_4.

For example, the Cartesian product of version_1 and version_2 is (0,3), (1,3), (2,3), (0,4), (1,4), (2,4); that of version_2 and version_3 is (3,5), (3,6), (3,7), (4,5), (4,6), (4,7); and so on. Ideally I would like two arrays: one of the left indices and one of the right.
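To make the target concrete, here is a slow pure-Python sketch of the output I am after. The `versions` array is an assumption reconstructed from the example above (version_1 at rows 0 to 2, version_2 at rows 3 and 4, version_3 at rows 5 to 7), and I am assuming the order of the pairs within each block does not matter.

```python
import numpy as np

# Assumed sample: version id of each observation (reconstructed from the example).
versions = np.array([1, 1, 1, 2, 2, 3, 3, 3])

left, right = [], []
# Successive version pairs, written out by hand for this small sample.
for v_left, v_right in [(1, 2), (2, 3)]:
    for j in np.flatnonzero(versions == v_right):
        for i in np.flatnonzero(versions == v_left):
            left.append(int(i))
            right.append(int(j))

left = np.array(left)    # [0 1 2 0 1 2 3 4 3 4 3 4]
right = np.array(right)  # [3 3 3 4 4 4 5 5 6 6 7 7]
```

This is the kind of triple loop I would like to replace with numpy operations.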

Any hints on how to do this efficiently with numpy rather than looping manually, which is very slow?

Giacomo · Accepted Answer

The best way I found is to get the version order for each product manually, loop over the successive versions, and then take the Cartesian product of their indices (here separately for each country).

import numpy as np

# Assumption: product_ids, product_versions and countries are 1-D arrays of
# equal length, one entry per observation, already sorted by time.

def cartesian_product(x: np.ndarray, y: np.ndarray):
    """Return every (x[i], y[j]) pair as two aligned arrays."""
    return np.tile(x, len(y)), np.repeat(y, len(x))

unique_product_ids = np.unique(product_ids)
unique_countries = np.unique(countries)

indices_left_list = []
indices_right_list = []

for product_id in unique_product_ids:
    is_product = product_ids == product_id
    current_product_versions = product_versions[is_product]

    # np.unique sorts its output, so use the first-occurrence indices to
    # recover the versions in order of first appearance.
    _, indexes = np.unique(current_product_versions, return_index=True)
    unique_versions_in_order = current_product_versions[np.sort(indexes)]

    for country in unique_countries:
        # Compute the product/country mask once per group instead of per version.
        is_group = is_product & (countries == country)
        for version_left, version_right in zip(unique_versions_in_order, unique_versions_in_order[1:]):
            indices_left, indices_right = cartesian_product(
                np.flatnonzero(is_group & (product_versions == version_left)),
                np.flatnonzero(is_group & (product_versions == version_right)),
            )
            indices_left_list.append(indices_left)
            indices_right_list.append(indices_right)

# Note: np.concatenate raises ValueError if no successive pair was found.
indices_left = np.concatenate(indices_left_list)
indices_right = np.concatenate(indices_right_list)
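As a quick check, the same idea can be wrapped in a function and run on the example from the question. This is only a sketch: `successive_version_pairs` is a hypothetical name, the country filter is left out for brevity, and the sample arrays are assumptions reconstructed from the question's example.

```python
import numpy as np

def cartesian_product(x, y):
    # Every (x[i], y[j]) pair, returned as two aligned arrays.
    return np.tile(x, len(y)), np.repeat(y, len(x))

def successive_version_pairs(product_ids, product_versions):
    """Hypothetical wrapper: index pairs of successive versions per product."""
    lefts, rights = [], []
    for product_id in np.unique(product_ids):
        rows = np.flatnonzero(product_ids == product_id)
        versions = product_versions[rows]
        # First-occurrence indices give the versions in order of appearance.
        _, first_seen = np.unique(versions, return_index=True)
        ordered = versions[np.sort(first_seen)]
        for v_left, v_right in zip(ordered, ordered[1:]):
            l, r = cartesian_product(rows[versions == v_left],
                                     rows[versions == v_right])
            lefts.append(l)
            rights.append(r)
    if not lefts:  # no product has two or more versions
        empty = np.array([], dtype=np.intp)
        return empty, empty.copy()
    return np.concatenate(lefts), np.concatenate(rights)

# Assumed sample: one product, versions 1, 2, 3 at rows 0-2, 3-4, 5-7.
product_ids = np.array(["a"] * 8)
product_versions = np.array([1, 1, 1, 2, 2, 3, 3, 3])

left, right = successive_version_pairs(product_ids, product_versions)
# left  -> [0 1 2 0 1 2 3 4 3 4 3 4]
# right -> [3 3 3 4 4 4 5 5 6 6 7 7]
```

The pairs match the ones listed in the question, up to the order within each block. The outer loops still run in Python, but only once per product and version pair, while the per-row work is done by numpy.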
