Giacomo

Reputation: 412

Cartesian product of successive pairs in numpy

Suppose that each product has different versions that change over time, and that I have a data set of time observations containing the product id, the version id and other data:

[image: sample of the data set]

I am interested in the Cartesian product of the indices of successive versions, i.e. the Cartesian products of the indices of version_1 and version_2, version_2 and version_3, and version_3 and version_4.

For example, the Cartesian product of version_1 and version_2 is (0,3), (1,3), (2,3), (0,4), (1,4), (2,4); for version_2 and version_3 it is (3,5), (3,6), (3,7), (4,5), (4,6), (4,7); and so on. Ideally I would like two arrays: one of the left indices and one of the right indices.
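To pin down the desired output format: for a single pair of versions, the two arrays are simply the flattened grids returned by `np.meshgrid`. This is a minimal sketch that hard-codes the row indices of version_1 and version_2 from the example above; the variable names are illustrative only.

```python
import numpy as np

# Row indices of two successive versions, taken from the example above
version_1_rows = np.array([0, 1, 2])
version_2_rows = np.array([3, 4])

# meshgrid pairs every element of the first array with every element of the second;
# ravel() flattens the two grids into the "left" and "right" index arrays
left_grid, right_grid = np.meshgrid(version_1_rows, version_2_rows)
left = left_grid.ravel()
right = right_grid.ravel()

print(list(zip(left.tolist(), right.tolist())))
# [(0, 3), (1, 3), (2, 3), (0, 4), (1, 4), (2, 4)]
```

The open question is how to do this for every successive pair at once without a Python loop.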

Are there any hints on how to do this efficiently with NumPy, rather than by looping manually, which is very slow?

Upvotes: 1

Views: 151

Answers (2)

Giacomo

Reputation: 412

The best way I found is to get the version order of each product manually, loop through its successive versions (separately for each country), and take the Cartesian product of the indices of each pair of versions.

import numpy as np

def cartesian_product(x: np.ndarray, y: np.ndarray):
    """All pairs (x[i], y[j]) as two flat arrays; the x index varies fastest."""
    return np.tile(x, len(y)), np.repeat(y, len(x))

# product_ids, product_versions and countries are 1-D arrays with one entry per observation
unique_product_ids = np.unique(product_ids)
unique_countries = np.unique(countries)

indices_left_list = []
indices_right_list = []

for product_id in unique_product_ids:
    current_product_versions = product_versions[product_ids == product_id]

    # np.unique sorts the versions; re-sort by first appearance to keep the time order
    _, indexes = np.unique(current_product_versions, return_index=True)
    unique_versions_in_order = current_product_versions[np.sort(indexes)]

    for country in unique_countries:
        for version_left, version_right in zip(unique_versions_in_order, unique_versions_in_order[1:]):
            indices_left, indices_right = cartesian_product(
                np.flatnonzero((countries == country) & (product_ids == product_id) & (product_versions == version_left)),
                np.flatnonzero((countries == country) & (product_ids == product_id) & (product_versions == version_right))
            )
            indices_left_list.append(indices_left)
            indices_right_list.append(indices_right)


indices_left = np.concatenate(indices_left_list)
indices_right = np.concatenate(indices_right_list)
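For reference, the per-pair loop can also be removed altogether. The sketch below is a swapped-in, fully vectorized alternative, not the approach above: it encodes each (product, version) combination as an integer group with `np.unique`, orders the groups by product and then by first appearance, and enumerates every pair of successive groups with `np.repeat` and integer arithmetic. `successive_version_pairs` is a hypothetical name; the sketch assumes 1-D, non-empty inputs and that versions are ordered by first appearance within each product. It ignores countries; to split by country as well, one option is to pass a single integer code per (product, country) combination as `product_ids`.

```python
import numpy as np


def successive_version_pairs(product_ids, versions):
    """Row indices (left, right) of the Cartesian products of successive
    versions of each product, computed without a Python-level loop.

    Assumption: version order = order of first appearance within each product.
    """
    product_ids = np.asarray(product_ids)
    versions = np.asarray(versions)

    # Encode each (product, version) combination as a single integer key
    _, p = np.unique(product_ids, return_inverse=True)
    _, v = np.unique(versions, return_inverse=True)
    key = p * (v.max() + 1) + v

    # One group per key; `first` is the row where that group first appears
    _, first, inv = np.unique(key, return_index=True, return_inverse=True)
    group_product = p[first]

    # Rank the groups by product, then by first appearance within the product
    order = np.lexsort((first, group_product))
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    group = rank[inv]                       # ranked group of every row
    product_of_rank = group_product[order]  # product of every ranked group

    # Row indices sorted by group, and where each group starts in that order
    rows = np.argsort(group, kind="stable")
    counts = np.bincount(group)
    starts = np.cumsum(counts) - counts

    # Successive groups (g, g + 1) that belong to the same product
    g = np.flatnonzero(product_of_rank[:-1] == product_of_rank[1:])
    n_pairs = counts[g] * counts[g + 1]

    # Enumerate the pairs of every block; the left index varies fastest
    block = np.repeat(g, n_pairs)
    k = np.arange(n_pairs.sum()) - np.repeat(np.cumsum(n_pairs) - n_pairs, n_pairs)
    left = rows[starts[block] + k % counts[block]]
    right = rows[starts[block + 1] + k // counts[block]]
    return left, right


# Usage on the layout from the question (one product, nine observations)
product_ids = np.array([7] * 9)
versions = np.array(["version_1"] * 3 + ["version_2"] * 2 + ["version_3"] * 3 + ["version_4"])
left, right = successive_version_pairs(product_ids, versions)
print(left)   # [0 1 2 0 1 2 3 4 3 4 3 4 5 6 7]
print(right)  # [3 3 3 4 4 4 5 5 6 6 7 7 8 8 8]
```

Within each pair of versions the ordering matches `cartesian_product` above (left index fastest), and the rows of one version do not need to be contiguous.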

Upvotes: 0

David M.

Reputation: 4588

You can try this:

import pandas as pd
import itertools

df = pd.DataFrame({'version': ['version_1', 'version_1', 'version_1', 'version_2', 'version_2', 'version_3', 'version_3', 'version_3', 'version_4']})

# Keep the full numeric suffix as an int, so that version_10 sorts after version_9
df.version = df.version.str.split('_').str[-1].astype(int)
df = df.reset_index().groupby('version')['index'].apply(list).rename('versions').reset_index()
df['versions_shift'] = df['versions'].shift(-1, fill_value=[[]])
df['cartesian'] = df.apply(lambda x: itertools.product(x['versions'], x['versions_shift']), axis=1)
df['cartesian'] = df['cartesian'].apply(lambda x: list(zip(*x)))
df.drop(['version', 'versions', 'versions_shift'], axis=1, inplace=True)

print(df)

Output:

                                  cartesian
0  [(0, 0, 1, 1, 2, 2), (3, 4, 3, 4, 3, 4)]
1  [(3, 3, 3, 4, 4, 4), (5, 6, 7, 5, 6, 7)]
2                    [(5, 6, 7), (8, 8, 8)]
3                                        []

Upvotes: 1
