Jakob
Jakob

Reputation: 21

pandas vectorized lookup without deprecated lookup()

My problem concerns lookup(), which is to be deprecated. So I'm looking for an alternative. Documentation suggests using loc() (which does not seem to work with a vectorized approach) or melt() (which seems quite convoluted). Furthermore, the documentation suggests factorize() which (I think) does not work for my setup.

Here is the problem: I have a 2-column DataFrame with x,y-values.

k = 20
y = random.choices(range(1,4),k=k)
x = random.choices(range(1,7),k=k)
tuples = list(zip(x,y))
df = pd.DataFrame(tuples, columns=["x", "y"])
df

And I have several DataFrames in crosstab-format of df. For example one called Cij:

Concordance table (Cij):
x     1     2     3    4     5     6  RTotal
y                                           
1   16     15    13  NaN     5   NaN     108
2   NaN    12   NaN   15   NaN   NaN      87
3   NaN   NaN     6  NaN    13    14     121

I now want to perform a vectorized lookup in Cij from xy-pairs in df to generate a new column CrC in df. Which so far looked like this (plain and simple):

df["Crc"] = Cij.lookup(df["y"],df["x"])

How can I achieve the same thing without lookup()? Or did I just not understand the suggested alternatives?

Thanks in advance!

Addendum: Working code example as requested.

data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],[2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])

# crosstab of df
ct_a = pd.crosstab(df["y"], df["x"])
Cij = pd.DataFrame([], index=ct_a.index, columns=ct_a.columns) #one of several dfs in ct_a layout

#row-wise, than column-wise filling of Cij
for i in range(ct_a.shape[0]):           
  for j in range(ct_a.shape[1]):
    if ct_a.iloc[i,j] != 0:
      Cij.iloc[i,j]= ct_a.iloc[i+1:,j+1:].sum().sum()+ct_a.iloc[:i,:j].sum().sum()

#vectorized lookup, to be substituted with future-proof method
df["Crc"] = Cij.lookup(df["y"],df["x"])

Note: In this case loop-based "filling" of Cij is fine, since crosstabs of df are always small. However, df itself can be very large so vectorized lookup is a necessity.

Upvotes: 2

Views: 291

Answers (3)

sammywemmy
sammywemmy

Reputation: 28709

Using the factorize path in the docs, you can replicate the lookup functionality:

x_index, x_uniques = pd.factorize(df.x)

arrays = (Cij
          .reindex(columns = x_uniques)
          .to_numpy()[df.y.factorize()[0], x_index]
         )

df['r'] = arrays

df
    y  x     r   Crc
0   1  1  16.0  16.0
1   1  1  16.0  16.0
2   1  2  15.0  15.0
3   1  2  15.0  15.0
4   1  2  15.0  15.0
5   1  3  13.0  13.0
6   1  3  13.0  13.0
7   1  5   5.0   5.0
8   2  2    12  12.0
9   2  4    15  15.0
10  2  4    15  15.0
11  2  4    15  15.0
12  2  4    15  15.0
13  2  4    15  15.0
14  3  3   6.0   6.0
15  3  3   6.0   6.0
16  3  5  13.0  13.0
17  3  5  13.0  13.0
18  3  5  13.0  13.0
19  3  6  14.0  14.0
20  3  6  14.0  14.0
21  3  6  14.0  14.0
22  3  6  14.0  14.0
23  3  6  14.0  14.0

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153510

IIUC, you can stack Cij and then reindex based on a list of tuples created by using zip:

df['Crc'] = Cij.stack().reindex(zip(df['y'], df['x'])).to_numpy()
print(df)

Output:

    y  x   Crc
0   1  1  16.0
1   1  1  16.0
2   1  2  15.0
3   1  2  15.0
4   1  2  15.0
5   1  3  13.0
6   1  3  13.0
7   1  5   5.0
8   2  2    12
9   2  4    15
10  2  4    15
11  2  4    15
12  2  4    15
13  2  4    15
14  3  3   6.0
15  3  3   6.0
16  3  5  13.0
17  3  5  13.0
18  3  5  13.0
19  3  6  14.0
20  3  6  14.0
21  3  6  14.0
22  3  6  14.0
23  3  6  14.0

Upvotes: 1

Rawson
Rawson

Reputation: 2797

If you have checked df["Crc"] = Cij.loc[df["y"], df["x"]], you will notice that it returns an array. By comparing this with df["Crc"] = Cij.lookup(df["y"],df["x"]), you will also notice that the leading diagonal is the same (which makes sense). Therefore, you can add np.diagonal to return what you need:

df["Crc"] = np.diagonal(Cij.loc[df["y"], df["x"]])

Upvotes: 0

Related Questions