Marko Tankosic
Marko Tankosic

Reputation: 181

Pandas - Align matching column values to row

I have what appears to be a simple problem, that I was not able to find a solution to. Namely, I have a table, where first column contains the list of all available applications, while other columns represent users and the list of applications they have:

enter image description here

I'm trying to convert the table into pandas DataFrame and align matching values on the first column. The desired output should look like this:

enter image description here

import pandas as pd

df = pd.read_excel('U:/Desktop/appdata.xlsx')

df.head(10)
Out[21]: 
  Applications User 1 User 2 User 3 User 4 User 5
0         App1   App1   App2   App1   App1   App2
1         App2   App3   App3   App2   App3   App3
2         App3  App10   App4   App7   App4   App4
3         App4    NaN   App5   App8   App5   App5
4         App5    NaN    NaN  App10   App6   App6
5         App6    NaN    NaN    NaN    NaN   App7
6         App7    NaN    NaN    NaN    NaN   App8
7         App8    NaN    NaN    NaN    NaN   App9
8         App9    NaN    NaN    NaN    NaN    NaN
9        App10    NaN    NaN    NaN    NaN    NaN

df[df.apply(lambda x: x['Applications'] == x, axis=1)]
Out[22]: 
  Applications User 1 User 2 User 3 User 4 User 5
0         App1   App1    NaN   App1   App1    NaN
1         App2    NaN    NaN   App2    NaN    NaN
2         App3    NaN    NaN    NaN    NaN    NaN
3         App4    NaN    NaN    NaN    NaN    NaN
4         App5    NaN    NaN    NaN    NaN    NaN
5         App6    NaN    NaN    NaN    NaN    NaN
6         App7    NaN    NaN    NaN    NaN    NaN
7         App8    NaN    NaN    NaN    NaN    NaN
8         App9    NaN    NaN    NaN    NaN    NaN
9        App10    NaN    NaN    NaN    NaN    NaN

Any help is appreciated. Cheers!

Upvotes: 6

Views: 1795

Answers (2)

lmo
lmo

Reputation: 38510

Here is an approach with some numpy tools. Here, apply loops through the columns of interest, np.isin performs a search over your first column (dat.Applications) and returns True if the respective element is contained in the current column. This boolean array is then converted to the respective string in dat.Applications or to NAN if there is no match via np.where. The results are then assigned back to the original DataFrame.

import numpy as np

dat.iloc[:, 1:] = \
dat.iloc[:, 1:].apply(lambda x : np.where(np.isin(dat.Applications, x),
                                          dat.Applications, np.NAN))

Note that it would work to use pd.np.isin for instance rather than directly importing numpy, but this seems a bit cleaner to me.

Upvotes: 2

Mark Wang
Mark Wang

Reputation: 2757

A key observation is that in the output, a value in the column of the application also represents the value for that row (index col == value col)

(df.iloc[:,1:]
   .melt()
   .dropna()
   .pivot(index='value',columns='variable',values='value')
   .reindex([f'App{x}' for x in range(1,11)]))

Upvotes: 0

Related Questions