Frederic Bastiat

Reputation: 693

Check combinations of columns in a DataFrame to return unique rows

for a, b in itertools.combinations(number_of_notes_cols, 2):
    weekly_meetings_difference = all_meetings_data[(all_meetings_data[a] != all_meetings_data[b]) == True]

The code above used to work: it returned every row of all_meetings_data in which the values differed for any pair of columns. Now weekly_meetings_difference contains some, but not all, of the rows where the column values changed.


Edit with some code:

Before (when everything seemed to be working fine):

              Number of Notes 03112016  Number of Notes 03192016  Number of Notes 03272016  Number of Notes 04042016
Meeting Name
X                                 12.0                       NaN                       NaN                       NaN
Y                                  5.0                       5.0                       NaN                       NaN
Z                                  2.0                       NaN                       NaN                       NaN
W                                  NaN                       6.0                     713.0                     740.0

After (now that I've updated the original dataframe from which I want information):

              Number of Notes 03112016  Number of Notes 03192016  Number of Notes 03272016  Number of Notes 04042016  Number of Notes 04122016  Emails 04122016
Meeting Name
A                                 37.0                      37.0                      38.0                      38.0                      37.0
X                                 12.0                       NaN                       NaN                       NaN                       NaN              NaN
Y                                  5.0                       5.0                       NaN                       NaN                       NaN              NaN
Z                                  2.0                       NaN                       NaN                       NaN                       NaN              NaN

Having made this edit, I notice that since I added the extra column to the dataframe, row A now appears in the result while row W has been dropped. Both rows should show up every time.

Upvotes: 0

Views: 120

Answers (1)

hume

Reputation: 2553

First, let me make sure that I understand the problem. Are you looking for rows in a dataframe that have more than one unique value? That is, rows where the value changes at least once.

import pandas as pd
df = pd.DataFrame({'a': [1, 1, 1], 'b': [1, 2, 3], 'c': [1, 1, 3]})

    a  b  c
0|  1  1  1
1|  1  2  1
2|  1  3  3

In the dataframe above, you would want rows 1 and 2. If so, I would do something like:

df.apply(pd.Series.nunique, axis=1)

Which returns the number of unique values in each row of the dataframe:

0    1
1    2
2    2
dtype: int64
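
One thing to watch with data like the tables in the question: nunique skips NaN by default, so a row such as 12.0, NaN, NaN counts as having a single unique value. A small sketch of the difference, using made-up column names (week1, week2, week3) and values loosely modelled on the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the column names are illustrative only.
notes = pd.DataFrame(
    {
        "week1": [12.0, 5.0, np.nan],
        "week2": [np.nan, 5.0, 6.0],
        "week3": [np.nan, np.nan, 713.0],
    },
    index=["X", "Y", "W"],
)

# Default behaviour: NaN is ignored when counting unique values per row.
default_counts = notes.nunique(axis=1)

# dropna=False counts NaN as a value of its own.
with_nan_counts = notes.nunique(axis=1, dropna=False)

print(default_counts)
print(with_nan_counts)
```

Whether a missing value should count as a "change" depends on what you want to report, so it is worth choosing dropna deliberately.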

Using that result, we can select the rows we care about with:

df[df.apply(pd.Series.nunique, axis=1) > 1]

Which returns the expected:

    a  b  c
1|  1  2  1
2|  1  3  3
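
If you would rather keep the pairwise comparison from the question, note that the loop there reassigns weekly_meetings_difference on every pass, so after the loop it only holds the rows for the last pair of columns. That may be why some rows go missing. A minimal sketch that accumulates a mask across all pairs instead, assuming a helper name (rows_with_any_pairwise_difference) and an ignore_nan switch that are mine, not part of your code:

```python
import itertools

import numpy as np
import pandas as pd


def rows_with_any_pairwise_difference(frame, columns, ignore_nan=True):
    """Return rows of `frame` where any pair of `columns` holds different values."""
    changed = pd.Series(False, index=frame.index)
    for a, b in itertools.combinations(columns, 2):
        differs = frame[a].ne(frame[b])  # NaN compared with anything is "different"
        if ignore_nan:
            # Only count a difference when both values are present.
            differs &= frame[a].notna() & frame[b].notna()
        changed |= differs  # accumulate instead of overwriting
    return frame[changed]


# Sample data copied from the first table in the question.
all_meetings_data = pd.DataFrame(
    {
        "Number of Notes 03112016": [12.0, 5.0, 2.0, np.nan],
        "Number of Notes 03192016": [np.nan, 5.0, np.nan, 6.0],
        "Number of Notes 03272016": [np.nan, np.nan, np.nan, 713.0],
        "Number of Notes 04042016": [np.nan, np.nan, np.nan, 740.0],
    },
    index=pd.Index(["X", "Y", "Z", "W"], name="Meeting Name"),
)
number_of_notes_cols = list(all_meetings_data.columns)

strict = rows_with_any_pairwise_difference(all_meetings_data, number_of_notes_cols)
loose = rows_with_any_pairwise_difference(
    all_meetings_data, number_of_notes_cols, ignore_nan=False
)
print(strict)
print(loose)
```

With ignore_nan=False a missing value compared with anything counts as a difference, which matches what a plain != does in pandas; with ignore_nan=True only rows whose recorded values actually differ are kept.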

Is this what you're after, or is it something else? Happy to edit if you clarify.

Upvotes: 1
