Sayantan

Reputation: 101

Comparing column values of Python Pandas Dataframe

How can I compare a particular value in a column with the rest of the values in the same column of the same dataframe?

For example, let the dataframe be df.

df= A  B
    1  1
    2  0
    1  0
    1  1
    2  0

First take column A, then pick each value in turn and compare it with the remaining values of A. For example, I take the first value, 1, and compare it with the rest, [2,1,1,2]; the 3rd and 4th values of the column match. So the result for 1 should be:

A
false
true
true
false

Now pick 2, since it is the second element. Its output will be:

A
false
false
false
true

Basically, compare each element with all the other elements.

The same process should then be repeated for columns B, C, D, ...

Could anyone suggest how to do this?

Upvotes: 2

Views: 2003

Answers (2)

jezrael

Reputation: 863701

You can use a list comprehension that compares each row with all the other rows; the current row is removed by drop:

df1 = pd.concat([df.drop(i) == x for i, x in enumerate(df.values)], keys=df.index)
print (df1)
         A      B
0 1  False  False
  2   True  False
  3   True   True
  4  False  False
1 0  False  False
  2  False   True
  3  False  False
  4   True   True
2 0   True  False
  1  False   True
  3   True  False
  4  False   True
3 0   True   True
  1  False  False
  2   True  False
  4  False  False
4 0  False  False
  1   True   True
  2  False   True
  3  False  False

Detail:

The list comprehension creates a list of DataFrames:

print ([df.drop(i) == x for i, x in enumerate(df.values)])
[       A      B
1  False  False
2   True  False
3   True   True
4  False  False,        A      B
0  False  False
2  False   True
3  False  False
4   True   True,        A      B
0   True  False
1  False   True
3   True  False
4  False   True,        A      B
0   True   True
1  False  False
2   True  False
4  False  False,        A      B
0  False  False
1   True   True
2  False   True
3  False  False]

These are joined together by concat, with the keys parameter adding a MultiIndex level if necessary, so that each small DataFrame can then be selected by loc:

print (df1.loc[0])
       A      B
1  False  False
2   True  False
3   True   True
4  False  False
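
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the question's frame by hand and assumes the default RangeIndex, so that the position from enumerate is also the label that drop removes; the name pieces is only illustrative.

```python
import pandas as pd

# The example frame from the question (default RangeIndex 0..4)
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

# For each row: drop that row, then compare the remaining rows with it.
# Assumption: labels equal positions, so drop(i) removes row number i.
pieces = [df.drop(i) == row for i, row in enumerate(df.to_numpy())]

# keys=df.index labels each piece with the row it was compared against
df1 = pd.concat(pieces, keys=df.index)

# Comparison of the first row with all other rows
print(df1.loc[0])
```

With a non-default index, iterating over df.index and using df.loc for the current row would be a safer choice than enumerate.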

Upvotes: 2

Charles R

Reputation: 1661

import pandas as pd

# Result rows are collected in a list and turned into a dataframe at the end
rows = []

# Iterate over all columns
for column in df.columns:
    # For the current column, iterate over the lines
    for line in range(len(df[column])):

        info = "column: " + str(column) + " - line: " + str(line)
        # Check whether the cells below are equal to the current cell
        answer = df.loc[df.index > line, column] == df.loc[df.index == line, column].values[0]

        # Display the result
        print(info)
        print(answer)

        # Store the result (DataFrame.append was removed in pandas 2.0)
        for idx, check in answer.items():
            rows.append([info, idx, check])

# Build and display the resulting dataframe
df_final = pd.DataFrame(rows, columns=["position", "index", "check"])
print(df_final)
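
A loop-free variant for a single column is also possible. The sketch below is not part of the code above: it swaps in NumPy broadcasting to compare every value with every other value at once, and pairwise_equal is a hypothetical helper name. It assumes unique index labels and builds an n x n matrix, so it is only reasonable for columns of moderate length.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})


def pairwise_equal(series):
    """Square boolean frame: entry (i, j) is True when value i equals value j."""
    values = series.to_numpy()
    # A column vector compared with a row vector broadcasts to all pairs
    matrix = values[:, None] == values[None, :]
    return pd.DataFrame(matrix, index=series.index, columns=series.index)


eq_a = pairwise_equal(df["A"])

# First value of A against the rest: take row 0 and drop its own entry
print(eq_a.loc[0].drop(0))
```

Unlike the loop above, which only looks at the cells below the current one, this compares each value with all the others, as described in the question.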

Upvotes: 1
