Reputation: 101
How can I compare a particular column value with the rest of the values in the same column of the same DataFrame?
For example, let the DataFrame be df:
df =
   A  B
0  1  1
1  2  0
2  1  0
3  1  1
4  2  0
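For anyone who wants to reproduce this, the frame can be built roughly as follows. This is a minimal sketch; the default integer index 0..4 is an assumption, since the listing above only shows the values.

```python
import pandas as pd

# Rebuild the example frame (assumes the default integer index 0..4)
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})
print(df)
```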
First take column A, then pick its values one by one and compare each with the rest of the values in A. For example, I take the first value, 1, and compare it with the remaining values [2,1,1,2]; the 3rd and 4th values of the column are the same. So the result for 1 should be:
A
false
true
true
false
Now pick 2, as it is the second element, and compare it with the remaining values [1,1,1,2]. Its output will be:
A
false
false
false
true
Basically, compare each element with all the other elements of the column.
The same process should then be applied to columns B, C, D, and so on.
Could anyone suggest a solution for this?
Upvotes: 2
Views: 2003
Reputation: 863701
You can use a list comprehension that compares each row with all the other rows; the current row itself is excluded with drop:
df1 = pd.concat([df.drop(i) == x for i, x in enumerate(df.values)], keys=df.index)
print (df1)
         A      B
0 1  False  False
  2   True  False
  3   True   True
  4  False  False
1 0  False  False
  2  False   True
  3  False  False
  4   True   True
2 0   True  False
  1  False   True
  3   True  False
  4  False   True
3 0   True   True
  1  False  False
  2   True  False
  4  False  False
4 0  False  False
  1   True   True
  2  False   True
  3  False  False
Detail:
The list comprehension creates a list of DataFrames, one per row:
print ([df.drop(i) == x for i, x in enumerate(df.values)])
[       A      B
1  False  False
2   True  False
3   True   True
4  False  False,        A      B
0  False  False
2  False   True
3  False  False
4   True   True,        A      B
0   True  False
1  False   True
3   True  False
4  False   True,        A      B
0   True   True
1  False  False
2   True  False
4  False  False,        A      B
0  False  False
1   True   True
2  False   True
3  False  False]
These are joined together by concat, with the keys parameter creating a MultiIndex whose first level identifies the reference row. If necessary, each small DataFrame can then be selected with loc:
print (df1.loc[0])
       A      B
1  False  False
2   True  False
3   True   True
4  False  False
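To connect this back to the per-column output asked for in the question, a single column can be picked from each selected block. The following is a self-contained sketch, assuming the example frame has the default integer index (so the position from enumerate matches the label passed to drop); the names first_a and second_a are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

# Compare each row with every other row; drop(i) removes the row itself.
# Assumes the default RangeIndex, so position i is also label i.
df1 = pd.concat([df.drop(i) == x for i, x in enumerate(df.values)], keys=df.index)

# Outer index level = reference row, inner level = row it was compared with
first_a = df1.loc[0]["A"]   # first value of A against the rest
second_a = df1.loc[1]["A"]  # second value of A against the rest

print(first_a.tolist())   # [False, True, True, False]
print(second_a.tolist())  # [False, False, False, True]
```

Both results match the outputs expected in the question for the values 1 and 2 of column A.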
Upvotes: 2
Reputation: 1661
import pandas as pd

# Collect result rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []
# Iterate over all columns
for column in df.columns:
    # For the current column, iterate over each row (assumes the default 0..n-1 index)
    for line in range(len(df)):
        info = "column: " + str(column) + " - line: " + str(line)
        # Check whether the cells below are equal to the current cell
        answer = df.loc[df.index > line, column] == df.loc[df.index == line, column].values[0]
        # Display the result
        print(info)
        print(answer)
        # Store one row per compared cell
        for idx, check in answer.items():
            rows.append([info, idx, check])
# Build and display the resulting DataFrame
df_final = pd.DataFrame(rows, columns=["position", "index", "check"])
print(df_final)
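Note that the loop above only checks the cells below the current one (df.index > line). If every other row should be included, as in the question's expected output, the mask can be changed to "not equal". A minimal sketch, assuming the default integer index; compare_with_others is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

def compare_with_others(frame, column, line):
    """Hypothetical helper: compare the cell at `line` with all other rows of `column`."""
    # Keep every row except the reference row itself
    others = frame.loc[frame.index != line, column]
    # Element-wise comparison against the reference cell
    return others == frame.loc[line, column]

print(compare_with_others(df, "A", 0).tolist())  # [False, True, True, False]
print(compare_with_others(df, "A", 1).tolist())  # [False, False, False, True]
```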
Upvotes: 1