Reputation: 189
I have a dataframe in which I need to compare two columns, each of which may contain multiple elements separated by commas.
A B
U123,U321,U871 U321,U123,U871
U123 U321
U123,U321 U123,U321,U871
But I cannot simply test whether they are equal. Firstly, the elements are separated by commas, but I don't need to compare the commas.
Secondly, I only want to compare the elements in the two columns without considering their order. For example, in the first row of column A the elements are U123, U321, U871, and in the first row of column B the elements are U321, U123, U871. Although they appear in a different order in my dataframe, they are the same, since each cell contains all the elements of the other.
Can anyone please advise how I should achieve this?
Upvotes: 0
Views: 42
Reputation: 763
Try the code below:
# First, split both columns on commas. Rows whose element counts differ can never match, so mark them False up front.
comp_List = [(data1.split(','), data2.split(',')) if len(data1.split(',')) == len(data2.split(',')) else False for data1, data2 in zip(df1['A'], df1['B'])]
# Now compare the elements of each remaining pair and replace the pair with a boolean.
for i in range(len(comp_List)):
    data1 = comp_List[i]
    if type(data1) is not bool:
        boollst = True
        for j in range(len(data1[0])):
            # Every element of A must also appear in B, regardless of order.
            if data1[0][j] not in data1[1]:
                boollst = False
                break
        comp_List[i] = boollst
# Now add that list as a column in your data frame.
df1['Compared'] = comp_List
#So your data frame would look like this...
[![Dataframe with compared column][1]][1]
[1]: https://i.sstatic.net/N3g16.png
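
As a shorter alternative (not part of the code above), the same order-independent check could be sketched with Python sets. The helper name `same_elements` is hypothetical, and the sketch assumes duplicates inside a cell do not matter, since a set keeps each element only once:

```python
def same_elements(a, b):
    """Return True when two comma-separated strings hold the same set of elements."""
    # Strip whitespace around each element so "U123, U321" matches "U321,U123".
    return {x.strip() for x in a.split(',')} == {x.strip() for x in b.split(',')}

# Sample rows from the question, as (A, B) pairs.
rows = [
    ("U123,U321,U871", "U321,U123,U871"),
    ("U123", "U321"),
    ("U123,U321", "U123,U321,U871"),
]
results = [same_elements(a, b) for a, b in rows]
print(results)  # [True, False, False]
```

On the data frame itself this would be something like `df1['Compared'] = [same_elements(a, b) for a, b in zip(df1['A'], df1['B'])]`. If repeated elements should count, comparing `sorted(a.split(','))` with `sorted(b.split(','))` is one option instead of sets.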
Upvotes: 1