Worst SQL Noob

Reputation: 189

Compare multiple elements in two columns regardless of their order in a DataFrame in Python

I have a DataFrame in which I need to compare two columns. Each cell may contain multiple elements separated by commas.

     A                B
U123,U321,U871   U321,U123,U871
U123             U321
U123,U321        U123,U321,U871

But I cannot simply test whether the cells are equal. First, the elements are separated by commas, and I don't want the commas to be part of the comparison.
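A minimal sketch of why plain string equality fails here and how splitting on the comma yields the individual elements, using the first row of the sample data. The variable names are illustrative, and the whitespace-stripping line assumes cells might sometimes contain a space after the comma.

```python
# First row of the sample data, as plain strings
a = "U123,U321,U871"
b = "U321,U123,U871"

# Direct string comparison includes the commas and the order, so it fails
print(a == b)  # False

# str.split(',') drops the commas and returns the individual elements
a_items = a.split(",")
b_items = b.split(",")
print(a_items)  # ['U123', 'U321', 'U871']
print(b_items)  # ['U321', 'U123', 'U871']

# Assumption: if cells may contain spaces after commas, strip each piece
spaced = [s.strip() for s in "U123, U321".split(",")]
print(spaced)  # ['U123', 'U321']
```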

Second, I only want to compare the elements in the two columns without considering their order. For example, in the first row, column A contains the elements U123 U321 U871 and column B contains U321 U123 U871. Although they appear in different orders in my DataFrame, they are the same, since each cell contains all the elements of the other.
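A minimal sketch of an order-insensitive comparison of two comma-separated cells; `same_elements` is a hypothetical helper name. It uses `collections.Counter`, so repeated elements must occur equally often in both cells; comparing `set`s instead would ignore repeats entirely. Which behaviour is wanted is an assumption the question does not settle.

```python
from collections import Counter


def same_elements(cell_a: str, cell_b: str) -> bool:
    """Return True if both comma-separated cells contain the same elements,
    ignoring order. Repeated elements must appear the same number of times."""
    items_a = [s.strip() for s in cell_a.split(",")]
    items_b = [s.strip() for s in cell_b.split(",")]
    return Counter(items_a) == Counter(items_b)


# The three rows from the question
print(same_elements("U123,U321,U871", "U321,U123,U871"))  # True
print(same_elements("U123", "U321"))                      # False
print(same_elements("U123,U321", "U123,U321,U871"))       # False

# Counter counts repeats; a set comparison ignores them
print(same_elements("U123,U123", "U123"))                      # False
print(set("U123,U123".split(",")) == set("U123".split(",")))   # True
```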

Can anyone please advise how I can achieve this?

Upvotes: 0

Views: 42

Answers (1)

Shishir Naresh

Reputation: 763

Try the code below:

import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'A': ['U123,U321,U871', 'U123', 'U123,U321'],
                    'B': ['U321,U123,U871', 'U321', 'U123,U321,U871']})

# First create a list of element pairs from both columns; if the two cells have a different number of elements, store False right away...

comp_List = [(data1.split(','), data2.split(',')) if len(data1.split(',')) == len(data2.split(',')) else False
             for data1, data2 in zip(df1['A'], df1['B'])]

# Now compare the elements between the columns and replace each pair with a boolean value
# (this membership check assumes no element is repeated within a cell)

for i in range(len(comp_List)):
    data1 = comp_List[i]
    if isinstance(data1, tuple):
        boollst = True
        for item in data1[0]:
            if item not in data1[1]:
                boollst = False
                break
        comp_List[i] = boollst

# Now add that list as a column in your DataFrame...

df1['Compared'] = comp_List

# So your DataFrame would look like this...
[![Dataframe with compared column][1]][1]


  [1]: https://i.sstatic.net/N3g16.png
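For comparison, a shorter sketch using pandas string methods. This is a different technique from the loop above, not the answerer's code, and it assumes the column names `A` and `B` and the sample data from the question. Each cell is split on the comma and the pieces are sorted, so order no longer matters.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "A": ["U123,U321,U871", "U123", "U123,U321"],
    "B": ["U321,U123,U871", "U321", "U123,U321,U871"],
})

# Split each cell on ',' and sort the pieces so that order no longer matters
a_sorted = df1["A"].str.split(",").apply(sorted)
b_sorted = df1["B"].str.split(",").apply(sorted)

# Compare the sorted lists row by row
df1["Compared"] = [x == y for x, y in zip(a_sorted, b_sorted)]
print(df1)
```

Sorting keeps repeated elements, so a cell like `U123,U123` does not match `U123`; replacing `sorted` with `set` would ignore repeats instead.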

Upvotes: 1
