Rajeev

Reputation: 46909

Python: get the difference between two arrays

I have the following two arrays. I am trying to check whether the elements of invalid_id_arr exist in valid_id_arr; any that don't exist should go into a diff array. With the code below, however, diff comes out as ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], whereas I expect ["id789","id666"]. What am I doing wrong?

tag_file= {}
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']]

Current Output:

 ['id123', 'id124', 'id125', 'id126', 'id789', 'id666']

Expected output:

 ["id789","id666"]

Upvotes: 1

Views: 156

Answers (3)

shiva

Reputation: 2770

    >>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
    >>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
    >>> # Use a set rather than a generator: `in` consumes a generator, so repeated tests are unreliable
    >>> c = set(s.split('-')[0] for s in b)
    >>> [ele.split('-')[0] for ele in a if ele.split('-')[0] not in c]
    ['id789', 'id666']

Upvotes: 0

georg

Reputation: 214949

Try sets:

invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 

set_invalid = set(x.split('-')[0] for x in invalid_id_arr)
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
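
One thing to keep in mind is that `difference` returns a set, so the order of the printed IDs is not guaranteed. If a stable order is wanted, a minimal sketch could sort the result. This assumes alphabetical order is acceptable, and `missing` is a name introduced here for illustration:

```python
invalid_id_arr = ["id123-3431", "id124-4341", "id125-4341", "id126-1w", "id789-123", "id666"]
valid_id_arr = ["id123-12345", "id124-1122", "id125-13232", "id126-12332", "id1new", "idagain"]

# Compare only the part before the hyphen on both sides.
set_invalid = set(x.split('-')[0] for x in invalid_id_arr)

# sorted() turns the unordered set into a list with a deterministic order.
missing = sorted(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
print(missing)
```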

Upvotes: 3

Rodrigo Queiro

Reputation: 1336

Using a set is more efficient, but your main problem is that you weren't stripping the suffix (the part after the hyphen) from the elements of valid_id_arr, so the bare prefixes never matched any of them.

invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr)
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set]
print(diff)

output:

['id789-123', 'id666']

http://ideone.com/Q9JBw
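
To see why the original check kept every ID, it may help to look at the membership test in isolation. The sketch below reuses the lists from the question; `valid_prefixes`, `prefix_found_raw` and `prefix_found_stripped` are names introduced here for illustration only.

```python
invalid_id_arr = ["id123-3431", "id124-4341", "id125-4341", "id126-1w", "id789-123", "id666"]
valid_id_arr = ["id123-12345", "id124-1122", "id125-13232", "id126-12332", "id1new", "idagain"]

# List membership compares whole strings, so a bare prefix never matches
# an entry that still carries its "-suffix".
prefix_found_raw = "id123" in valid_id_arr

# Strip the suffix from the valid IDs first, then compare prefix to prefix.
valid_prefixes = {ele.split("-")[0] for ele in valid_id_arr}
prefix_found_stripped = "id123" in valid_prefixes

# Same comprehension as in the question, but tested against the stripped prefixes.
diff = [ele.split("-")[0] for ele in invalid_id_arr
        if ele.split("-")[0] not in valid_prefixes]
print(prefix_found_raw, prefix_found_stripped, diff)
```

The first test is false and the second is true, which is why the question's comprehension let every prefix through.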

Upvotes: 4
