siddesh chavan

Reputation: 63

How to access set elements by index?

I was working on a recommendation system (RS) in Python when I ran into a serious problem: I couldn't access the elements of a set without losing their order.

For example, once I convert the resulting set back to a list, the original order of df_final is lost. (In a recommendation system, order is very important.)

final_prediction = set(df_final) - set(df1)

Here is a small example:

>>> df_final=['a','x','z','p','s','j','b']
>>> df1=['b','j']
>>> set(df_final)-set(df1)
{'p', 'a', 's', 'z', 'x'}

Here, df_final and df1 both hold categorical variables.
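To make the problem concrete: a set keeps no record of insertion order, so the order has to be taken from df_final again. A minimal sketch, assuming the lists are short (df_final.index is a linear scan every time it is called), is to sort the set difference by each item's position in df_final:

```python
df_final = ['a', 'x', 'z', 'p', 's', 'j', 'b']
df1 = ['b', 'j']

# The set difference has the right members, but in arbitrary order.
unordered = set(df_final) - set(df1)

# Re-impose the order from df_final by sorting on each item's position there.
# df_final.index is a linear scan, so this costs O(K*N) for K results.
final_prediction = sorted(unordered, key=df_final.index)
print(final_prediction)  # ['a', 'x', 'z', 'p', 's']
```

This only recovers the order after the fact; the answers below avoid losing it in the first place.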

I ended up using another approach, but I had to scratch my head to rework the code, because the set-based version gave perfect results and everything else was working fine. I was in the final phase of my RS, but because sets don't preserve order I had to switch to a different approach.

How can I access a set's elements without losing the order?

Upvotes: 5

Views: 18268

Answers (3)

martineau

Reputation: 123423

Since you need ordered sets, I suggest using the ActiveState OrderedSet recipe that the Python documentation points to in the "See also:" note at the very end.

If you put the recipe's code in a separate file named orderedset.py, it can be imported as a module and used like this:

from orderedset import OrderedSet  # See https://code.activestate.com/recipes/576694

df_final = ['a','x','z','p','s','j','b']
df1 = ['b','j']
print(OrderedSet(df_final) - OrderedSet(df1))  # -> OrderedSet(['a', 'x', 'z', 'p', 's'])
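If you'd rather not download the recipe, a much smaller stand-in can be built on collections.abc.MutableSet, using a dict for storage because dicts preserve insertion order in Python 3.7+. This is a simplified sketch, not the ActiveState code (it does not reproduce the recipe's linked-list implementation). The - operator it inherits from the Set mixin builds its result by iterating over the left operand, so the order of df_final is kept:

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Set that remembers insertion order (simplified sketch, not the ActiveState recipe)."""

    def __init__(self, iterable=()):
        # dict keys are unique and keep insertion order (Python 3.7+);
        # the values are unused placeholders.
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return f"{type(self).__name__}({list(self)!r})"


df_final = ['a', 'x', 'z', 'p', 's', 'j', 'b']
df1 = ['b', 'j']
print(OrderedSet(df_final) - OrderedSet(df1))  # OrderedSet(['a', 'x', 'z', 'p', 's'])
```

Note that not every inherited operator keeps the left operand's order: the mixin's & iterates over the right operand.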

Upvotes: 1

gboffi

Reputation: 25023

Here are the two lists; the first one is ordered:

>>> df_final=['a','x','z','p','s','j','b']
>>> df1=['b','j']

This works, but it's O(N×M), because each not in check scans the list df1 linearly:

>>> [cat_var for cat_var in df_final if cat_var not in df1]
['a', 'x', 'z', 'p', 's']

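Converting df1 to a set first has a setup cost, but set membership tests are O(1) on average, so the whole thing is O(N + M). This pays off when both lists are long: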

>>> sdf1 = set(df1)
>>> [cat_var for cat_var in df_final if cat_var not in sdf1]
['a', 'x', 'z', 'p', 's']
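The complexity difference is easy to check with timeit. A rough sketch using synthetic data (the cat0, cat1, ... values and the sizes are made up for illustration); exact timings depend on the machine, but the list-based version should be noticeably slower, and the gap grows with the length of df1:

```python
import timeit


def diff_list(df_final, df1):
    # Each "not in df1" scans the list: O(N*M) overall.
    return [v for v in df_final if v not in df1]


def diff_set(df_final, df1):
    # One O(M) pass to build the set, then O(1) average lookups: O(N + M).
    sdf1 = set(df1)
    return [v for v in df_final if v not in sdf1]


# Synthetic data, purely illustrative.
df_final = [f"cat{i}" for i in range(2000)]
df1 = df_final[::2]  # every other category is excluded

# Both approaches must agree on the result and its order.
assert diff_list(df_final, df1) == diff_set(df_final, df1)

for fn in (diff_list, diff_set):
    seconds = timeit.timeit(lambda: fn(df_final, df1), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s")
```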

Upvotes: 0

jpp

Reputation: 164623

A set is an unordered collection. For an ordered collection, you can use a list or a tuple. You have a few options, and your choice should depend on whether you expect repeats in df_final. If there are no repeats, you can use a list comprehension, with df1 converted to a set for fast membership tests:

df1_set = set(df1)
res1 = [i for i in df_final if i not in df1_set]
# ['a', 'x', 'z', 'p', 's']

If you have repeats in df_final, then you need unique items with their order maintained. For this, you can use toolz.unique, which is equivalent to the unique_everseen recipe in the itertools documentation:

from toolz import unique

res2 = [i for i in unique(df_final) if i not in df1_set]
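If you'd rather avoid the toolz dependency, dict.fromkeys does the same order-preserving deduplication with only the standard library: dicts keep insertion order in Python 3.7+, and only the first occurrence of each key is recorded. A sketch with a made-up df_final that contains repeats:

```python
df_final = ['a', 'x', 'a', 'z', 'p', 'x', 's', 'j', 'b']  # hypothetical data with repeats
df1 = ['b', 'j']
df1_set = set(df1)

# dict.fromkeys keeps the first occurrence of each item, in order,
# so iterating over it yields unique items in their original order.
res2 = [i for i in dict.fromkeys(df_final) if i not in df1_set]
print(res2)  # ['a', 'x', 'z', 'p', 's']
```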

Upvotes: 5
