Reputation: 59

Python operator similar to %in% in R

I'm looking for an operator similar to %in% in R

For example,

x = c("a","b","c"); 
y = c("a","d")

x %in% y # would give me
#TRUE FALSE FALSE

How to achieve this in Python?

Upvotes: 3

Answers (4)

cjrieds

Reputation: 887

Python does not come with an operator that does exactly what you want. One option is to adapt your code to do things the "pythonic" way. Another option is to use operator overloading to create a custom operator for a particular class.

Option 1 is probably the best thing to do: it's more straightforward and your code will be easier to read and modify. Option 2 is (in my opinion) more fun, but probably only makes sense if you're making a domain-specific language or working in a small code base that you totally control.

Koba provides several options. My personal opinion is to just use the list comprehension rather than map: in Python 3, map returns an iterator instead of a list, and many people find list comprehensions more readable (Guido, Python's benevolent dictator for life, also prefers list comprehensions to map). Thus, I think this is the best approach for option 1:

people = ['man', 'woman', 'boy', 'girl']

children = ['boy', 'girl']

output = [p in children for p in people]

Option 2 would work by creating a custom object, perhaps one that extends a list or other iterable. It would override a special method (this is how operators work in Python).

In [1]: class mylist(list):
   ...:     def __mod__(self, other):
   ...:         return [s in other for s in self]
   ...: 

In [2]: people = mylist(['man', 'woman', 'boy', 'girl'])

In [3]: children = ['boy', 'girl']

In [4]: people % children
Out[4]: [False, False, True, True]

Upvotes: 0

Raymond Hettinger

Reputation: 226486

>>> x = ("a", "b", "c")
>>> y = ("a", "d")
>>> map(y.__contains__, x)
[True, False, False]

The contains test can be sped up if y is stored as a set, because O(1) hash table lookups avoid unnecessary comparisons:

>>> y = {"a", "d"}
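The session above shows Python 2 output, where map returns a list. As a minimal sketch of the same idea on Python 3 (where map returns a lazy iterator), the result can be wrapped in list() to get the booleans back:

```python
# Same data as above; y is a set so each lookup is an O(1) hash test on average.
x = ("a", "b", "c")
y = {"a", "d"}

# In Python 3, map() is lazy, so materialize it with list().
result = list(map(y.__contains__, x))
print(result)
```

This assumes the items are hashable; for unhashable items, y would have to stay a list or tuple.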

Don't be deceived by the bogus timings from the other respondent. For non-trivial datasets, repeated O(n) searches are a terrible idea. The timings were also misinterpreted (with only three inputs tested over a two-item search space, the cost of the one-time global variable lookups for map and set tends to dominate the timing). Further, the other respondent ignored the warnings emitted by the timing tool, which indicate that his timings are wildly inconsistent (possibly due to cached intermediate results making the timings useless).

I presume that if you're doing R-style statistical analysis, your data is bigger than A B C and A D. The other answer is tuned to that toy dataset and doesn't scale to anything you might care about.

In [1]: import random

In [2]: people = [random.choice(['man', 'woman', 'boy', 'girl']) for i in range(1000)]

In [3]: children = ['boy', 'girl']

In [4]: %timeit [p in children for p in people]
10000 loops, best of 3: 65 µs per loop

In [5]: %timeit map(children.__contains__, people)
10000 loops, best of 3: 58.5 µs per loop

In [6]: %timeit map(set(children).__contains__, people)
10000 loops, best of 3: 49.8 µs per loop

As the search space grows larger than just two choices, the difference between O(1) and O(n) search becomes increasingly important:

In [10]: scores = [random.choice(range(10)) for i in range(1000)]

In [11]: evens = [0, 2, 4, 6, 8]

In [12]: %timeit [x in evens for x in scores]
10000 loops, best of 3: 98.2 µs per loop

In [13]: %timeit map(evens.__contains__, scores)
10000 loops, best of 3: 90.5 µs per loop

In [14]: %timeit map(set(evens).__contains__, scores)
10000 loops, best of 3: 57.6 µs per loop
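The %timeit lines above are IPython-specific. As a rough sketch for a plain script, the standard-library timeit module can run a similar comparison; the function names and the fixed seed here are illustrative choices, and the absolute numbers will vary by machine and Python version:

```python
import random
import timeit

random.seed(0)  # fixed seed only so repeated runs use the same data
scores = [random.randrange(10) for _ in range(1000)]
evens = [0, 2, 4, 6, 8]
evens_set = set(evens)  # built once, outside the timed code


def with_list():
    # Each membership test scans the list: O(n) in len(evens).
    return [x in evens for x in scores]


def with_set():
    # Each membership test is a hash lookup: O(1) on average.
    return [x in evens_set for x in scores]


# Both approaches must agree before their timings are worth comparing.
assert with_list() == with_set()

list_time = timeit.timeit(with_list, number=1000)
set_time = timeit.timeit(with_set, number=1000)
print(f"list lookup: {list_time:.4f} s, set lookup: {set_time:.4f} s")
```

With only five candidates the gap is modest; it should widen as the search space grows.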

Upvotes: 1

Ram K

Reputation: 1785

I would use the Python Data Analysis Library, pandas, for this kind of work that requires operations analogous to R. You can get started here: http://pandas.pydata.org/. The Python equivalent of %in% in pandas is "isin" (there are examples here: http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html#match).
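As a small sketch of what that looks like for the data in the question, assuming pandas is installed (the variable names are just for illustration):

```python
import pandas as pd

# Mirrors x %in% y from the question.
x = pd.Series(["a", "b", "c"])
y = ["a", "d"]

# Series.isin returns a boolean Series aligned with x.
mask = x.isin(y)
print(mask.tolist())
```

The resulting mask can also be used directly for filtering, for example x[mask].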

Upvotes: 1

user3337714

Reputation: 673

This will give you a boolean array.

import numpy

numpy.array([1, 1, 1]) == numpy.array([1, 1, 1])
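Note that == compares the two arrays position by position, which is not the same as R's %in%. For a membership test, a sketch using numpy.isin (assuming NumPy 1.13 or later is installed) might look like this:

```python
import numpy as np

# Mirrors x %in% y from the question.
x = np.array(["a", "b", "c"])
y = np.array(["a", "d"])

# np.isin tests each element of x for membership in y.
mask = np.isin(x, y)
print(mask.tolist())
```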

Upvotes: 0
