Justin

Reputation: 43265

The python version of R's %in% function

I have a 1D integer array of "factors" whose values mean different things. Sometimes multiple numbers mean the same thing:

import numpy as np

vec  = np.arange(1, 10)
comps = {
  'good': (3,),
  'bad': (4, 5, 9,),
  'ok': (2, 3,)
}

result = {}
for name in comps.keys():
    result[name] = np.zeros(len(vec), 'bool')
    for i, v in enumerate(vec):
        result[name][i] = v in comps[name]

This produces the desired output. However, as vec gets large and the number of keys in comps goes up, it becomes quite slow. Plus, it's yucky... In R there is the %in% function:

vec = 1:10
comp = list(
    good = 3,
    bad = c(4:5, 9),
    ok = 2:3
)

lapply(comp, function(x) vec %in% x)

This compares every value on the left side against each value on the right and returns the "logical or" of those comparisons as a boolean vector the same length as vec.

I can get closer and cleaner using pandas:

import pandas as pd

DF = pd.DataFrame({'vec': vec})

result = {}
for name in comps.keys():
    result[name] = DF.vec.apply(lambda x: x in comps[name])

This is similar to this question... but I want an elementwise boolean array rather than a single boolean as my result.

What is the best way to do this in Python? (NumPy? pandas?)

Upvotes: 3

Views: 131

Answers (1)

Andy Hayden

Reputation: 375675

You can create this using a dictionary comprehension (and the Series isin method):

pd.DataFrame({k: DF.vec.isin(v) for k, v in comps.items()})
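
For completeness, here is a self-contained sketch of the same idea using the vec, comps and DF names from the question. It assumes a pandas version with Series.isin and NumPy 1.13 or later for np.isin; the result_np name is only illustrative, and the tuples are converted to lists purely as a precaution.

```python
import numpy as np
import pandas as pd

# Data from the question
vec = np.arange(1, 10)
comps = {
    'good': (3,),
    'bad': (4, 5, 9),
    'ok': (2, 3),
}
DF = pd.DataFrame({'vec': vec})

# pandas: one boolean column per key, each the same length as vec
result = pd.DataFrame({k: DF.vec.isin(list(v)) for k, v in comps.items()})

# NumPy-only alternative (assumes NumPy >= 1.13): dict of boolean arrays
result_np = {k: np.isin(vec, list(v)) for k, v in comps.items()}

print(result)
```

Both versions do the membership test in vectorized code rather than in a Python-level loop over vec, which should help with the slowdown described in the question, though I have not benchmarked it here.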

Upvotes: 2
