Reputation: 43265
I have a 1D integer array of "factors", where each value means something different. Sometimes multiple numbers mean the same thing:
import numpy as np
vec = np.arange(1, 10)
comps = {
    'good': (3,),
    'bad': (4, 5, 9,),
    'ok': (2, 3,)
}
result = {}
for name in comps.keys():
    result[name] = np.zeros(len(vec), 'bool')
    for i, v in enumerate(vec):
        result[name][i] = v in comps[name]
This is the desired output. However, as vec gets large and the number of keys in comps goes up, this becomes quite slow. Plus, it's yucky... In R there is the %in% function:
vec = 1:10
comp = list(
  good = 3,
  bad = c(4:5, 9),
  ok = 2:3
)
lapply(comp, function(x) vec %in% x)
This compares every value on the left side with each value on the right and returns the "logical or" of those comparisons as a boolean vector the same length as vec.
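To make those semantics concrete, here is a minimal NumPy sketch of the same "compare everything, then OR" idea using broadcasting. The name `x` is just an illustrative stand-in for one entry of comps (the 'bad' codes here); this is one way to express the behaviour, not necessarily the fastest.

```python
import numpy as np

vec = np.arange(1, 10)        # 1..9, as in the Python example above
x = np.array([4, 5, 9])       # illustrative: the 'bad' codes

# Compare each element of vec (as a column) with each element of x (as a row).
# The result is a (len(vec), len(x)) boolean table; any(axis=1) takes the
# "logical or" across each row, giving one boolean per element of vec.
mask = (vec[:, None] == x[None, :]).any(axis=1)
print(mask)
```

Note that the intermediate table has len(vec) * len(x) entries, so this is only meant to illustrate the semantics for small inputs.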
I can get closer and cleaner using pandas:
import pandas as pd
DF = pd.DataFrame({'vec': vec})
result = {}
for name in comps.keys():
    result[name] = DF.vec.apply(lambda x: x in comps[name])
This is similar to this question, but I want an elementwise boolean array rather than a single boolean as my result.
What is the best way to do this in Python? (NumPy? pandas?)
Upvotes: 3
Views: 131
Reputation: 375675
You can create this using a dictionary comprehension (and the Series isin method):
pd.DataFrame({k: DF.vec.isin(v) for k, v in comps.items()})
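For completeness, a self-contained sketch of this approach, assuming Python 3 and the vec and comps definitions from the question. It also shows a NumPy-only variant with `np.isin`, in case you don't want the pandas dependency; the names `result` and `result_np` are just illustrative.

```python
import numpy as np
import pandas as pd

vec = np.arange(1, 10)
comps = {'good': (3,), 'bad': (4, 5, 9), 'ok': (2, 3)}
DF = pd.DataFrame({'vec': vec})

# pandas: one boolean column per name, each the same length as vec
result = pd.DataFrame({k: DF['vec'].isin(list(v)) for k, v in comps.items()})

# NumPy only: np.isin returns a boolean array with the same shape as vec
result_np = {k: np.isin(vec, list(v)) for k, v in comps.items()}

print(result)
```

Both variants test membership elementwise, so each column (or array) plays the role of vec %in% x from the R example.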
Upvotes: 2