Reputation: 8247
I have the following column names in a list:
vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
I also have a tuple containing an array of indexes:
(array([1, 5, 7], dtype=int64),)
I want to subset the list based on those indexes.
The desired output is:
vars = ['balance','pdays','job_admin.']
I have tried something like this in Python:
for i, a in enumerate(X):
    if i in new_L:
        print(i)
But it does not work.
Upvotes: 5
Views: 3185
Reputation: 11073
Simply use a loop to do that:
result = []
for i in your_array:
    result.append(vars[i])
or as a one-liner:
[vars[i] for i in your_array]
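Note that the index in the question is a one-element tuple wrapping the array, so iterating over the tuple directly would yield the whole array rather than single indexes. A minimal sketch of unwrapping it first, assuming a plain list stands in for the NumPy array so it runs without NumPy (the names columns, wrapped and positions are made up for illustration):

```python
# Column names from the question.
columns = ['age', 'balance', 'day', 'duration', 'campaign',
           'pdays', 'previous', 'job_admin.', 'job_blue-collar']

# Stand-in for the question's (array([1, 5, 7]),): a one-element tuple
# wrapping the index sequence. A plain list is used here (an assumption)
# so the sketch has no NumPy dependency.
wrapped = ([1, 5, 7],)

# Unwrap the tuple first, then pick the items by position.
positions = wrapped[0]
subset = [columns[i] for i in positions]
print(subset)  # ['balance', 'pdays', 'job_admin.']
```

With a real NumPy array inside the tuple, the same wrapped[0] step should apply before looping.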
Upvotes: 17
Reputation: 53029
You can use operator.itemgetter:
>>> import numpy as np
>>> import operator
>>> vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
>>> idx = np.array([1,5,7])
>>> operator.itemgetter(*idx)(vars)
('balance', 'pdays', 'job_admin.')
This is actually the fastest solution posted so far.
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=1000000)
>>>
>>> repeat("np.asarray(vars)[idx]", **kwds)
[2.2382465780247003, 2.225632123881951, 2.1969433058984578]
>>> repeat("[vars[i] for i in idx]", **kwds)
[0.9384958958253264, 0.9366465201601386, 0.9373494561295956]
>>> repeat("operator.itemgetter(*idx)(vars)", **kwds)
[0.9045725339092314, 0.9015877249184996, 0.9032398068811744]
Interestingly, it becomes more than twice as fast if we convert idx to a list first, and that's including the cost of conversion:
>>> repeat("operator.itemgetter(*idx.tolist())(vars)", **kwds)
[0.4062491739168763, 0.4086623480543494, 0.4049343201331794]
We can also afford to convert the result to a list and still be much faster than all the other solutions:
>>> repeat("list(operator.itemgetter(*idx.tolist())(vars))", **kwds)
[0.561687784967944, 0.5593925788998604, 0.5586365279741585]
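One caveat with itemgetter: called with a single index it returns the bare item instead of a tuple, and called with no indexes it raises TypeError. Below is a rough sketch of a wrapper that smooths over both cases. The helper name pick is hypothetical, not part of the operator module, and it was not included in the timings above.

```python
import operator

columns = ['age', 'balance', 'day', 'duration', 'campaign',
           'pdays', 'previous', 'job_admin.', 'job_blue-collar']

def pick(seq, positions):
    """Return the items of seq at the given positions, always as a list."""
    positions = list(positions)
    if not positions:
        # operator.itemgetter() with no arguments raises TypeError.
        return []
    if len(positions) == 1:
        # itemgetter(i)(seq) returns the item itself, not a 1-tuple.
        return [seq[positions[0]]]
    return list(operator.itemgetter(*positions)(seq))

print(pick(columns, [1, 5, 7]))  # ['balance', 'pdays', 'job_admin.']
print(pick(columns, [5]))        # ['pdays']
print(pick(columns, []))         # []
```

If the index array is known to hold at least two entries, the plain itemgetter call is enough and the wrapper is unnecessary.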
Upvotes: 1
Reputation: 12157
If you're using numpy anyway, use its advanced indexing:
import numpy as np
vars = ['age','balance','day','duration','campaign','pdays',
'previous','job_admin.','job_blue-collar']
indices = (np.array([1, 5, 7]),)
sub_array = np.asarray(vars)[indices]
# --> array(['balance', 'pdays', 'job_admin.'], dtype='<U15')
or, if you want a list:
sub_list = np.asarray(vars)[indices].tolist()
# --> ['balance', 'pdays', 'job_admin.']
Upvotes: 4
Reputation: 5682
If I understand correctly, your data are:
vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
and the indexes are:
idx = [1, 5, 7]
Then you can do:
>>> [vars[i] for i in idx]
['balance', 'pdays', 'job_admin.']
Upvotes: 2