Neil

Reputation: 8247

How to subset a list based on an array of indexes in Python

I have the following column names in a list:

 vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']

I also have an array that holds the indexes I want to keep:

(array([1, 5, 7], dtype=int64),)

I want to subset the list using these indexes.

The desired output is:

vars = ['balance','pdays','job_admin.']

I have tried something like this in Python:

for i, a in enumerate(X):
   if i in new_L:
       print i

But it does not work.

Upvotes: 5

Views: 3185

Answers (5)

Mehrdad Pedramfar

Reputation: 11073

Simply use a loop to do that:

result=[]
for i in your_array:
   result.append(vars[i])

or as a one-liner:

 [vars[i] for i in your_array]
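
Note that the value shown in the question is a 1-tuple wrapping the index array, so it has to be unpacked before looping over it. Below is a minimal sketch of that step; the names `columns`, `mask`, and `index_tuple` are made up for illustration, and it assumes the tuple came from something like `np.where`, which the question does not confirm.

```python
import numpy as np

# Column names from the question (called `columns` here so the
# built-in `vars` is not shadowed).
columns = ['age', 'balance', 'day', 'duration', 'campaign',
           'pdays', 'previous', 'job_admin.', 'job_blue-collar']

# Assumption: the index tuple was produced by np.where on a boolean mask.
mask = np.array([False, True, False, False, False, True, False, True, False])
index_tuple = np.where(mask)   # a 1-tuple holding one index array

indices = index_tuple[0]       # unpack the array of positions
subset = [columns[i] for i in indices]
print(subset)
```

Iterating over the tuple itself would yield the whole array as a single element, which cannot be used to index a list.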

Upvotes: 17

Paul Panzer

Reputation: 53029

You can use operator.itemgetter:

>>> import numpy as np
>>> import operator
>>> vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
>>> idx = np.array([1,5,7])
>>> operator.itemgetter(*idx)(vars)
('balance', 'pdays', 'job_admin.')

This is actually the fastest solution posted so far.

>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=1000000)
>>> 
>>> repeat("np.asarray(vars)[idx]", **kwds)
[2.2382465780247003, 2.225632123881951, 2.1969433058984578]
>>> repeat("[vars[i] for i in idx]", **kwds)
[0.9384958958253264, 0.9366465201601386, 0.9373494561295956]
>>> repeat("operator.itemgetter(*idx)(vars)", **kwds)
[0.9045725339092314, 0.9015877249184996, 0.9032398068811744]

Interestingly, it becomes more than twice as fast if we convert idx to a list first, and that's including the cost of conversion:

>>> repeat("operator.itemgetter(*idx.tolist())(vars)", **kwds)
[0.4062491739168763, 0.4086623480543494, 0.4049343201331794]

We can also afford to convert the result to a list and still be much faster than all the other solutions:

>>> repeat("list(operator.itemgetter(*idx.tolist())(vars))", **kwds)
[0.561687784967944, 0.5593925788998604, 0.5586365279741585]
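
One caveat with `operator.itemgetter`: with exactly one index it returns the bare item rather than a 1-tuple, and with no indexes it raises `TypeError`. If the number of indexes can vary, a small wrapper can smooth that over. This is only a sketch; `take` is a hypothetical helper name, not part of the standard library.

```python
import operator

columns = ['age', 'balance', 'day', 'duration', 'campaign',
           'pdays', 'previous', 'job_admin.', 'job_blue-collar']

def take(items, indices):
    """Return the items at the given positions as a list (hypothetical helper)."""
    indices = list(indices)
    if not indices:
        # itemgetter() with no arguments raises TypeError, so handle it here.
        return []
    picked = operator.itemgetter(*indices)(items)
    # With a single index, itemgetter returns the item itself, not a tuple.
    return [picked] if len(indices) == 1 else list(picked)

print(take(columns, [1, 5, 7]))
print(take(columns, [5]))
```

The extra branching costs a little time, so it is worth keeping only when the index count is not known in advance.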

Upvotes: 1

FHTMitchell

Reputation: 12157

If you're using NumPy anyway, use its advanced indexing:

import numpy as np
vars = ['age','balance','day','duration','campaign','pdays',
        'previous','job_admin.','job_blue-collar']
indices = (np.array([1, 5, 7]),)

sub_array = np.asarray(vars)[indices]  
# --> array(['balance', 'pdays', 'job_admin.'], dtype='<U15')

or, if you want a list:

sub_list = np.asarray(vars)[indices].tolist()
# --> ['balance', 'pdays', 'job_admin.']

Upvotes: 4

urban

Reputation: 5682

If I understand correctly, your data is:

vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']

and the indexes are:

idx = [1, 5, 7]

Then you can do:

>>> [vars[i] for i in idx]
['balance', 'pdays', 'job_admin.']

Upvotes: 2

Jordi

Reputation: 1343

index = [1, 5, 7]
vars = [vars[i] for i in index]

Upvotes: 2
