Kim
Kim

Reputation: 53

Fastest way to go through a list. If x is in list retrieve the element with x in the list

if i had a list, e.g.

[{'id': '129', 'label': 'UK 9 (Out of Stock)', 'price': '0', 'oldPrice': '0', 'products': ['105514', False, '0'], 'disabled': 'disabled'},
{'id': '130', 'label': 'UK 9½', 'price': '0', 'oldPrice': '0', 'products': ['105515', True, '0'], 'disabled': ''},
{'id': '131', 'label': 'UK 10 (Out of Stock)', 'price': '0', 'oldPrice': '0', 'products': ['105516', False, '0'], 'disabled': 'disabled'}]
list = []
result = []

for x in list:
  if x["disabled"] == '':
       result.append(x)

print(result)

to get the elements of the list with without disabled or with true etc.

I wanted to know what the fastest way to do this if i had a very long list.

All answers are appreciated, thanks.

Upvotes: 2

Views: 626

Answers (4)

CypherX
CypherX

Reputation: 7353

Here is a comparison with time estimates. I used list comprehension and pandas DataFrame. Once you load the data in as a dataframe, the dataframe method is faster. But if you consider loading in the data as well, it is obviously the list comprehension which emerges as the winner.

I made list with 1000,000 dictionaries in it, each with a similar structure as yours.

Note:

  • The test was conducted on a google-colab notebook with two CPUs. OS: Ubuntu Linux.
  • Also I modified the code to help check the output with val.get('id') instead of what you need, val.get('disabled').

Make Dummy Data

import numpy as np
import pandas as pd

def make_dict(id=0, disabled=True):
    dis = 'disabled' if disabled else ''
    d = {'id': id, 'label': 'UK 9½', 'price': '0', 'oldPrice': '0', 'products': ['105515', True, '0'], 'disabled': dis}
    return d

def make_list(size=10, seed=0):
    np.random.seed(seed=seed)
    status = (np.random.rand(size)>0.5)
    ids = np.arange(size) + 1
    vals = [make_dict(id = id, disabled = disabled) for id, disabled in zip(ids, status)]
    return vals

vals = make_list(size=1000000, seed=0)
df = pd.DataFrame(vals)

1. Test List Comprehension (Best Option: Fastest)

%%time
ids = [val.get('id') for val in vals if val.get('disabled')=='']

Output:

CPU times: user 178 ms, sys: 0 ns, total: 178 ms
Wall time: 184 ms

2. Test Pandas DataFrame (without data-loading)

Here we do not consider the time needed for loading the data as a dataframe.

%%time
ids = df.loc[df['disabled']=='','id'].tolist()

Output:

CPU times: user 68.4 ms, sys: 6.03 ms, total: 74.4 ms
Wall time: 75.6 ms

3. Test Pandas DataFrame (with data-loading)

Here we do INCLUDE the time needed for loading the data as a dataframe.

%%time
df = pd.DataFrame(vals)
ids = df.loc[df['disabled']=='','id'].tolist()

Output:

CPU times: user 1.2 s, sys: 49.5 ms, total: 1.25 s
Wall time: 1.26 s

Upvotes: 2

oppressionslayer
oppressionslayer

Reputation: 7204

You can put it in a pandas dataframe, not sure of the speed in comparison with the for loop.

import pandas as pd
data = {
  '129': {'label': 'UK 9 (Out of Stock)', 'price': '0', 'oldPrice': '0', 'products': ['105514', False, '0'], 'disabled': 'disabled'},
  '130': {'label': 'UK 9½', 'price': '0', 'oldPrice': '0', 'products': ['105515', True, '0'], 'disabled': ''},
  '131': {'label': 'UK 10 (Out of Stock)', 'price': '0', 'oldPrice': '0', 'products': ['105516', False, '0'], 'disabled': 'disabled'}
}

df = pd.DataFrame(data)

df[df.disabled!=''].to_dict(orient='records')

output:

[{'id': '129',
  'label': 'UK 9 (Out of Stock)',
  'price': '0',
  'oldPrice': '0',
  'products': ['105514', False, '0'],
  'disabled': 'disabled'},
 {'id': '131',
  'label': 'UK 10 (Out of Stock)',
  'price': '0',
  'oldPrice': '0',
  'products': ['105516', False, '0'],
  'disabled': 'disabled'}]```

Upvotes: 1

Z4-tier
Z4-tier

Reputation: 7978

a list is inherently an O(n) structure for lookup; if you need to find something, you just have to scan through the list until you find it.

In your case, it seems that the id values may be unique. If that is the case, you can make the access time O(1) by slightly modifying your data structures:

{
  '129': {'label': 'UK 9 (Out of Stock)', 'price': '0', 'oldPrice': '0', 'products': ['105514', False, '0'], 'disabled': 'disabled'},
  '130': {'label': 'UK 9½', 'price': '0', 'oldPrice': '0', 'products': ['105515', True, '0'], 'disabled': ''},
  '131': {'label': 'UK 10 (Out of Stock)', 'price': '0', 'oldPrice': '0', 'products': ['105516', False, '0'], 'disabled': 'disabled'}
}

This turns the outermost collection into a dictionary, keyed on the ID's of the individual items. For a large number of items this will have much faster access time than using a [] list, provided you access items directly using the key (ex. d['129']). If you scan through it using something like d.keys() you're back to O(n) again.

If it's absolutely a requirement to be able to select members of this dictionary using multiple elements for access, you are probably going to need to build a custom data structure (Data structure to implement a dictionary with multiple indexes?). Whether that is worth it or not really depends on how big big is. If you have a few thousand records, just stick with the list, you will not notice the difference (and using a custom built solution might even be slower). If you have several million, then start to look for ways to optimize.

Upvotes: 2

Themathix
Themathix

Reputation: 56

Even when you go to "assembly level" in this case you have to check every element of the list for the condition you want.

The only possible way you can speed this up is take record of what elements have

"with without disabled or with true etc."

when you create your list, then you would just go straight to these elements and append, but honestly you can't escape from iterating over the list, if you don't do it, internally the computer will.

Upvotes: 0

Related Questions