Reputation: 275

Filter a dictionary of lists

I have a dictionary of the form:

{"level": [1, 2, 3],
 "conf": [-1, 1, 2],
 "text": ["here", "hel", "llo"]}

I want to filter the lists to remove the item at every index i where the value at index i of "conf" is not > 0.

So for the above dict, the output should be this:

{"level": [2, 3],
 "conf": [1, 2],
 "text": ["hel", "llo"]}

This is because the first value of conf is not > 0.

I have tried something like this:

new_dict = {i: [a for a in j if a >= min_conf] for i, j in my_dict.items()}

But that only works for one key.

Upvotes: 21

Answers (11)

Raed Ali

Reputation: 587

Try this. It is simple and easy to understand, especially for beginners:

a_dict = {"level": [1, 2, 3, 4, 5, 8], "conf": [-1, 1, -1, -2], "text": ["-1", "hel", "llo", "ai", 0, 9]}

# iterate backwards over the list keeping the indexes
for index, item in reversed(list(enumerate(a_dict["conf"]))):
    if item <= 0:
        for lists in a_dict.values():
            del lists[index]
print(a_dict)

Output:

{'level': [2, 5, 8], 'conf': [1], 'text': ['hel', 0, 9]}
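The reason for iterating backwards: deleting an element shifts every later element one position to the left, so walking the indexes from the end keeps the indexes that have not been processed yet valid. Below is a minimal sketch of the same idea wrapped in a function and applied to the question's data. The function name `drop_nonpositive_rows` and its `key` parameter are hypothetical.

```python
def drop_nonpositive_rows(d, key="conf"):
    """Delete, in place, every position i where d[key][i] <= 0 from all lists in d."""
    # Collect the indexes first, then delete from the highest index down,
    # so each deletion leaves the smaller, not-yet-processed indexes untouched.
    to_delete = [i for i, value in enumerate(d[key]) if value <= 0]
    for index in reversed(to_delete):
        for values in d.values():
            del values[index]


data = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["here", "hel", "llo"]}
drop_nonpositive_rows(data)
print(data)  # {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```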

Upvotes: 3

user17242583

Reputation:

Here's a one-liner, using this sample data:

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["here", "hel", "llo"]}

dct = {k: [x for i, x in enumerate(v) if d['conf'][i] > 0] for k, v in d.items()}

Output:

>>> dct
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
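Here is a variant of the same one-liner, sketched with `itertools.compress`, which yields the items of its first argument whose matching selector is truthy. The boolean mask is built once from `conf` instead of looking up `d['conf'][i]` for every element. There is one behavioral difference: `compress` stops at the shorter of its two inputs, so a list longer than `conf` is silently truncated, whereas the indexing version raises `IndexError`.

```python
from itertools import compress

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["here", "hel", "llo"]}

# One boolean per position: True where the row should be kept.
keep = [c > 0 for c in d["conf"]]

# compress(v, keep) yields v[i] only where keep[i] is True; d itself is not modified.
dct = {k: list(compress(v, keep)) for k, v in d.items()}
print(dct)  # {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```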

Upvotes: 11

S.B

Reputation: 16564

Try this:

from operator import itemgetter


def filter_dictionary(d):
    positive_indices = [i for i, item in enumerate(d['conf']) if item > 0]
    f = itemgetter(*positive_indices)
    return {k: list(f(v)) for k, v in d.items()}


d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}
print(filter_dictionary(d))

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

I first find which indices of 'conf' are positive, then use itemgetter to pick those indices from each list in the dictionary.

A more compact version that uses a generator expression instead of a temporary list:

def filter_dictionary(d):
    f = itemgetter(*(i for i, item in enumerate(d['conf']) if item > 0))
    return {k: list(f(v)) for k, v in d.items()}
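Both versions above depend on `itemgetter` returning a tuple, and that only happens when it is given two or more indexes. With exactly one index it returns the bare item, so `list(f(v))` raises `TypeError` for an int such as `2` and splits a string such as `'hel'` into characters. With no indexes at all, `itemgetter()` itself raises `TypeError`. The sketch below guards both cases and keeps the same function name.

```python
from operator import itemgetter


def filter_dictionary(d):
    positive_indices = [i for i, item in enumerate(d['conf']) if item > 0]
    if not positive_indices:
        # itemgetter() needs at least one index (it raises TypeError otherwise),
        # so "nothing to keep" is handled separately.
        return {k: [] for k in d}
    f = itemgetter(*positive_indices)
    if len(positive_indices) == 1:
        # With a single index, itemgetter returns the bare item, not a 1-tuple.
        return {k: [f(v)] for k, v in d.items()}
    # With two or more indexes, itemgetter returns a tuple.
    return {k: list(f(v)) for k, v in d.items()}


d = {"level": [1, 2, 3], "conf": [-1, 1, -2], "text": ["-1", "hel", "llo"]}
print(filter_dictionary(d))  # {'level': [2], 'conf': [1], 'text': ['hel']}
```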

Upvotes: 15

Christian Weiss

Reputation: 151

I solved it with this:

from typing import Dict, List, Any, Set

d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1", "hel", "llo"]}

# First, we create a set that stores the indices which should be kept.
# I chose a set instead of a list because membership tests on a set take O(1) time on average.
# We only want to keep the items on indices where the value in d["conf"] is greater than 0
filtered_indexes = {i for i, value in enumerate(d.get('conf', [])) if value > 0}

def filter_dictionary(d: Dict[str, List[Any]], filtered_indexes: Set[int]) -> Dict[str, List[Any]]:
    filtered_dictionary = d.copy()  # We'll return a modified copy of the original dictionary
    for key, list_values in d.items():
        # The actual filtering for each key/value pair happens in the next line.
        # Each entry in the copy is replaced with its filtered list; the lists in d are not modified.
        filtered_dictionary[key] = [value for i, value in enumerate(list_values) if i in filtered_indexes]
    return filtered_dictionary

print(filter_dictionary(d, filtered_indexes))

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

Upvotes: 4

Efe

Reputation: 66

a = {"level":[1,2,3,4], "conf": [-1,1,2,-1],"text": ["-1","hel","llo","test"]}

# inefficient solution
# for k, v in a.items():
#     if k == "conf":
#         start_search = 0
#         to_delete = []  # stores the indexes of the conf values to delete (conf <= 0)
#         for element in v:
#             if element <= 0:
#                 to_delete.append(v.index(element, start_search))
#                 start_search = to_delete[-1] + 1

# more efficient and elegant solution
to_delete = [i for i, element in enumerate(a["conf"]) if element <= 0]
for position in reversed(to_delete):  # pop from the end so earlier indexes stay valid
    for v in a.values():
        v.pop(position)

The result is:

>>> a
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

Upvotes: 3

Adon Bilivit

Reputation: 27404

Lots of good answers. Here's another 2-pass approach:

mydict = {"level": [1, 2, 3], "conf": [-1, 1, 2], 'text': ["-1", "hel", "llo"]}

for i, v in enumerate(mydict['conf']):
    if v <= 0:
        for key in mydict.keys():
            mydict[key][i] = None

for key in mydict.keys():
    mydict[key] = [v for v in mydict[key] if v is not None]

print(mydict)

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
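This approach assumes `None` never appears as a real value. A genuine `None` in any list would also be dropped in the second pass, which misaligns the lists. Below is a sketch of the same two-pass idea that uses a private sentinel object, compared by identity, instead of `None`. `_REMOVED` and `drop_marked` are hypothetical names.

```python
# Hypothetical sentinel: a fresh object() is equal only to itself,
# so it can never be confused with real data such as None.
_REMOVED = object()


def drop_marked(d, key="conf"):
    # Pass 1: mark every position whose d[key] value is not > 0, in all lists.
    for i, v in enumerate(d[key]):
        if v <= 0:
            for lst in d.values():
                lst[i] = _REMOVED
    # Pass 2: rebuild each list, skipping marked slots (identity check with "is not").
    for k in list(d):
        d[k] = [x for x in d[k] if x is not _REMOVED]


mydict = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", None, "llo"]}
drop_marked(mydict)
print(mydict)  # {'level': [2, 3], 'conf': [1, 2], 'text': [None, 'llo']}
```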

Upvotes: 3

Yuval.R

Reputation: 1291

I believe this will work: for each list other than conf, keep only the values where conf is positive; after that, filter conf itself.

d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1","hel","llo"]}
for key in d:
    if key != "conf":
        d[key] = [d[key][i] for i in range(len(d[key])) if d["conf"][i] > 0]
d["conf"] = [i for i in d["conf"] if i > 0]
print(d)

A simpler solution uses the same idea but as a dict comprehension, so conf doesn't need to be handled separately from the rest. The comprehension reads the original d throughout, because d is only reassigned once it has finished:

d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1","hel","llo"]}

d = {i:[d[i][j] for j in range(len(d[i])) if d["conf"][j] > 0] for i in d}

Output: {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
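The first version works only because `conf` is skipped inside the loop and filtered last. If `conf` were rewritten first, the lists visited after it would be filtered against the already-shortened `conf`, which misaligns the indexes or raises `IndexError`. The alternative sketch below avoids the ordering issue entirely by snapshotting the mask before any list changes. `filter_in_place` is a hypothetical name.

```python
def filter_in_place(d, key="conf"):
    # Build the keep-mask from the original d[key] before rewriting anything,
    # so the order in which keys are visited no longer matters.
    keep = [value > 0 for value in d[key]]
    for k in list(d):
        d[k] = [x for x, ok in zip(d[k], keep) if ok]


d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}
filter_in_place(d)
print(d)  # {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```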

Upvotes: 2

Dom

Reputation: 300

The structure of the data you're describing sounds like it might be more naturally modelled as a pandas DataFrame: you are essentially viewing your data as a 2-D grid, and you want to filter out rows of that grid based on the value in one column.

The following snippet will do what you need using a DataFrame as an intermediate representation:

import pandas as pd

data = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
df = pd.DataFrame(data)
df = df.loc[df["conf"] > 0]
result = df.to_dict(orient="list")

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

However, note that if you represent your data as a DataFrame in the first place, and keep it in that form when you're done, this simplifies to:

data = pd.DataFrame({
    "level":[1,2,3],
    "conf":[-1,1,2],
    "text":["here","hel","llo"],
})

result = data.loc[data["conf"] > 0]

Output:

   level  conf text
1      2     1  hel
2      3     2  llo

This is terser, more expressive, and (on large inputs) typically faster than any "pure dict" solution.

If the other operations you want to do on this data are similar (in the sense of really being '2D array' operations), they are likely to be more naturally expressed in terms of DataFrames too, so keeping your data as a DataFrame is likely to be better than converting back to a dict.

Upvotes: 5

user17242583

Reputation:

Here's a NumPy way of doing it:

import numpy as np

dct = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
dct = {k: np.array(v) for k, v in dct.items()}
# the boolean mask built from conf selects the same positions in every array
dct = {k: v[dct['conf'] > 0].tolist() for k, v in dct.items()}

Output:

>>> dct
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

Upvotes: 3

quamrana

Reputation: 39414

You can have a function which works out which indexes to keep and reformulate each list with only those indexes:

my_dict = {"level":[1,2,3], "conf":[-1,1,2],'text':["-1","hel","llo"]}

def remove_corresponding_items(d, key):
    keep_indexes = [idx for idx, value in enumerate(d[key]) if value>0]
    for k, lst in d.items():  # k, so the key parameter is not shadowed
        d[k] = [lst[idx] for idx in keep_indexes]

remove_corresponding_items(my_dict, 'conf')
print(my_dict)

Output, as requested:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
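Because the function already takes the key as a parameter, it generalizes naturally to other conditions. Below is a sketch with an added `keep` predicate parameter that defaults to the question's `> 0` test. The name `remove_corresponding_items_where` and the `keep` parameter are hypothetical.

```python
def remove_corresponding_items_where(d, key, keep=lambda value: value > 0):
    """Keep only the positions i for which keep(d[key][i]) is true, in every list of d."""
    keep_indexes = [idx for idx, value in enumerate(d[key]) if keep(value)]
    # Iterate over a snapshot of the items while the values are replaced.
    for k, lst in list(d.items()):
        d[k] = [lst[idx] for idx in keep_indexes]


my_dict = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}
remove_corresponding_items_where(my_dict, "conf", keep=lambda value: value >= 2)
print(my_dict)  # {'level': [3], 'conf': [2], 'text': ['llo']}
```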

Upvotes: 3

Marco Luzzara

Reputation: 6056

I would keep the indexes of valid elements (those greater than 0) with:

kept_keys = {i for i, value in enumerate(my_dict['conf']) if value > 0}  # a set makes the "in" test below O(1) on average

Then filter each list, keeping only the elements whose index is in kept_keys:

{k: list(map(lambda x: x[1], filter(lambda x: x[0] in kept_keys, enumerate(my_dict[k])))) for k in my_dict}

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

Upvotes: 10
