TJE

Reputation: 580

Delete specified rows by indices from a list of dicts

I have a long list of dicts as my dataset (each row in the list is a dictionary).

There are a few rows in this list that I need to remove (because the data in these rows is inconsistent with the rest of the dataset).

I have already created a function that identifies the index numbers of the rows I would like to remove like so:

indices_to_remove = [10200, 15006, 22833, 33442, 54214]

I would like to have a function that removes every row in my list whose index appears in this list.

Here's what I tried so far:

my_original_dataset = [...]  # placeholder: a list of dicts

indices_to_remove = [10200, 15006, 22833, 33442, 54214]

def remove_missing_rows(dataset):
    new_list = []
    for row_dict in dataset:
        if row_dict not in indices_to_remove:
            new_list.append(row_dict)
    return new_list

new_dataset_all_empty_removed = remove_missing_rows(my_original_dataset)

I realize that the problem is that row_dict refers to the actual row and not to the index number of the row, but I don't know how to refer to the row number here.

Upvotes: 0

Views: 83

Answers (3)

f5r5e5d

Reputation: 3706

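To literally remove rows from the dataset, dataset.pop(i) works.

You have to pop from the end to the start, so that removing one row does not shift the positions of the rows still to be removed. This means indices_to_remove needs to be in ascending order before it is reversed, or you have to sort it explicitly.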

dataset = [1,2,3,4,5]
indices_to_remove = [1,3]

[dataset.pop(i) for i in indices_to_remove[::-1]]

dataset

Out[195]: [1, 3, 5]

The output of the list comprehension can be ignored; all you want is the side effect of removing the rows from the list.

As suggested:

for i in indices_to_remove[::-1]:
    dataset.pop(i)

may be cleaner, since it does not build a throwaway list.
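If indices_to_remove is not guaranteed to be in ascending order, reversing it with [::-1] is not enough. Below is a minimal sketch that sorts explicitly, assuming every index is within range; the helper name remove_rows_in_place and the sample rows are made up for illustration.

```python
def remove_rows_in_place(dataset, indices_to_remove):
    """Delete the rows at the given positions, mutating dataset."""
    # Go from the highest index down so that each deletion leaves
    # the positions still to be deleted unchanged.
    # set() drops repeated indices, which would otherwise delete extra rows.
    for i in sorted(set(indices_to_remove), reverse=True):
        del dataset[i]


# Hypothetical sample data: six small dicts
dataset = [{"id": n} for n in range(6)]
remove_rows_in_place(dataset, [3, 1])  # unsorted on purpose
print(dataset)  # [{'id': 0}, {'id': 2}, {'id': 4}, {'id': 5}]
```

Using del instead of pop avoids returning values that are never used; the effect on the list is the same.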

Upvotes: 0

Moses Koledoye

Reputation: 78546

You can generate the indices alongside the rows themselves with enumerate. You can also speed up the lookup of each index by making the collection of indices a set; sets are optimized for membership checks:

indices_to_remove = {10200, 15006, 22833, 33442, 54214}

def remove_missing_rows(dataset):
    new_list = []
    for i, row_dict in enumerate(dataset):
        if i not in indices_to_remove:
            new_list.append(row_dict)
    return new_list

You could also do this in a single expression using a list comprehension, without having to create a function:

new_list = [x for i, x in enumerate(dataset) if i not in indices_to_remove]

This creates a new list with every item whose index is in indices_to_remove dropped; the original list is left unchanged.
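To make this concrete, here is a small sketch of the comprehension applied to a list of dicts. The sample rows are invented, and the slice assignment at the end is an optional variation (not part of the approach described above) for cases where other references to the same list object should see the change.

```python
# Hypothetical sample data
dataset = [{"name": "a"}, {"name": "b"}, {"name": "c"}, {"name": "d"}]
indices_to_remove = {1, 3}

# Keep each row whose position is not in the set
new_list = [x for i, x in enumerate(dataset) if i not in indices_to_remove]
print(new_list)  # [{'name': 'a'}, {'name': 'c'}]

# Optional: overwrite the contents of the existing list object,
# so that any other name bound to it sees the filtered rows too
alias = dataset
dataset[:] = new_list
print(alias)  # [{'name': 'a'}, {'name': 'c'}]
```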

Upvotes: 3

Davis Raimon

Reputation: 711

I think that instead of 'if row_dict not in indices_to_remove:' on the 8th line of your code, 'if dataset.index(row_dict) not in indices_to_remove:' will do the removal.
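One caveat worth checking before relying on dataset.index(row_dict): list.index returns the position of the first element equal to its argument, so rows with identical contents all report the same index, and each call scans the list from the start. The sketch below, using made-up rows, compares it with enumerate.

```python
# Hypothetical sample data: the first and third rows are equal
dataset = [{"x": 1}, {"x": 2}, {"x": 1}, {"x": 3}]
indices_to_remove = {2}  # intent: drop only the third row

# index() finds the first equal dict, so the third row reports index 0
via_index = [row for row in dataset
             if dataset.index(row) not in indices_to_remove]

# enumerate() yields each row's actual position
via_enumerate = [row for i, row in enumerate(dataset)
                 if i not in indices_to_remove]

print(via_index)      # [{'x': 1}, {'x': 2}, {'x': 1}, {'x': 3}]  (nothing removed)
print(via_enumerate)  # [{'x': 1}, {'x': 2}, {'x': 3}]
```

If every row in the dataset is distinct, both versions give the same result, though the index() version will likely be much slower on a long list.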

Upvotes: 0
