Reputation: 580
I have a long list of dicts as my dataset (each row in the list is a dictionary).
There are a few rows in this list that I need to remove (because the data in these rows is inconsistent with the rest of the dataset).
I have already created a function that identifies the index numbers of the rows I would like to remove like so:
indices_to_remove = [10200, 15006, 22833, 33442, 54214]
I would like to have a function that deletes/removes all of the rows in my list if their index matches this list.
Here's what I tried so far:
my_original_dataset = *a list of dicts*
indices_to_remove = [10200, 15006, 22833, 33442, 54214]
def remove_missing_rows(dataset):
    new_list = []
    for row_dict in dataset:
        if row_dict not in indices_to_remove:
            new_list.append(row_dict)
    return new_list
new_dataset_all_empty_removed = remove_missing_rows(my_original_dataset)
I realize the problem is that row_dict refers to the actual row and not to the row's index number, but I don't know how to reference the row number here.
Upvotes: 0
Views: 83
Reputation: 3706
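To literally remove rows from the dataset in place, dataset.pop(i) works.
You have to pop from the end to the start, because each removal shifts every later item down by one. So indices_to_remove needs to be sorted in ascending order before you reverse it, or you have to sort it in descending order explicitly.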
dataset = [1,2,3,4,5]
indices_to_remove = [1,3]
[dataset.pop(i) for i in indices_to_remove[::-1]]
dataset
Out[195]: [1, 3, 5]
The output of the list comprehension can be ignored; all you want is the side effect of removing the rows from the list.
As suggested, a plain loop:
for i in indices_to_remove[::-1]:
    dataset.pop(i)
may be 'cleaner'.
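If the indices are not guaranteed to arrive sorted, a minimal sketch of the same in-place idea is to sort them in descending order first. The helper name remove_rows_in_place is hypothetical, and the sketch assumes every index is valid for the list.

```python
def remove_rows_in_place(dataset, indices_to_remove):
    """Delete the given positions from dataset in place (hypothetical helper)."""
    # set() drops duplicate indices; descending order means each deletion
    # only shifts items that have already been handled.
    for i in sorted(set(indices_to_remove), reverse=True):
        del dataset[i]


dataset = [1, 2, 3, 4, 5]
remove_rows_in_place(dataset, [3, 1])  # unsorted input is fine
print(dataset)  # [1, 3, 5]
```

del is used instead of pop here only because the removed value is not needed; the effect on the list is the same.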
Upvotes: 0
Reputation: 78546
You can generate the indices alongside the rows themselves with enumerate. To speed up the lookup of each index, you can also make the collection of indices a set; sets are optimized for membership checks:
indices_to_remove = {10200, 15006, 22833, 33442, 54214}
def remove_missing_rows(dataset):
    new_list = []
    for i, row_dict in enumerate(dataset):
        if i not in indices_to_remove:
            new_list.append(row_dict)
    return new_list
You could also do this flatly using a list comprehension, without having to create a function:
new_list = [x for i, x in enumerate(dataset) if i not in indices_to_remove]
This creates a new list with every item whose index is in indices_to_remove dropped.
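As a quick check of the approach above, here is a small sketch on a made-up five-row dataset; the row contents and the indices are purely illustrative.

```python
# Illustrative data only; the real dataset is a much longer list of dicts.
dataset = [{"id": n} for n in range(5)]
indices_to_remove = {1, 3}  # a set gives constant-time membership checks on average

new_list = [row for i, row in enumerate(dataset) if i not in indices_to_remove]

print(new_list)      # [{'id': 0}, {'id': 2}, {'id': 4}]
print(len(dataset))  # 5, the original list is left untouched
```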
Upvotes: 3
Reputation: 711
I think that instead of 'if row_dict not in indices_to_remove:' on the 8th line of your code, this will do the removal: 'if dataset.index(row_dict) not in indices_to_remove:'
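One caveat, sketched below with made-up data: list.index returns the position of the first row equal to row_dict, so if the dataset contains duplicate rows, a later duplicate is looked up under the earlier row's index. It also rescans the list for every row, which gets slow on a long dataset.

```python
# Made-up rows; rows 0 and 2 are equal dicts.
dataset = [{"a": 1}, {"a": 2}, {"a": 1}]
indices_to_remove = [2]

# index() finds the first equal row, so the row at position 2 reports index 0.
via_index = [row for row in dataset if dataset.index(row) not in indices_to_remove]

# enumerate() yields each row's true position.
via_enumerate = [row for i, row in enumerate(dataset) if i not in indices_to_remove]

print(via_index)      # [{'a': 1}, {'a': 2}, {'a': 1}]  (nothing removed)
print(via_enumerate)  # [{'a': 1}, {'a': 2}]
```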
Upvotes: 0