Reputation: 123
I have a script that creates a list of dataframes to concatenate. Before concatenation, I am checking a certain column in each dataframe for the presence of a '1' binary flag. If there is not a one, I want to delete the dataframe from the list of dataframes. I am having trouble because I am not sure how to properly index the list to remove the dataframe. I recreated the problem with this code.
import pandas as pd

data = {'Name':['Tom', 'Tom', 'Tom', 'Tom'], 'Age':[20, 21, 19, 18]}
data2 = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data)
df4 = pd.DataFrame(data2)
dflist = [df, df2, df3, df4]
for frame in dflist:
    vals = frame["Name"].values
    if 'krish' not in vals:
        dflist.remove(frame)
But this raises:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I also tried enumerating the list and deleting based off dflist[i], but that changes the index if something is deleted so subsequently the wrong frames will be removed.
What is the proper way to remove dataframes from a list of df's based on condition? Thank you!
Upvotes: 4
Views: 5699
Reputation: 88276
Instead of removing items from a list while iterating, which is generally a bad practice, use a list comprehension to generate a new list with the dataframes of interest:
[i for i in dflist if 'krish' in i['Name'].values]
[    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18,     Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18]
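A comprehension builds a new list, so other references to the original list are not updated. If that matters, one option is to combine the comprehension with slice assignment, which replaces the contents of the existing list object. This is a minimal sketch reusing the question's data and keeping the frames that contain 'krish', as the question asks; the name `alias` is only there for illustration.

```python
import pandas as pd

data = {'Name': ['Tom', 'Tom', 'Tom', 'Tom'], 'Age': [20, 21, 19, 18]}
data2 = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}
dflist = [pd.DataFrame(data), pd.DataFrame(data2),
          pd.DataFrame(data), pd.DataFrame(data2)]
alias = dflist  # hypothetical second reference to the same list object

# Slice assignment swaps the list's contents in place, so `alias`
# sees the filtered result too. The dataframes themselves are not copied.
dflist[:] = [f for f in dflist if 'krish' in f['Name'].values]

print(len(alias))
```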
If the dataframes are very large, here's a safe way to remove the unwanted dataframes from the original list:
ix = []
for i, frame in enumerate(dflist):
    vals = frame["Name"]
    if not vals.isin(['krish']).any():
        ix.append(i)
# delete from the highest index to the lowest: removing an item only
# shifts the positions after it, so the lower indices still to be
# deleted keep pointing at the right dataframes
for i in sorted(ix, reverse=True):
    del dflist[i]
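To see why the deletion order matters, here is a small sketch with plain strings standing in for the dataframes (the values are made up for illustration). Deleting in ascending order shifts the later items left, so the second index ends up pointing at the wrong element.

```python
items = ['a', 'b', 'c', 'd']
to_drop = [0, 2]  # intended targets: 'a' and 'c'

# Ascending order: after deleting index 0, 'c' has moved to index 1,
# so deleting index 2 removes 'd' instead.
ascending = list(items)
for i in to_drop:
    del ascending[i]

# Descending order: deleting index 2 first leaves index 0 untouched.
descending = list(items)
for i in sorted(to_drop, reverse=True):
    del descending[i]

print(ascending)   # ['b', 'c']
print(descending)  # ['b', 'd']
```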
Upvotes: 8
Reputation: 323326
You should use del with the index rather than remove:
l=[]
for index,frame in enumerate(dflist):
    vals = frame["Name"].values
    if 'krish' not in vals:
        l.append(index)
for x in sorted(l, reverse=True):
    del dflist[x]
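As for why remove fails in the first place: as far as I understand it, list.remove scans the list and compares each element with the target, first by identity and then with ==. For two distinct dataframes with matching labels, == returns a dataframe of booleans, and converting that to a single True/False raises the error from the question. A minimal sketch with two small made-up frames:

```python
import pandas as pd

a = pd.DataFrame({'Name': ['Tom', 'Tom']})
b = pd.DataFrame({'Name': ['Tom', 'nick']})
frames = [a, b]

# To find b, remove first has to compare a with b. They are different
# objects, so it evaluates a == b, whose truth value is ambiguous.
try:
    frames.remove(b)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# Deleting by position never compares dataframes.
del frames[1]
```

Removing the first element happens to work because the identity check succeeds before any == comparison is made, which is why the loop in the question only fails on a later iteration.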
Upvotes: 2