Reputation: 1485
I have a Python list called added that contains 156 individual lists, each holding two column references and an array. An example is as follows:
[0, 1, array]
The problem is I have duplicates, although they are not exact as the column references will be flipped. The following two will be exactly the same:
[[0, 1, array], [1, 0, array]]
I tried to remove the duplicates by sorting the column references, so that flipped pairs become identical, and then appending only the entries not already seen to a new list.
Both steps resulted in separate errors:
for a in range(len(added)):
    added[a][0:2] = added[a][0:2].sort()
TypeError: can only assign an iterable
I also tried to check whether the array was already in my empty Python list no_dups and, if it wasn't, append the column references and array:
no_dups = []
for a in range(len(added)):
    if added[a][2] in no_dups:
        print('already appended')
    else:
        no_dups.append(added[a])
<input>:2: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
Neither worked. I'm struggling to get my head round how to remove duplicates here.
Thanks.
EDIT: reproducible code:
import numpy as np
import pandas as pd
from sklearn import datasets
data = datasets.load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
X = df.to_numpy()
cols = []
added = []
for column in X.T:
    cols.append(column)
for i in range(len(cols)):
    for x in range(len(cols)):
        same_check = cols[i] == cols[x]
        if same_check.all() == True:
            continue
        else:
            added.append([i, x, cols[i] * cols[x]])
This code should give you access to the entire added list.
Upvotes: 0
Views: 163
Reputation: 12397
TypeError: can only assign an iterable
added[a][0:2].sort() sorts in place and returns None, which is not iterable and so cannot be assigned to a list slice. To get the sorted list back, use the built-in function sorted(), which returns a new sorted list:
added[a][0:2] = sorted(added[a][0:2])
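A quick way to see the difference, as a minimal sketch. The row below is a made-up stand-in for one entry of added; its array contents are placeholders.

```python
import numpy as np

# Hypothetical row in the same [col, col, array] shape as the entries of `added`.
row = [1, 0, np.array([2.0, 3.0])]

in_place_result = row[0:2].sort()   # sorts a temporary slice copy and returns None
new_list = sorted(row[0:2])         # returns a new sorted list

print(in_place_result)  # None
print(new_list)         # [0, 1]

row[0:2] = new_list     # slice assignment works because new_list is iterable
print(row[:2])          # [0, 1]
```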
<input>:2: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
This is a warning and not an error. Nonetheless, this approach will not work because, as the warning states, == is not well defined between your array and the elements of no_dups. When you test if added[a][2] in no_dups, Python compares added[a][2] to each element of no_dups with ==, and since each element is a list of the form [col, col, array], that comparison cannot produce a meaningful single True or False. If the third element is a NumPy array, you can use:
for a in range(len(added)):
    added[a][0:2] = sorted(added[a][0:2])
no_dups = []
for a in added:
    add_flag = True
    for b in no_dups:
        # to compare entries, compare the first two elements as lists and the arrays using .all()
        if (a[0:2]==b[0:2]) and ((a[2]==b[2]).all()):
            print('already appended')
            add_flag = False
            break
    if add_flag:
        no_dups.append(a)
len(no_dups): 78
len(added): 156
However, if all your arrays are of the same length, you should stack them with NumPy, which is significantly faster.
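One possible reading of the stacking suggestion, as a sketch only: assuming every array has the same length, put the sorted index pair and the array values into one 2D array and let np.unique find the distinct rows. The small added list here is made up for illustration.

```python
import numpy as np

# Made-up stand-in for `added`: every second entry repeats the previous one with flipped indices.
a, b, c = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
added = [[0, 1, a * b], [1, 0, b * a], [0, 2, a * c], [2, 0, c * a]]

# One row per entry: the sorted index pair followed by the array values.
stacked = np.array([sorted(e[:2]) + list(e[2]) for e in added], dtype=float)

# return_index gives the position of the first occurrence of each unique row.
_, idx = np.unique(stacked, axis=0, return_index=True)
no_dups = [added[i] for i in sorted(idx)]  # sorted(idx) keeps the original order

print(len(no_dups))  # 2
```

This compares floating-point values exactly, which is fine when duplicates come from the same product computed in the opposite order, but it would not merge arrays that differ only by rounding.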
Upvotes: 0
Reputation: 1992
Your first error occurs because list.sort() sorts in place and returns None, so its result cannot be assigned to a slice. A workaround:
for a in range(len(added)):
    added[a][:2] = sorted(added[a][:2])
You can then get unique indices as:
unique, idx = np.unique([a[:2] for a in added], axis=0, return_index=True)
no_dups = [added[i] for i in idx]
len(added)
>>> 156
len(no_dups)
>>> 78
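If the array is fully determined by the pair of column indices, as it is in the question's code where the array is cols[i] * cols[x], a plain-Python alternative is to key a dict on the sorted pair. This is only a sketch under that assumption; the cols data below is made up, and seen is a hypothetical helper name.

```python
import numpy as np

# Made-up stand-in for `added`, built the same way as in the question.
cols = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
added = [[i, x, cols[i] * cols[x]]
         for i in range(len(cols)) for x in range(len(cols)) if i != x]

seen = {}
for i, x, arr in added:
    key = tuple(sorted((i, x)))                  # (0, 1) and (1, 0) map to the same key
    seen.setdefault(key, [key[0], key[1], arr])  # keep the first occurrence only

no_dups = list(seen.values())
print(len(added), len(no_dups))  # 6 3
```

Because only the index pairs are compared, no array equality test is needed, so the elementwise comparison warning does not arise.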
Upvotes: 1
Reputation: 4146
You can convert the entire added into a numpy array, then slice the indices and sort them, and then use np.unique to get unique rows.
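import numpy as np
# dummy added in the form [[a,b,array],[a,b,array],...]
added = [np.random.choice(5,2).tolist()+[np.random.randint(10, size=(1,5))] for i in range(156)]
# Convert to numpy; dtype=object is needed because each row mixes ints with an array
# (recent NumPy versions raise a ValueError for such ragged input otherwise)
added_np = np.array(added, dtype=object)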
vals, idxs = np.unique(np.sort(added_np[:,:2], axis = 1).astype('int'), axis=0, return_index= True)
added_no_duplicate = added_np[idxs].tolist()
Upvotes: 0