Reputation: 25
I'm trying to use an "if" statement inside a "for" loop to check whether the index of the current item (the index of a pandas Series containing the item) matches one of the indexes of another Series, but doing so raises a ValueError. This is the line of code that causes the problem:
if(ICM_items[ICM_items['track_id'] == i].index[0] in ICM_tgt_items.index.values.flatten().tolist()):
I tried replacing either side of the "in" expression with arbitrary integers or lists, and it works. The two objects are also built correctly, but when they are combined in the statement they raise an error.
I hope someone can give me some hints about where the problem is, or suggest an alternative way to perform the same task.
ICM_items and ICM_tgt_items are both pandas.Series.
Below is the console error:
Traceback (most recent call last):
  File "/Users/LucaButera/git/rschallenge/similarity_to_recommandable_builder.py", line 27, in <module>
    dot[ICM_tgt_items[ICM_items[ICM_items['track_id'] == i].index[0]]] = 0
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas/index.c:4194)
  File "pandas/index.pyx", line 280, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:6150)
  File "pandas/src/hashtable_class_helper.pxi", line 446, in pandas.hashtable.Int64HashTable.map_locations (pandas/hashtable.c:9261)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
[Finished in 1.26s]
Upvotes: 0
Views: 14244
Reputation: 21264
I would recommend you simplify your expressions, use .loc, and keep an eye out for edge cases (such as track_id turning up empty for a given i).
With the right test data, these steps should help you to narrow down your bug hunt.
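Before touching the loop, it may also help to confirm what the two objects actually are, since a DataFrame or a non-unique index behaves differently from a plain Series under [] lookups. The snippet below is only a minimal sketch: describe is a hypothetical helper (not part of pandas), and frame and series are made-up stand-ins for your real data.

```python
import pandas as pd


def describe(obj):
    """Hypothetical helper: summarize a pandas object for debugging."""
    return {
        "type": type(obj).__name__,                       # "Series" or "DataFrame"
        "ndim": obj.ndim,                                 # 1 for Series, 2 for DataFrame
        "index_is_unique": bool(obj.index.is_unique),     # False if labels repeat
    }


# Made-up data for illustration only.
frame = pd.DataFrame({"track_id": [1, 1, 2]}, index=["C", "A", "A"])
series = pd.Series([0.1, 0.2, 0.3], index=["A", "B", "C"])

print(describe(frame))
print(describe(series))
```

If either of your objects reports ndim 2 or a non-unique index where you expected a simple Series, that is a good place to start looking.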
Example ICM_items data:
import numpy as np
import pandas as pd
N = 7
max_track_id = 5
idx1 = ['A','B','C']
icm_idx = np.random.choice(idx1, size=N)
icm = {"track_id":np.random.randint(0, max_track_id, size=N)}
ICM_items = pd.DataFrame(icm, index=icm_idx)
ICM_items
   track_id
C         1
A         1
A         2
C         1
B         0
B         0
B         2
Example ICM_tgt_items data:
idx2 = ['A','B']
icm_tgt_idx = np.random.choice(idx2, size=N)
icm = np.random.random(size=N)
ICM_tgt_items = pd.DataFrame(icm, index=icm_tgt_idx)
          0
B  0.785614
A  0.976523
A  0.856821
B  0.098086
B  0.481140
A  0.686156
A  0.851714
Now simplify the comparison and catch potential edge cases:
for i in range(max_track_id):
    mask = ICM_items['track_id'] == i
    try:
        # use .loc for indexing, no need to flatten() or use .values on the right.
        if ICM_items.loc[mask].index[0] in ICM_tgt_items.index:
            print("found")
        else:
            print("not found")
    # catch error if i not found in track_id
    except IndexError as e:
        print(f"ERROR at i={i}: {e}")
Output:
found
not found
found
ERROR at i=3: index 0 is out of bounds for axis 0 with size 0
ERROR at i=4: index 0 is out of bounds for axis 0 with size 0
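If you would rather not rely on an exception for the empty case, one possible variant is to test the mask before indexing. This is only a sketch: first_label_status is a hypothetical helper name, and the data is a fixed copy of the example values shown above so the result is reproducible.

```python
import pandas as pd

# Fixed copies of the example data above (values hard-coded instead of random).
ICM_items = pd.DataFrame(
    {"track_id": [1, 1, 2, 1, 0, 0, 2]},
    index=["C", "A", "A", "C", "B", "B", "B"],
)
ICM_tgt_items = pd.DataFrame(
    [0.785614, 0.976523, 0.856821, 0.098086, 0.481140, 0.686156, 0.851714],
    index=["B", "A", "A", "B", "B", "A", "A"],
)


def first_label_status(items, targets, track_id):
    """Hypothetical helper: return 'missing', 'found' or 'not found'."""
    mask = items["track_id"] == track_id
    if not mask.any():
        # No row has this track_id, so there is no first label to look up.
        return "missing"
    first_label = items.loc[mask].index[0]
    return "found" if first_label in targets.index else "not found"


results = [first_label_status(ICM_items, ICM_tgt_items, i) for i in range(5)]
print(results)
```

With this fixed data the list is ['found', 'not found', 'found', 'missing', 'missing'], which matches the output of the try/except version, with "missing" in place of the two errors.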
Upvotes: 2