Angelika

Reputation: 47

Confusion with series, list and unique elements

I would like to ask for some help because I can't understand a TypeError in a Python program. This is the piece of code:

users2 = np.random.choice(users, 5000).tolist()
print(len(users2))
print(users2[0:20])

tags = []
for user in users2:
    tags.append(user_counters["tags"].loc[user])

print(type(tags))
print(set(tags))

The type of tags is list. But when I call set() to get the unique elements of the "tags" list, the following error appears:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

OK, I understand what the message means, but I can't see which object here is of type "Series".

On the other hand, if I use:

print tags.unique()

a different error appears:

AttributeError: 'list' object has no attribute 'unique'

Note: user_counters is a DataFrame, and users is a list whose elements come from user_counters.

So why does the TypeError happen, since tags is a list and set() is for lists?

Thank you in advance.

Upvotes: 1

Views: 1175

Answers (1)

juanpa.arrivillaga

Reputation: 95873

Your tags is a list containing pandas.Series objects. That happens when you build the list with loc-based selection from the data-frame:

for user in users2:
   tags.append(user_counters["tags"].loc[user])

Whenever a user label appears more than once in the index, .loc returns a Series rather than a single value. You then try to make a set out of a list containing Series, but you can't, because Series aren't hashable.

So why does the TypeError happen, since tags is a list and set() is for lists?

Huh? set accepts any iterable, and the elements of that iterable are used to construct the resulting set, so each element has to be hashable. Your iterable is a list, but its elements are pandas.Series objects. That is the problem.
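To see this in isolation, here is a minimal sketch. The names scalar_tag and series_tag are made up for illustration and only stand in for the two kinds of element that can end up in your list. The exact wording of the error message differs between pandas versions, but it is a TypeError either way.

```python
import pandas as pd

# Hypothetical stand-ins for what ends up in the asker's list:
# a plain scalar, and a Series (as returned for a duplicated label).
scalar_tag = 3
series_tag = pd.Series([1, 4], index=["ted", "ted"])

# A list of scalars works: every element is hashable.
print(set([scalar_tag, 7]))

# A list that contains a Series fails, because set() hashes each element.
error_message = None
try:
    set([scalar_tag, series_tag])
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

So the list itself is fine; it is the element inside it that set() cannot hash.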

I suspect you have a data-frame indexed by a series of strings representing users...

>>> df = pd.DataFrame({'tag':[1,2,3, 4], 'c':[1.4,3.9, 2.8, 6.9]}, index=['ted','sara','anne', 'ted'])
>>> df
        c  tag
ted   1.4    1
sara  3.9    2
anne  2.8    3
ted   6.9    4
>>>

Since your user index has non-unique labels, selecting a label that occurs more than once gives you a Series instead of a scalar:

>>> df['tag'].loc['ted']
ted    1
ted    4
Name: tag, dtype: int64
>>> type(df['tag'].loc['ted'])
<class 'pandas.core.series.Series'>
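If what you want is the unique tag values for the sampled users, one possible approach is sketched below. It assumes your frame looks like the toy df above; the short users2 list is a made-up sample, not your data. Either select all rows in one go and call unique() on the resulting Series, or keep your loop but flatten each lookup before adding it to the list.

```python
import numpy as np
import pandas as pd

# Toy frame from the example above; 'ted' appears twice in the index.
df = pd.DataFrame(
    {"tag": [1, 2, 3, 4], "c": [1.4, 3.9, 2.8, 6.9]},
    index=["ted", "sara", "anne", "ted"],
)
users2 = ["ted", "anne", "ted"]  # hypothetical sample of users

# Option 1: select every matching row at once, then ask the Series
# (not the list) for its unique values.
unique_tags = df.loc[df.index.isin(users2), "tag"].unique()
print(unique_tags)

# Option 2: keep the loop, but flatten each lookup. np.atleast_1d turns
# both a scalar and a Series into a 1-D array of hashable scalars.
tags = []
for user in users2:
    tags.extend(np.atleast_1d(df["tag"].loc[user]))
unique_from_loop = set(tags)
print(unique_from_loop)
```

Option 1 avoids the Python-level loop entirely, which tends to matter when you sample thousands of users.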

Upvotes: 2
