Sockness_Rogers

Reputation: 1683

Python adding values from multiple columns to a set()

I am having trouble combining the values from 10 columns into one set. I want to use a set because each column contains repeated values, and I need a list of all the values (medical codes) with no repeats. I was able to build an initial set from the first column, but when I try to add the other columns I get an "unhashable type" error.

Here is my code:

data_sorted = data.fillna(0).sort_values(['PAT_ID', 'VISIT_NO'])
set_ICD1 = set(data_sorted['ICD_1'].unique())
print(len(set_ICD1))
set_ICD = set_ICD1.add(data_sorted['ICD_2'].unique())

print(len(set_ICD))

Here is the error I get:

11586 # (not part of the error; this is the length of the initial set)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-e3966ec54661> in <module>()
  1 set_ICD1 = set(data_sorted['ICD_1'].unique())
  2 print(len(set_ICD1))
----> 3 set_ICD = set_ICD1.add(data_sorted['ICD_2'].unique())
  4 
  5 print(len(set_ICD))

TypeError: unhashable type: 'numpy.ndarray'

Any advice or tips how to fix this would be greatly appreciated!

Upvotes: 1

Views: 1318

Answers (1)

MSeifert

Reputation: 152607

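If you want to add multiple elements to a set at once, you need to use the update method instead of add. add treats its argument as a single element, which must be hashable, and a NumPy array is not. update iterates over its argument and adds each item. Both methods modify the set in place and return None, so do not assign the result to a new variable: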

set_ICD1.update(data_sorted['ICD_2'])
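The difference between the two methods can be seen without pandas. This is a minimal sketch with made-up codes; the set name codes and the values in it are placeholders, not data from the question.

```python
codes = {"A01", "B02"}

# add() inserts its argument as ONE element, so the argument must be hashable.
# A list is unhashable in the same way a NumPy array is.
try:
    codes.add(["C03", "A01"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# update() iterates over its argument and inserts each item;
# items already in the set are ignored.
result = codes.update(["C03", "A01"])

print(error_message)   # the "unhashable type" message raised by add()
print(sorted(codes))   # ['A01', 'B02', 'C03']
print(result)          # None, because update() changes the set in place
```

Because update() returns None, writing set_ICD = set_ICD1.update(...) would leave set_ICD empty of any set at all; keep using set_ICD1 after the call.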

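If the column values are held in a NumPy array, you should probably flatten it with ravel() (in case it's n-dimensional) and convert it with tolist() (for performance). Calling ravel() directly on a pandas Series is deprecated in recent pandas versions, so get the underlying array with to_numpy() first:

set_ICD1.update(data_sorted['ICD_2'].to_numpy().ravel().tolist())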
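To cover all ten columns, the same update call can be run in a loop. The following is a sketch under assumptions: the small DataFrame stands in for data_sorted, the column names follow the ICD_1, ICD_2, ... pattern from the question, and dropna() is used here in place of the question's fillna(0) so that no placeholder value ends up in the set.

```python
import pandas as pd

# Hypothetical stand-in for data_sorted; the real frame has ICD_1 ... ICD_10.
data_sorted = pd.DataFrame({
    "PAT_ID": [1, 1, 2],
    "ICD_1": ["A01", "B02", "A01"],
    "ICD_2": ["B02", None, "C03"],
    "ICD_3": [None, "C03", "D04"],
})

# Pick every column whose name starts with "ICD_".
icd_columns = [col for col in data_sorted.columns if col.startswith("ICD_")]

all_codes = set()
for col in icd_columns:
    # dropna() skips missing entries; tolist() yields plain Python objects.
    all_codes.update(data_sorted[col].dropna().tolist())

print(sorted(all_codes))  # ['A01', 'B02', 'C03', 'D04']
```

If the frame has already been through fillna(0), the set will also contain 0, which can be removed afterwards with all_codes.discard(0).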

Upvotes: 3
