Reputation: 59
I have the following dataframe and a similar second one, and I want to compare the two. I think the problem is that I am mixing up data types:
df1 = pd.DataFrame(pd.read_csv("csv", delimiter=';', header=None, skiprows=1, names=['1', '2']))
df['1'].str.replace(r'[^\w\s]+', '')
df['1'] = df1['1'].str.replace('\d+', '')
df = df.apply(nltk.word_tokenize)
df = [nltk.word_tokenize(str(l)) for l in df]
df = df.apply(lambda x: [item.lower() for item in x if item.lower() not in stop_words])
df = set(df)
TypeError: unhashable type: 'list'
Upvotes: 0
Views: 135
Reputation: 6642
Your second-to-last line produces a Series of lists, and the last line then converts that Series to a set. That fails because building a set hashes every element, and lists are not hashable (as the TypeError says). Tuples, unlike lists, are hashable. Assuming that the rest of your code works (I have no way of checking), try:
df = df.apply(lambda x: tuple(item.lower() for item in x if item.lower() not in stop_words))
df = set(df)
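To see the difference in isolation, here is a minimal sketch that does not need pandas or nltk. It assumes a plain list of already tokenized rows as a stand-in for the Series, and `stop_words`, `rows`, and `clean` are hypothetical names made up for the illustration:

```python
# Hypothetical stand-ins: a tiny stop-word set and pre-tokenized rows.
stop_words = {"the", "a", "is"}
rows = [["The", "cat", "is", "here"], ["A", "dog"], ["the", "cat", "IS", "here"]]


def clean(tokens):
    """Lowercase tokens and drop stop words, returning a hashable tuple."""
    return tuple(t.lower() for t in tokens if t.lower() not in stop_words)


# Lists are mutable and therefore unhashable, so building a set of them fails.
try:
    set([[t.lower() for t in row] for row in rows])
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# Tuples of strings are hashable, so they can be collected into a set.
unique_rows = set(clean(row) for row in rows)

print(list_error)
print(unique_rows)
```

The first and third rows clean to the same tuple, so the set keeps only one copy of it. Once both dataframes are reduced to sets of tuples like this, they can be compared with the usual set operators such as `&` and `-`.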
Upvotes: 1