Reputation: 1539
I have a DataFrame with one column whose values are lists.
My intent is to split this column using one of the techniques from here: Pandas split column of lists into multiple columns
However, for the column names I want to use each unique value found in those lists.
To retrieve the unique values I have tried three different methods. Each one has failed for a different reason.
Is there a way to get the equivalent of Series.unique() when the values are lists?
My three attempts, with associated tracebacks:
1)
unique_vals = splitted_interests.unique()
Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <module>
    unique_vals = splitted_interests.unique()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 1991, in unique
    result = super().unique()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\base.py", line 1405, in unique
    result = unique1d(values)
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 405, in unique
    uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1767, in pandas._libs.hashtable.PyObjectHashTable.unique
  File "pandas/_libs/hashtable_class_helper.pxi", line 1718, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'list'
2)
unique_vals = splitted_interests.apply(lambda x: x.unique())
Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <module>
    unique_vals = splitted_interests.apply(lambda x: x.unique())
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 4045, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <lambda>
    unique_vals = splitted_interests.apply(lambda x: x.unique())
AttributeError: 'list' object has no attribute 'unique'
3)
unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
Traceback (most recent call last):
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 4045, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <lambda>
    unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <listcomp>
    unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
AttributeError: 'str' object has no attribute 'unique'
At run time, the column with lists looks like this:
Upvotes: 2
Views: 3163
Reputation: 31
"To retrieve the unique values I have tried three different methods. Each one has failed for a different reason."
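You may want to try astype('str') to retrieve the unique values in a column. Note that this compares each list as a whole (through its string representation), so it returns the distinct lists as strings rather than the distinct elements inside them: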
df.<column>.astype('str').unique()
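To make that behaviour concrete, here is a minimal sketch on made-up data (the Series `s` and its values are assumptions for illustration, not the asker's data). It shows that the result holds one string per distinct list, not the individual elements.

```python
import pandas as pd

# Hypothetical sample data: every value is a list
s = pd.Series([['aa', 'ss'], ['dd'], ['aa', 'ss']])

# astype('str') turns each list into its string form,
# so unique() de-duplicates whole lists, not their elements
unique_rows = list(s.astype('str').unique())
print(unique_rows)
```

If the individual elements are what you need, the lists have to be flattened first.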
Upvotes: 3
Reputation: 153460
I think you need pd.Series.unique, applied after flattening the lists into a single Series.
Using @jezrael's data:
df = pd.DataFrame({'JobRoleInterest':['aa,ss,ss','dd,ff','k,dd,dd,dd', 'j,gg']})
df['JobRoleInterest'].str.split(',', expand=True).stack().unique().tolist()
Output:
['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']
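If the column already holds lists rather than comma-separated strings, use Series.explode (pandas 0.25+):
df = pd.DataFrame({'JobRoleInterest':[['aa','ss','ss'],['dd','ff'],['k','dd','dd','dd'],['j','gg']]})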
df['JobRoleInterest'].explode().unique().tolist()
Output:
['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']
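Since the stated goal is to use the unique values as column names, one possible follow-up is sketched below. It assumes that "one column per unique value" means a 0/1 indicator column for each value; that reading, and the names `exploded` and `indicators`, are my assumptions rather than anything from the question.

```python
import pandas as pd

df = pd.DataFrame({'JobRoleInterest': [['aa', 'ss', 'ss'], ['dd', 'ff'],
                                       ['k', 'dd', 'dd', 'dd'], ['j', 'gg']]})

# One row per list element; the original row index is repeated
exploded = df['JobRoleInterest'].explode()
unique_vals = exploded.unique().tolist()

# One 0/1 column per value, then collapse back to one row per original row
indicators = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# Reorder the columns by first appearance instead of alphabetically
indicators = indicators[unique_vals]
print(indicators)
```

Taking the max per original index means repeated values inside one list (such as 'dd' in the third row) still produce a single 1.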
Upvotes: 2
Reputation: 862511
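To keep the values in order of first appearance, build a dictionary and extract its keys. This works in Python 3.6+, where dicts preserve insertion order (guaranteed by the language from 3.7):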
df = pd.DataFrame({'JobRoleInterest':['aa,ss,ss','dd,ff','k,dd,dd,dd', 'j,gg']})
splitted_interests = df['JobRoleInterest'].str.split(',')
unique_vals = list(dict.fromkeys([y for x in splitted_interests for y in x]).keys())
print(unique_vals)
['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']
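The same idea should also work when the column already contains lists, so no str.split step is needed. The sketch below uses itertools.chain to flatten; the Series `s` is hypothetical sample data, and the set-based variant is only an alternative for when order does not matter.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: every value is already a list
s = pd.Series([['aa', 'ss', 'ss'], ['dd', 'ff'], ['k', 'dd', 'dd', 'dd'], ['j', 'gg']])

# Flatten lazily, then de-duplicate while keeping first-appearance order
ordered_unique = list(dict.fromkeys(chain.from_iterable(s)))

# If order is irrelevant, a plain set is enough
unordered_unique = set(chain.from_iterable(s))

print(ordered_unique)
```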
Upvotes: 2