Reputation: 6197
Say I have a pandas column of lists, for example
column1
['a', 'b', 'b', 'd', 'e']
['b', 'e', 'g']
How do I convert this into a single Python set?
for example
print(pythonSet)
> {'a', 'b', 'd', 'e', 'g'}
I tried set(df['column1']), but that raises an error.
Upvotes: 1
Views: 180
Reputation: 402313
Short and sweet:
{*df['column1'].sum()}
# {'a', 'b', 'd', 'e', 'g'}
The idea is to flatten your column of lists into a single iterable first. For Python < 3.5, use set(...) instead of the unpacking operator {*...}.
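To make the flattening step concrete, here is a minimal reproducible sketch. The DataFrame is rebuilt from the sample data in the question, and the names `flat` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question: each cell of column1 holds a list.
df = pd.DataFrame({'column1': [['a', 'b', 'b', 'd', 'e'], ['b', 'e', 'g']]})

# Summing an object column of lists concatenates them into one flat list.
flat = df['column1'].sum()

# Building a set from the flat list drops the duplicates.
result = set(flat)
print(result)
```

Note that set(...) is used here in place of {*...}; both build the same set.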
Better in terms of performance:
from itertools import chain
{*chain.from_iterable(df['column1'])}
# {'a', 'b', 'd', 'e', 'g'}
Also good in terms of performance is a nested set comprehension (though chain is marginally faster):
{y for x in df['column1'] for y in x}
# {'a', 'b', 'd', 'e', 'g'}
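If you want to check the performance claims on your own data, a rough sketch along these lines may help. The synthetic Series `big`, its size, and the repeat count are arbitrary assumptions, and the timings will vary with data shape and library versions, so no numbers are quoted here.

```python
from itertools import chain
import timeit

import pandas as pd

df = pd.DataFrame({'column1': [['a', 'b', 'b', 'd', 'e'], ['b', 'e', 'g']]})

# chain.from_iterable lazily yields every element of every list in turn.
via_chain = set(chain.from_iterable(df['column1']))

# The nested comprehension walks the rows, then each row's elements.
via_comprehension = {y for x in df['column1'] for y in x}

# Hypothetical larger input, used only for timing (size chosen arbitrarily).
big = pd.Series([['a', 'b', 'c']] * 2000)

timings = {
    'sum': timeit.timeit(lambda: set(big.sum()), number=3),
    'chain': timeit.timeit(lambda: set(chain.from_iterable(big)), number=3),
    'comprehension': timeit.timeit(lambda: {y for x in big for y in x}, number=3),
}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f}s')
```

The sum approach concatenates lists pairwise, so it tends to slow down disproportionately as the number of rows grows, whereas the other two make a single pass.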
Upvotes: 5
Reputation: 13401
If you have pandas version 0.25 or later, you can do:
print(set(df["column1"].explode()))
Output:
{'a', 'b', 'd', 'e', 'g'}
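One caveat worth knowing: explode() turns an empty list into NaN, which would then end up in the set. The sketch below assumes a hypothetical extra row containing an empty list (not part of the question's data) and drops the NaN before building the set.

```python
import pandas as pd

# Hypothetical data: the question's sample plus one empty list.
df = pd.DataFrame({'column1': [['a', 'b', 'b', 'd', 'e'], ['b', 'e', 'g'], []]})

# explode() gives each list element its own row; the empty list becomes NaN.
exploded = df['column1'].explode()

# dropna() removes that NaN placeholder before the set is built.
result = set(exploded.dropna())
print(result)
```

If your column never contains empty lists or missing values, the dropna() call is unnecessary.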
Upvotes: 1