SantoshGupta7

Reputation: 6197

Convert pandas column of lists to a python set

Say I have a pandas column of lists, for example:

column1
['a', 'b', 'b', 'd', 'e']
['b', 'e', 'g']

How do I convert this into a python set?

For example:

print(pythonSet)
> {'a', 'b', 'd', 'e', 'g'}

I tried set(df['column1']), but that raises an error.

Upvotes: 1

Views: 180

Answers (2)

cs95

Reputation: 402313

Short and sweet:

{*df['column1'].sum()}
# {'a', 'b', 'd', 'e', 'g'}

The idea is to flatten the column of lists into a single iterable first. On Python < 3.5, use set(...) instead of the unpacking syntax {*...}, which was added in Python 3.5.
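To make the flattening step concrete, here is a minimal sketch. The way df is built is an assumption (the question only shows the column contents), and python_set is just the name borrowed from the question. It also shows why calling set() directly on the column fails: each element is a list, and lists are not hashable.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({'column1': [['a', 'b', 'b', 'd', 'e'], ['b', 'e', 'g']]})

# set() has to hash every element; the elements here are lists,
# which are unhashable, so this raises TypeError.
try:
    set(df['column1'])
    direct_call_worked = True
except TypeError:
    direct_call_worked = False

# Series.sum() adds the lists together with +, i.e. concatenates them.
flat = df['column1'].sum()

# Building a set from the flat sequence removes the duplicates.
python_set = set(flat)
print(python_set)
```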


Better in terms of performance:

from itertools import chain
{*chain.from_iterable(df['column1'])}
# {'a', 'b', 'd', 'e', 'g'}

Also good in terms of performance: a nested set comprehension (though chain is marginally faster):

{y for x in df['column1'] for y in x}
# {'a', 'b', 'd', 'e', 'g'}
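If you want to compare the approaches on your own data, something like the sketch below should do. The column here is synthetic and hypothetical (2000 short lists, not from the question), and no timings are claimed; the numbers will depend on your machine and data. My understanding is that sum() is slower because it concatenates list after list, creating a new, longer list each time, while chain and the comprehension just iterate.

```python
from itertools import chain
import timeit

import pandas as pd

# Hypothetical larger column: 2000 rows of two-element lists.
rows = [[str(i % 50), str((i * 7) % 50)] for i in range(2000)]
big = pd.DataFrame({'column1': rows})

def via_sum():
    # Concatenates the lists pairwise before building the set.
    return set(big['column1'].sum())

def via_chain():
    # Iterates over the inner lists lazily, no intermediate flat list.
    return set(chain.from_iterable(big['column1']))

def via_comprehension():
    # Same iteration written as a nested set comprehension.
    return {y for x in big['column1'] for y in x}

# All three should produce the same set.
same_result = via_sum() == via_chain() == via_comprehension()

# Rough wall-clock totals in seconds for 20 runs each.
timings = {
    f.__name__: timeit.timeit(f, number=20)
    for f in (via_sum, via_chain, via_comprehension)
}
print(same_result, timings)
```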

Upvotes: 5

Sociopath

Reputation: 13401

If you have pandas version 0.25 or later, you can do:

print(set(df["column1"].explode()))

Output:

{'a', 'b', 'd', 'e', 'g'}
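One thing to watch for with explode(): a row holding an empty list becomes NaN, which would then end up in the set. A minimal sketch, assuming the question's data plus one hypothetical empty-list row added for illustration; dropna() removes the NaN before the set is built.

```python
import pandas as pd

# Question's sample data plus a hypothetical empty list in the last row.
df = pd.DataFrame({'column1': [['a', 'b', 'b', 'd', 'e'], ['b', 'e', 'g'], []]})

# explode() gives one row per list element; the empty list yields NaN.
exploded = df['column1'].explode()
nan_count = int(exploded.isna().sum())

# Drop the NaN so only real elements reach the set.
result = set(exploded.dropna())
print(result)
```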

Upvotes: 1
