Vajjhala

Reputation: 160

Pandas boolean dataframe creation from sets

I want to create a boolean DataFrame from sets.

There are 4 sets, each containing a collection of names:

a = { a collection of names }
b = { another collection of names}
c = { ... } 
d = { ... }

The result should be a DataFrame that looks like this:

 Name   |   a   |   b   |  c    |   d 
 --------------------------------------
'John'  | True  | True  | False | True
'Mike'  | False | True  | False | False
   .
   .
   .

I'd like an efficient way to do this in Python with pandas.

One way to do it is to take each name, check whether it is in each set, and then add that name to the DataFrame. But there should be faster ways, such as merging the sets and applying some function.
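For reference, here is a minimal sketch of the per-name loop described above. The set contents and the `sets` dict are hypothetical placeholders, not the real data. Because set membership tests are O(1) on average, this baseline may already be fast enough for modest inputs.

```python
import pandas as pd

# Hypothetical example sets standing in for a, b, c, d
sets = {
    'a': {'John', 'Mike'},
    'b': {'Mike', 'Jake'},
    'c': {'Anna'},
    'd': {'John', 'Anna'},
}

# Every name that appears in at least one set, sorted for a stable row order
names = sorted(set().union(*sets.values()))

# One row per name: test membership of that name in each set
rows = [{label: name in members for label, members in sets.items()} for name in names]
df = pd.DataFrame(rows, index=pd.Index(names, name='Name'))
print(df)
```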

Upvotes: 0

Views: 44

Answers (2)

Igor Raush

Reputation: 15240

Here is one possible approach:

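import pandas as pd

a = {'John', 'Mike'}
b = {'Mike', 'Jake'}

# Each set becomes a dict {name: True}; pandas aligns the keys into one index
df = pd.DataFrame.from_dict({
    'a': dict.fromkeys(a, True),
    'b': dict.fromkeys(b, True),
}).fillna(False)
print(df)
          a      b
Jake  False   True
John   True  False
Mike   True   True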

dict.fromkeys(..., True) gives you something like

{'John': True, 'Mike': True}

This dictionary is interpreted as a Series when passed to DataFrame. Pandas takes care of aligning the indices, so the final DataFrame is indexed by the union of all the sets. Names missing from a set get NaN, which fillna(False) replaces with False.
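The same idea extends to any number of sets with a dict comprehension. The sketch below is a variation, not the answer's exact code: it swaps `.fillna(False)` for `.notna()`, since the aligned frame holds True where a name was present and NaN elsewhere, so `notna()` gives a bool-dtype result directly. The set contents are hypothetical.

```python
import pandas as pd

# Hypothetical example sets standing in for a, b, c, d
sets = {
    'a': {'John', 'Mike'},
    'b': {'Mike', 'Jake'},
    'c': {'Anna'},
    'd': {'John', 'Anna'},
}

# Each set becomes {name: True}; pandas aligns the keys into one index,
# leaving NaN wherever a name is missing from a set
aligned = pd.DataFrame({label: dict.fromkeys(members, True)
                        for label, members in sets.items()})

# True stays True, NaN becomes False; every column ends up bool dtype
df = aligned.notna().sort_index()
print(df)
```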

Upvotes: 1

Andrew L

Reputation: 7038

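I've put together some sample data; since isin is vectorized, this approach should scale:

import pandas as pd

a = ['foo', 'bob']
b = ['foo', 'john', 'jeff']

# One row per name to look up
df = pd.DataFrame({'name': ['jeff', 'john', 'bob']})

# isin returns True where the name appears in the given collection
df['a'] = df.name.isin(a)
df['b'] = df.name.isin(b)

print(df)
   name      a      b
0  jeff  False   True
1  john  False   True
2   bob   True  False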
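If there is no existing frame of names, one option (an assumption about the setup, not part of the answer above) is to build the name column from the union of all the sets and then add one isin column per set in a loop; isin accepts a set directly. The set contents are hypothetical.

```python
import pandas as pd

# Hypothetical example sets standing in for a, b, c, d
sets = {
    'a': {'John', 'Mike'},
    'b': {'Mike', 'Jake'},
    'c': {'Anna'},
    'd': {'John', 'Anna'},
}

# One row per distinct name across all sets
df = pd.DataFrame({'Name': sorted(set().union(*sets.values()))})

# Vectorized membership test: one boolean column per set
for label, members in sets.items():
    df[label] = df['Name'].isin(members)

print(df)
```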

Upvotes: 1
