Reputation: 160
I want to create a boolean DataFrame from sets.
There are 4 sets, each containing a collection of names:
a = { a collection of names }
b = { another collection of names}
c = { ... }
d = { ... }
The result should be a DataFrame that looks like this:
Name | a | b | c | d
--------------------------------------
'John' | True | True | False | True
'Mike' | False | True | False | False
.
.
.
I want an efficient way to do this in Python using pandas.
One way is to take each name, check whether it is in each set, and then add that name to the DataFrame. But there should be faster ways, such as merging the sets and applying some function.
Upvotes: 0
Views: 44
Reputation: 15240
Here is one possible approach:
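import pandas as pd

a = {'John', 'Mike'}
b = {'Mike', 'Jake'}
# Each dict becomes one column; names missing from a set become NaN, then False.
pd.DataFrame.from_dict({
    'a': dict.fromkeys(a, True),
    'b': dict.fromkeys(b, True),
}).fillna(False)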
          a      b
Jake  False   True
John   True  False
Mike   True   True
dict.fromkeys(..., True) gives you something like {'John': True, 'Mike': True}. This dictionary is interpreted as a series when passed to DataFrame. Pandas takes care of aligning the indices, so the final data frame is indexed by the union of all the sets.
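As a rough sketch of how this might extend to the four sets in the question: the set contents below are hypothetical, picked only so that the John and Mike rows match the table in the question. It also swaps fillna(False) for notna(), on the assumption that every value that is present is True, so the columns come out as plain bool rather than object dtype.

```python
import pandas as pd

# Hypothetical sample sets, chosen to match the two rows shown in the question.
sets = {
    'a': {'John'},
    'b': {'John', 'Mike'},
    'c': {'Jake'},
    'd': {'John', 'Jake'},
}

# Each inner dict becomes a column; pandas aligns them on the union of names.
aligned = pd.DataFrame(
    {label: dict.fromkeys(names, True) for label, names in sets.items()}
)

# Entries missing after alignment are NaN, so notna() yields a pure bool frame.
result = aligned.notna().sort_index()
print(result)
```

Sorting the index is optional; it is only there to make the row order deterministic, since sets have no defined order.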
Upvotes: 1
Reputation: 7038
I've put together some sample data; the approach itself should scale:
import pandas as pd

a = ['foo', 'bob']
b = ['foo', 'john', 'jeff']
df = pd.DataFrame({'name': ['jeff', 'john', 'bob']})
df
   name
0  jeff
1  john
2   bob
df['a'] = df.name.isin(a)
df['b'] = df.name.isin(b)
df
   name      a      b
0  jeff  False   True
1  john  False   True
2   bob   True  False
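If the name column is not available up front, one possible way to get it is to take the union of the collections first and then apply isin per column. This is only a sketch: it assumes the frame should contain every distinct name (so 'foo' appears here, unlike in the output above), and the loop over a label-to-collection dict is an illustrative choice.

```python
import pandas as pd

# Sample lists from the answer above.
a = ['foo', 'bob']
b = ['foo', 'john', 'jeff']

# Assumption: the name column should hold every distinct name across the collections.
names = sorted(set(a) | set(b))
df = pd.DataFrame({'name': names})

# One vectorized membership test per collection.
for label, members in {'a': a, 'b': b}.items():
    df[label] = df['name'].isin(members)

print(df)
```

With four sets, the same loop would run over a dict holding all four collections.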
Upvotes: 1