Reputation: 1869
I have a DataFrame that looks like this (the columns hold a movie id, an actor id, and a cluster id):
   movie  actor  clusterid
0      0      1          2
1      0      2          2
2      1      1          2
3      1      3          2
4      2      2          1
and I want to create a binary co-occurrence matrix from this DataFrame, which looks like this:
                     actor1  actor2  actor3
clusterid 2  movie0       1       1       0
             movie1       1       0       1
clusterid 1  movie2       0       1       0
where the resulting DataFrame has a MultiIndex (clusterid, movie) and, for each actor, a binary indicator of whether that actor appeared in the movie according to my initial DataFrame.
I tried:
df.groupby("movie").agg('count').unstack(fill_value=0)
but unfortunately this doesn't expand the DataFrame into one column per actor; it only counts the totals. Can something like this be done easily with built-in pandas functions?
Thank you for any advice
Upvotes: 1
Views: 165
Reputation: 214927
You can create an auxiliary column to indicate that the value exists and then use pivot_table:
(df.assign(actor = "actor" + df.actor.astype(str), indicator = 1)
.pivot_table('indicator', ['clusterid', 'movie'], 'actor', fill_value = 0))
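For reference, here is a self-contained sketch of the same pivot_table idea, run on the sample data from the question. Two details are my own additions rather than part of the snippet above: aggfunc="max" (so a repeated movie/actor row would still give 1 instead of relying on the default mean) and the final astype(int) (so the result is integer regardless of pandas version). Keyword arguments are used only for readability.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "movie": [0, 0, 1, 1, 2],
    "actor": [1, 2, 1, 3, 2],
    "clusterid": [2, 2, 2, 2, 1],
})

result = (
    df.assign(actor="actor" + df["actor"].astype(str), indicator=1)
      .pivot_table(
          values="indicator",
          index=["clusterid", "movie"],
          columns="actor",
          aggfunc="max",   # assumption: keeps the matrix binary if a pair repeats
          fill_value=0,    # actors that did not appear in a movie get 0
      )
      .astype(int)         # assumption: force an integer result
)

print(result)
```

Note that pivot_table sorts the index, so the row for clusterid 1 comes before the rows for clusterid 2, unlike the layout sketched in the question.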
Or use the set_index / unstack pattern:
(df.assign(actor = "actor" + df.actor.astype(str), indicator = 1)
.set_index(['clusterid', 'movie', 'actor']).indicator.unstack('actor', fill_value=0))
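One caveat with the set_index / unstack route: unstack raises a ValueError if the same (clusterid, movie, actor) combination appears more than once. The sketch below guards against that with drop_duplicates. The guard and the extra repeated row are my own illustration, not something from the question's data.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical repeated row
# (movie 0, actor 1) added only to illustrate the duplicate case.
df = pd.DataFrame({
    "movie": [0, 0, 1, 1, 2, 0],
    "actor": [1, 2, 1, 3, 2, 1],
    "clusterid": [2, 2, 2, 2, 1, 2],
})

wide = (
    df.drop_duplicates(["clusterid", "movie", "actor"])  # assumption: duplicates may occur
      .assign(actor=lambda d: "actor" + d["actor"].astype(str), indicator=1)
      .set_index(["clusterid", "movie", "actor"])["indicator"]
      .unstack("actor", fill_value=0)
)

print(wide)
```

If the data is known to contain each pair only once, the drop_duplicates step can be left out.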
Upvotes: 1