J-H

Reputation: 1869

Python Pandas: Create Co-occurrence Matrix from Two Columns

I have a DataFrame which looks like this (the columns are filled with ids for a movie and ids for an actor):

   movie  actor  clusterid
0      0      1          2
1      0      2          2
2      1      1          2
3      1      3          2
4      2      2          1

and I want to create a binary co-occurrence matrix from this DataFrame which looks like this:

                     actor1  actor2  actor3
clusterid 2  movie0       1       1       0
             movie1       1       0       1
clusterid 1  movie2       0       1       0

where the resulting DataFrame has a MultiIndex (clusterid, movie) and, for each actor, a binary indicator of whether that actor acted in the movie according to my initial DataFrame.

I tried:

df.groupby("movie").agg('count').unstack(fill_value=0)

but unfortunately this doesn't expand the DataFrame into one column per actor; it only counts the totals per movie. Can something like this be done easily with built-in pandas functions?

Thank you for any advice

Upvotes: 1

Views: 165

Answers (1)

akuiper

Reputation: 214927

You can create an auxiliary indicator column that marks each existing (movie, actor) pair with 1, and then use pivot_table:

(df.assign(actor="actor" + df.actor.astype(str), indicator=1)
 .pivot_table(values='indicator', index=['clusterid', 'movie'],
              columns='actor', fill_value=0))

Or use the set_index(...).unstack() pattern:

(df.assign(actor="actor" + df.actor.astype(str), indicator=1)
 .set_index(['clusterid', 'movie', 'actor'])
 .indicator.unstack('actor', fill_value=0))
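
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of both approaches on the sample data from the question. The names `labeled`, `via_pivot` and `via_unstack` are just illustrative, and the `.astype(int)` and `.sort_index()` calls are my own additions: pivot_table aggregates with a mean by default, so depending on the pandas version it may return floats rather than integers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "movie": [0, 0, 1, 1, 2],
    "actor": [1, 2, 1, 3, 2],
    "clusterid": [2, 2, 2, 2, 1],
})

# Turn actor ids into labels and add a constant indicator column.
labeled = df.assign(actor="actor" + df["actor"].astype(str), indicator=1)

# Approach 1: pivot_table. The default aggregation is the mean of the
# indicator (always 1 here); cast to int in case the result is float.
via_pivot = (
    labeled.pivot_table(values="indicator", index=["clusterid", "movie"],
                        columns="actor", fill_value=0)
    .astype(int)
    .sort_index()
)

# Approach 2: set_index + unstack. This requires each
# (clusterid, movie, actor) combination to appear only once.
via_unstack = (
    labeled.set_index(["clusterid", "movie", "actor"])["indicator"]
    .unstack("actor", fill_value=0)
    .sort_index()
)

print(via_pivot)
print(via_unstack)
```

Because the index is sorted, clusterid 1 comes before clusterid 2 in both results, which is the reverse of the row order shown in the question. Also note that the unstack version raises an error if the same (clusterid, movie, actor) combination occurs more than once, whereas pivot_table aggregates duplicates.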

Upvotes: 1