Reputation: 133
Hi I have a dataframe as follows:
And would like to create a dataframe with 2 columns:
Writer1 Writer2
that lists all the permutations of writers of a song ex: for the song 03 Bonnie & Clyde the writers: Prince, Tupac Shakur, Jay-Z, Tyrone Wrice and Kanye West were involved. My dataframe therefore should look like:
Writer1 Writer2
Prince Tupac Shakur
Prince Jay-Z
Prince Tyrone Wrice
Prince Kanye West
Tupac S Jay-Z
Tupac S Tyrone Wrice
Tupac S Kanye West
Jay-Z Tyrone Wrice
Jay-Z Kanye West
Tyrone Kanye West
Any idea how I can go about it pls?
Upvotes: 2
Views: 1861
Reputation: 1103
Here's one approach using itertools.combinations
:
import itertools
import pandas as pd
def get_combinations(df, song_name):
"""
Get a dataframe of all two-writer combinations for a given song.
:param df: dataframe containing all artists, songs and writers
:param song_name: name of song
:returns: dataframe with cols 'Writer1', 'Writer2' of all two writer combinations for the given song
"""
song_frame = df[df['Song'] == song_name]
combinations_df = pd.DataFrame(list(itertools.combinations(song_frame['Writer'].unique(), 2)),
columns=['Writer1', 'Writer2'])
return combinations_df
combinations_df = get_combinations(df, '03 Bonnie & Clyde')
Note that this assumes your data is in the form of a Pandas dataframe. You can easily read in from a text file or csv, or create one like the following to test:
import numpy as np
df = pd.DataFrame({'Artist':np.repeat('Jay-Z',5).tolist() + ['David Bowie'] * 2 + ['List of the X Factor finalists'] * 2,
'Song':np.repeat('03 Bonnie & Clyde',5).tolist() + ['Heroes'] * 4,
'Writer':['Prince', 'Tupac Shakur',
'Jaz-Z', 'Tyrone Wrice',
'Kanye West'] + ['David Bowie', 'Brian Eno'] * 2})
If you want to efficiently apply this over your whole dataframe, consider:
def combinations_per_group(group):
"""Return combinations of writers after grouping by song."""
group_combs = pd.DataFrame(list(itertools.combinations(group['Writer'].unique(),2)),
columns=['Writer1', 'Writer2'])
combinations_df = df.groupby(['Song']).apply(combinations_per_group)\
.reset_index()\
.drop('level_1', axis=1)
This returns a dataframe with the song as the index and the desired columns giving all combinations of writers per song.
Upvotes: 1