Reputation: 384
I have a dataset of customer information which has a unique ID per customer; an identity variable which considers their status such as retired, student, etc.; and finally a type variable which considers what type of shopper they are, e.g. weekly, monthly etc. Customers can have multiple identities and types, but only one ID.
From my dataset, I wish to correlate every identity with every type for the purposes of doing counts and heatmaps on this. As such, I need to break out and unpivot the data into an attribute:value pair format while maintaining the customer ID. I'm scratching my head on how to even approach this so am not sure what documentation to read, nor can provide any coding attempts.
Here is a reproducible example of my data and a sample of my expected output. Any advice is welcomed.
import pandas as pd
import numpy as np
data = {'ID':['1','1','1','2','2','2','3','3','3','3'],
'Identity': ['Identity_1', 'Identity_2','Identity_3','Identity_1', 'Identity_4','Identity_2','Identity_4', 'Identity_5','Identity_6',np.nan],
'Type': ['Type_1', 'Type_2',np.nan,'Type_3', 'Type_1',np.nan,'Type_4','Type_5','Type_1', 'Type_6']
}
df_data = pd.DataFrame (data, columns = ['ID','Identity','Type'])
result ={'ID':['1','1','1','1','1','1'],
'Identity': ['Identity_1', 'Identity_1','Identity_2','Identity_2', 'Identity_3','Identity_3'],
'Type': ['Type_1', 'Type_2','Type_1', 'Type_2','Type_1', 'Type_2']
}
df_result = pd.DataFrame (result, columns = ['ID','Identity','Type'])
Upvotes: 1
Views: 95
Reputation: 29742
One way using itertools.product
:
from itertools import product
def prod(data):
return pd.DataFrame(list(product(data["Identity"].dropna(),
data["Type"].dropna())),
columns=["Identity", "Type"])
new_df = df_data.groupby("ID").apply(prod).reset_index(level=1, drop=True)
print(new_df)
Output:
Identity Type
ID
1 Identity_1 Type_1
1 Identity_1 Type_2
1 Identity_2 Type_1
1 Identity_2 Type_2
1 Identity_3 Type_1
1 Identity_3 Type_2
2 Identity_1 Type_3
2 Identity_1 Type_1
2 Identity_4 Type_3
2 Identity_4 Type_1
2 Identity_2 Type_3
2 Identity_2 Type_1
3 Identity_4 Type_4
3 Identity_4 Type_5
3 Identity_4 Type_1
3 Identity_4 Type_6
3 Identity_5 Type_4
3 Identity_5 Type_5
3 Identity_5 Type_1
3 Identity_5 Type_6
3 Identity_6 Type_4
3 Identity_6 Type_5
3 Identity_6 Type_1
3 Identity_6 Type_6
Upvotes: 1