Reputation: 104
In order to perform an analysis, I have been provided with a column name which contains specific information about the product, market and distribution.
The structure of the dataset is as follows:
Date Product1|CBA|MKD Product1|CPA|MKD Product1|CBA|IHR Product2|CBA|IHR
2018-11 12 23 0 2
There are a lot of unique column combinations. What I would like to do is to get the following structure:
Date Product Partner Market Quantity
2020-1 Product1 CBA MKD 11
2020-1 Product1 CPA MKD 22
2020-1 Product1 CBA IHR 0
2020-1 Product2 CBA IHR 1
So, I want to create 3 different columns and populate them with pasted values from the column name. The quantity column would obviously contain the value of the old concatenated column (that bit I know how to do), the issue is getting the first 3 columns.
I have tried to do this in pandas by matching strings but I am really stuck. I'd appreciate some help, thank you!
Upvotes: 1
Views: 134
Reputation: 2417
It looks like you could use pandas.melt
df_ = df.melt(id_vars = 'Date', value_name = 'Quantity')
df_[['Product', 'Partner','Market']] = df_.variable.str.split('|',
expand = True)\
.dropna(axis = 1)
df_.pop('variable')
df_
Out[67]:
Date Quantity Product Partner Market
0 2018-11 12 Product1 CBA MKD
1 2018-11 23 Product1 CPA MKD
2 2018-11 0 Product1 CBA IHR
3 2018-11 2 Product2 CBA IHR
Upvotes: 3
Reputation: 6476
Here's another way to do it:
st = df.set_index("Date").stack().reset_index(-1)
res = st["level_1"].str.split("|")
st[["Product","Partner","Market"]] = pd.DataFrame(res.tolist(), index=st.index)
df2 = st.drop("level_1", axis=1).rename({0:"Quantity"}, axis=1)
print(df2)
Quantity Product Partner Market
Date
2018-11 12 Product1 CBA MKD
2018-11 23 Product1 CPA MKD
2018-11 0 Product1 CBA IHR
2018-11 2 Product2 CBA IHR
Upvotes: 2
Reputation: 8033
a = df.melt(id_vars=["Date"],var_name="Product",
value_name="Val").dropna(how='any').sort_values('Date')
a['Partner'] = a['Product'].str.split("|").str[1]
a['Market'] = a['Product'].str.split("|").str[-1]
a['Product']= a['Product'].str.split("|").str[0]
Upvotes: 1