Chrestomanci
Chrestomanci

Reputation: 221

Transform dataframe from wide to long while breaking apart column names

I have a dataframe that looks like this:

CustomerID    CustomerStatus      CustomerTier     Order.Blue.Good       Order.Green.Bad      Order.Red.Good
----------------------------------------------------------------------------------------------------------------
101           ACTIVE              PREMIUM          NoticeABC: Good 5     NoticeYAF: Bad 1     NoticeAFV: Good 4
102           INACTIVE            DIAMOND                                NoticeTAC: Bad 3

I'm trying to transform it to look like this:

CustomerID     CustomerStatus      CustomerTier        Color      Outcome    NoticeCode     NoticeDesc
----------------------------------------------------------------------------------------------------------------
101            ACTIVE              PREMIUM             Blue       Good       NoticeABC      Good 5
101            ACTIVE              PREMIUM             Green      Bad        NoticeYAF      Bad 1
101            ACTIVE              PREMIUM             Red        Good       NoticeAFV      Good 4
102            INACTIVE            DIAMOND             Green      Bad        NoticeTAC      Bad3

I believe this is just a wide-to-long data transformation, which I tried using this approach:

df = pd.wide_to_long(df, ['Order'], i=['CustomerID','CustomerStatus','CustomerTier'], j='Color', sep='.')

However, this is returning an empty dataframe. I'm sure I'm doing something wrong with the separator--perhaps because there are 2 of them in the column names?

I feel like splitting the column names into Color, Outcome, NoticeCode, and NoticeDesc would be relatively easy once I figure out how to do this conversion, but just struggling with this part!

Any helpful tips to point me in the right direction would be greatly appreciated! Thank you!

Upvotes: 0

Views: 26

Answers (1)

Jake Steele
Jake Steele

Reputation: 113

I believe this would need to be solved with two separate calls to pd.wide_to_long as so:

# To set "Outcome" column
df = pd.wide_to_long(df,
    stubnames = ['Order.Blue', 'Order.Green', 'Order.Red'],
    i = ['CustomerID','CustomerStatus','CustomerTier'],
    j = 'Outcome',
    sep = '.')
df = pd.wide_to_long(df
    stubnames = 'Order',
    i = ['CustomerID', 'CustomerStatus', 'CustomerTier'],
    j = 'Color',
    sep= '.')

Then, to split the Notice column, you could use pd.str.split as so:

df[['NoticeCode', 'NoticeDesc']] = df['Outcome'].str.split(': ', expand=True)

Let me know how this goes and we can workshop a bit!

Upvotes: 1

Related Questions