Octavarium
Octavarium

Reputation: 131

Get beginning end end date from one column to two columns for each seperate ID

I've following df

ID      Status        Date
1       A             01-09-2020
1       B             03-09-2020
2       A             10-12-2020
2       B             -

And would like to convert to this:

ID     Status1     Status2      Date1       Date2
1      A           B            01-09-2020  03-09-2020
2      A           B            10-12-2020   -

I've think pivot doesn't apply here since I'm not really aggregating something and I've managed somewhat by using a group_by function where I get the min and max date for each ID, and afterwards joining it, but that seems somehow very devious and doesn't give me the status columns for which I cant use the min or max function since this is a numeric value.

I've tried following solution (answer 10 as someone suggested) How to pivot a dataframe?, which would look like this

df.insert(0, 'count', df.groupby('ID').cumcount())
pivot = df.pivot(index='count', columns='ID', values='STATUS')

but this resulted in following df:

1      2    
A      A
B      B

I've also tried How to do a transpose a dataframe group by key on pandas? but this gives me the error

Index contains duplicate entries, cannot reshape

Also if I use pd.pivot_table() instead of df.pivot as someone else suggested in another post.

Upvotes: 1

Views: 50

Answers (1)

Alonso Ogueda Oliva
Alonso Ogueda Oliva

Reputation: 191

In my opinion you must create two aux dataframes before, the first one grouping by ID and getting the first and last Status (if you are going to have A and B everytime you could use a pivot table) and the second one, you should get the first and last date. Something like that:

import pandas as pd
df = (
    pd.DataFrame(
        {
            "ID": [1, 1, 2, 2],
            "Status": ["A", "B", "A", "B"],
            "Date": ["01-09-2020", "03-09-2020", "10-12-2020", "11-12-2020"]
        }
    )
    .assign(Date=lambda x: pd.to_datetime(x["Date"])
)
aux1 = df.groupby("ID").agg(Status1=("Status", "first"), Status2=("Status", "last"))
aux2 = df.groupby("ID").agg(Date1=("Date", "min"), Date2=("Date", "max"))
output = pd.merge(aux1, aux2, left_index=True, right_index=True)
print(output)

Upvotes: 1

Related Questions