fejz1234
fejz1234

Reputation: 5

Grouping a dataframe by date and also creating new column based on column values

I'm working with a dataframe using Pandas. The dataframe has this structure:

  VIN       readDate                   Type   description
------    ---------------------       -----    ----------      
C92320      2013-02-05 15:17:25.000     SV     text1          
C92320      2013-02-05 15:17:25.000     FA     text2  
C92320      2013-02-10 18:30:25.000     FA     text3       
F45910      2013-01-22 06:17:55.000     SV     text4          
F45910      2013-01-23 07:20:55.000     SV     text5          
F45910      2013-01-23 07:20:55.000     DK     text6          
J20920      2015-10-24 12:48:38.000     SV     text7         
P20824      2014-01-24 11:58:38.000     FA     text8  

For every VIN I'd like to group by the date, and then create new columns from the values in Type with the description included (removing the two old columns Type and Description). It should look like this:

  VIN       readDate                  SV     FA     DK     
------    ---------------------      ----   ----   -----    
C92320      2013-02-05 15:17:25.000  text1   text2    
C92320      2013-02-10 18:30:25.000          text3   
F45910      2013-01-22 06:17:55.000  text4    
F45910      2013-01-23 07:20:55.000  text5          text6
J20920      2015-10-24 12:48:38.000  text7             
P20824      2014-01-24 11:58:38.000          text8  

Could anyone assist me with help for this issue? Have not been able to find a solution to this, appreciate help.

Upvotes: 0

Views: 29

Answers (1)

norie
norie

Reputation: 9857

If there are no duplicate rows you could do this using pivot.

import pandas as pd

df = pd.read_csv('test.csv')

df = (
    df.pivot(index=["VIN", "readDate"], columns="Type", values="description")
    .reset_index()
    .fillna("")
)

print(df)

Type     VIN                 readDate     DK     FA     SV
0     C92320  2013-02-05 15:17:25.000         text2  text1
1     C92320  2013-02-10 18:30:25.000         text3
2     F45910  2013-01-22 06:17:55.000                text4
3     F45910  2013-01-23 07:20:55.000  text6         text5
4     J20920  2015-10-24 12:48:38.000                text7
5     P20824  2014-01-24 11:58:38.000         text8

If there are duplicate rows you could either use drop_duplicates to remove them first,

df = (
    df.drop_duplicates()
    .pivot(index=["VIN", "readDate"], columns="Type", values="description")
    .reset_index()
    .fillna("")
)

Type         VIN      readDate     DK     FA     SV
0     2013-01-22  06:17:55.000                text4
1     2013-01-23  07:20:55.000  text6         text5
2     2013-02-05  15:17:25.000         text2  text1
3     2013-02-10  18:30:25.000         text3
4     2014-01-24  11:58:38.000         text8
5     2015-10-24  12:48:38.000                text7

or use pivot_table.

df = (
    pd.pivot_table(
        df,
        values="description",
        index=["VIN", "readDate"],
        columns="Type",
        aggfunc=list,
    )
    .reset_index()
    .fillna("")
)

Type     VIN                 readDate       DK       FA              SV
0     C92320  2013-02-05 15:17:25.000           [text2]  [text1, text1]
1     C92320  2013-02-10 18:30:25.000           [text3]
2     F45910  2013-01-22 06:17:55.000                           [text4]
3     F45910  2013-01-23 07:20:55.000  [text6]                  [text5]
4     J20920  2015-10-24 12:48:38.000                           [text7]
5     P20824  2014-01-24 11:58:38.000           [text8]

Upvotes: 2

Related Questions