Reputation: 5
I'm working with a dataframe using Pandas. The dataframe has this structure:
VIN readDate Type description
------ --------------------- ----- ----------
C92320 2013-02-05 15:17:25.000 SV text1
C92320 2013-02-05 15:17:25.000 FA text2
C92320 2013-02-10 18:30:25.000 FA text3
F45910 2013-01-22 06:17:55.000 SV text4
F45910 2013-01-23 07:20:55.000 SV text5
F45910 2013-01-23 07:20:55.000 DK text6
J20920 2015-10-24 12:48:38.000 SV text7
P20824 2014-01-24 11:58:38.000 FA text8
For every VIN I'd like to group by the date, and then create new columns from the values in Type with the description included (removing the two old columns Type and Description). It should look like this:
VIN readDate SV FA DK
------ --------------------- ---- ---- -----
C92320 2013-02-05 15:17:25.000 text1 text2
C92320 2013-02-10 18:30:25.000 text3
F45910 2013-01-22 06:17:55.000 text4
F45910 2013-01-23 07:20:55.000 text5 text6
J20920 2015-10-24 12:48:38.000 text7
P20824 2014-01-24 11:58:38.000 text8
Could anyone assist me with help for this issue? Have not been able to find a solution to this, appreciate help.
Upvotes: 0
Views: 29
Reputation: 9857
If there are no duplicate rows you could do this using pivot.
import pandas as pd
df = pd.read_csv('test.csv')
df = (
df.pivot(index=["VIN", "readDate"], columns="Type", values="description")
.reset_index()
.fillna("")
)
print(df)
Type VIN readDate DK FA SV
0 C92320 2013-02-05 15:17:25.000 text2 text1
1 C92320 2013-02-10 18:30:25.000 text3
2 F45910 2013-01-22 06:17:55.000 text4
3 F45910 2013-01-23 07:20:55.000 text6 text5
4 J20920 2015-10-24 12:48:38.000 text7
5 P20824 2014-01-24 11:58:38.000 text8
If there are duplicate rows you could either use drop_duplicates
to remove them first,
df = (
df.drop_duplicates()
.pivot(index=["VIN", "readDate"], columns="Type", values="description")
.reset_index()
.fillna("")
)
Type VIN readDate DK FA SV
0 2013-01-22 06:17:55.000 text4
1 2013-01-23 07:20:55.000 text6 text5
2 2013-02-05 15:17:25.000 text2 text1
3 2013-02-10 18:30:25.000 text3
4 2014-01-24 11:58:38.000 text8
5 2015-10-24 12:48:38.000 text7
or use pivot_table
.
df = (
pd.pivot_table(
df,
values="description",
index=["VIN", "readDate"],
columns="Type",
aggfunc=list,
)
.reset_index()
.fillna("")
)
Type VIN readDate DK FA SV
0 C92320 2013-02-05 15:17:25.000 [text2] [text1, text1]
1 C92320 2013-02-10 18:30:25.000 [text3]
2 F45910 2013-01-22 06:17:55.000 [text4]
3 F45910 2013-01-23 07:20:55.000 [text6] [text5]
4 J20920 2015-10-24 12:48:38.000 [text7]
5 P20824 2014-01-24 11:58:38.000 [text8]
Upvotes: 2