Reputation: 21971
id source Date CVI
67092 10365 sat 2016-11-20 00:00:00 1.9
67093 10365 nel-2 2016-11-20 00:00:00 1.5
67094 10365 sat 2016-12-06 00:00:00 1.436964
67095 10365 sat 2016-12-22 00:00:00 NaN
67096 10365 sat 2017-01-07 00:00:00 4.835949
67097 10365 sat 2017-01-23 00:00:00 10.033607
... ... ... ... ...
723 84650 nel-2 2020-04-03 00:00:00 5.085851
724 84650 nel-2 2020-04-08 00:00:00 4.375207
725 84650 nel-2 2020-04-13 00:00:00 0.040688
726 84650 nel-2 2020-04-18 00:00:00 3.151000
From the dataframe above, I would like to create a new dataframe where id
for same Date
and different source
are in same row. e.g. the output should look like:
id Date sat_CVI nel-2_CVI
0 10365 2016-11-20 00:00:00 1.9 1.5
I tried using groupby but it does not do what I want. Please note that the resulting table has only those rows where the same id
has a CVI
value for the same Date
for both the source
Upvotes: 1
Views: 73
Reputation: 862731
If there is no duplicated triples id
, Date
and source
values use DataFrame.set_index
with Series.unstack
for reshape and then chnage columns names by DataFrame.add_suffix
with remove rows with missing values by DataFrame.dropna
:
df1 = (df.set_index(['id','Date','source'])['CVI']
.unstack()
.add_suffix('_CVI')
.dropna()
.reset_index()
.rename_axis(None, axis=1))
print (df1)
id Date nel-2_CVI sat_CVI
0 10365 2016-11-20 00:00:00 1.5 1.9
If possible duplicates first solution fail is necessary DataFrame.pivot_table
(it also working for no duplicated triples, then is no aggregation) with some aggregate fucntion like sum
, mean
instead set_index
with unstack
:
print (df)
id source Date CVI
67092 10365 sat 2016-11-20 00:00:00 1.900000 <- duplicated triple
67092 10365 sat 2016-11-20 00:00:00 1.000000 <- duplicated triple
67093 10365 nel-2 2016-11-20 00:00:00 1.500000
67094 10365 sat 2016-12-06 00:00:00 1.436964
67095 10365 sat 2016-12-22 00:00:00 NaN
67096 10365 sat 2017-01-07 00:00:00 4.835949
67097 10365 sat 2017-01-23 00:00:00 10.033607
723 84650 nel-2 2020-04-03 00:00:00 5.085851
724 84650 nel-2 2020-04-08 00:00:00 4.375207
725 84650 nel-2 2020-04-13 00:00:00 0.040688
726 84650 nel-2 2020-04-18 00:00:00 3.151000
df1 = (df.pivot_table(index=['id','Date'],columns='source', values='CVI', aggfunc='sum')
.add_suffix('_CVI')
.dropna()
.reset_index()
.rename_axis(None, axis=1))
print (df1)
id Date nel-2_CVI sat_CVI
0 10365 2016-11-20 00:00:00 1.5 2.9
Upvotes: 1