Reputation: 773
What do I have?
I have the following df and a list of dates:
df
id value date
0 1 1 2020-01-01
1 1 2 2020-02-01
2 1 3 2020-03-01
3 2 3 2020-01-01
4 2 4 2020-02-01
5 2 5 2020-03-01
6 3 6 2020-01-01
7 3 7 2020-02-01
8 3 8 2020-03-01
dates = ['2020-01-01','2020-02-01','2020-03-01']
What do I want to get?
I want a new df with one row per distinct id and, for each date, a new column holding the value for that date:
id 2020-01-01 2020-02-01 2020-03-01
0 1 1 2 3
1 2 3 4 5
2 3 6 7 8
What have I tried?
for date in dates:
    df_result[date] = df.loc[df['date'] == date, 'value']
and this is what I got:
id 2020-01-01 2020-02-01 2020-03-01
0 1 1.0 NaN NaN
1 2 NaN 2.0 NaN
2 3 NaN NaN 3.0
As you can see, this is wrong. Is there a solution?
Upvotes: 2
Views: 44
Reputation: 863166
If the dates are strings, use Series.isin with the list to filter by boolean indexing, and then DataFrame.pivot:
dates = ['2020-01-01','2020-02-01','2020-03-01']
df = df[df['date'].isin(dates)].pivot(index='id', columns='date', values='value')
print (df)
date 2020-01-01 2020-02-01 2020-03-01
id
1 1 2 3
2 3 4 5
3 6 7 8
Finally, convert id to a column:
df = df.reset_index().rename_axis(None, axis=1)
print (df)
id 2020-01-01 2020-02-01 2020-03-01
0 1 1 2 3
1 2 3 4 5
2 3 6 7 8
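For reference, the steps above can be put together as one self-contained sketch. It rebuilds the sample data from the question by hand (the DataFrame constructor call and the name `wide` are my assumptions, not part of the question) and passes the arguments to `pivot` by keyword, which recent pandas versions require:

```python
import pandas as pd

# Rebuild the question's sample data; dates are kept as plain strings.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "value": [1, 2, 3, 3, 4, 5, 6, 7, 8],
    "date": ["2020-01-01", "2020-02-01", "2020-03-01"] * 3,
})
dates = ["2020-01-01", "2020-02-01", "2020-03-01"]

wide = (
    df[df["date"].isin(dates)]                          # keep only the listed dates
    .pivot(index="id", columns="date", values="value")  # one column per date
    .reset_index()                                      # turn id back into a column
    .rename_axis(None, axis=1)                          # drop the leftover 'date' columns name
)
print(wide)
```

Note that `pivot` raises an error if an id has more than one row for the same date, so this assumes each (id, date) pair is unique, as it is in the sample data.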
Solution with datetimes; here the list also has to be converted:
df['date'] = pd.to_datetime(df['date'])
dates = ['2020-01-01','2020-02-01','2020-03-01']
df = df[df['date'].isin(pd.to_datetime(dates))].pivot(index='id', columns='date', values='value')
print (df)
date 2020-01-01 2020-02-01 2020-03-01
id
1 1 2 3
2 3 4 5
3 6 7 8
Finally, convert id to a column; the previously hidden times now show up in the column names:
df = df.reset_index().rename_axis(None, axis=1)
print (df)
id 2020-01-01 00:00:00 2020-02-01 00:00:00 2020-03-01 00:00:00
0 1 1 2 3
1 2 3 4 5
2 3 6 7 8
To remove them, convert the datetime column names to strings before resetting the index (apply this to the pivoted df, instead of the previous step):
df = (df.rename(columns = lambda x: x.strftime('%Y-%m-%d'))
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
id 2020-01-01 2020-02-01 2020-03-01
0 1 1 2 3
1 2 3 4 5
2 3 6 7 8
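The order of the steps matters in the datetime variant, so here is a minimal end-to-end sketch. The sample data is rebuilt by hand and the name `wide` is only illustrative (both are my assumptions). The column names are formatted while they are still all Timestamps, because after `reset_index` the string label 'id' joins them and `strftime` would fail on it:

```python
import pandas as pd

# Rebuild the question's sample data, then convert the date column to datetimes.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "value": [1, 2, 3, 3, 4, 5, 6, 7, 8],
    "date": ["2020-01-01", "2020-02-01", "2020-03-01"] * 3,
})
df["date"] = pd.to_datetime(df["date"])
dates = ["2020-01-01", "2020-02-01", "2020-03-01"]

wide = (
    df[df["date"].isin(pd.to_datetime(dates))]           # compare datetimes with datetimes
    .pivot(index="id", columns="date", values="value")   # columns are Timestamps here
    .rename(columns=lambda x: x.strftime("%Y-%m-%d"))    # format them before 'id' is added
    .reset_index()
    .rename_axis(None, axis=1)
)
print(wide)
```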
Upvotes: 2