Dean Taler

Reputation: 773

Getting a new DataFrame from an existing df, with values as columns

What do I have?

I have the following df and a list of dates:

df
   id  value        date
0   1      1  2020-01-01
1   1      2  2020-02-01
2   1      3  2020-03-01
3   2      3  2020-01-01
4   2      4  2020-02-01
5   2      5  2020-03-01
6   3      6  2020-01-01
7   3      7  2020-02-01
8   3      8  2020-03-01

dates = ['2020-01-01','2020-02-01','2020-03-01']

What do I want to get?

I want a new df with one row per distinct id and, for each date, a new column holding the value:

   id  2020-01-01  2020-02-01  2020-03-01
0   1         1         2         3
1   2         3         4         5
2   3         6         7         8

What have I tried?

for date in dates:
    df_result[date] = df.loc[df['date'] == date, 'value']

and this is what I got:


   id  2020-01-01  2020-02-01  2020-03-01
0   1         1.0         NaN         NaN
1   2         NaN         2.0         NaN
2   3         NaN         NaN         3.0

As you can see, this is wrong. Is there a solution?

Upvotes: 2

Views: 44

Answers (1)

jezrael

Reputation: 863166

If the dates are strings, filter the rows with Series.isin and boolean indexing, then reshape with DataFrame.pivot:

dates = ['2020-01-01','2020-02-01','2020-03-01']

# pandas 2.0+ requires keyword arguments for pivot
df = df[df['date'].isin(dates)].pivot(index='id', columns='date', values='value')
print(df)
date  2020-01-01  2020-02-01  2020-03-01
id                                      
1              1           2           3
2              3           4           5
3              6           7           8

Finally, convert id back to a column:

df = df.reset_index().rename_axis(None, axis=1)
print(df)
   id  2020-01-01  2020-02-01  2020-03-01
0   1           1           2           3
1   2           3           4           5
2   3           6           7           8
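
A self-contained sketch of the steps above, rebuilding the sample frame from the question so it can be run as-is. It assumes a recent pandas where pivot takes keyword arguments; the name wide is only illustrative and not part of the original code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [1, 2, 3, 3, 4, 5, 6, 7, 8],
    'date': ['2020-01-01', '2020-02-01', '2020-03-01'] * 3,
})
dates = ['2020-01-01', '2020-02-01', '2020-03-01']

# Keep only the wanted dates, spread them into columns,
# then turn id back into a regular column
wide = (df[df['date'].isin(dates)]
          .pivot(index='id', columns='date', values='value')
          .reset_index()
          .rename_axis(None, axis=1))
print(wide)
```

Note that pivot raises an error if an (id, date) pair occurs more than once; in that case DataFrame.pivot_table with an aggregation function is the usual alternative.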

If the column holds datetimes, the list must be converted as well:

df['date'] = pd.to_datetime(df['date'])

dates = ['2020-01-01','2020-02-01','2020-03-01']

df = df[df['date'].isin(pd.to_datetime(dates))].pivot(index='id', columns='date', values='value')
print(df)
date  2020-01-01  2020-02-01  2020-03-01
id                                      
1              1           2           3
2              3           4           5
3              6           7           8

Finally, convert id to a column; the column labels now show the previously hidden times:

df = df.reset_index().rename_axis(None, axis=1)
print(df)
   id  2020-01-01 00:00:00  2020-02-01 00:00:00  2020-03-01 00:00:00
0   1                    1                    2                    3
1   2                    3                    4                    5
2   3                    6                    7                    8
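
A small sketch of why the times show up, using a cut-down version of the data. The names pivoted and flat are illustrative only. After the pivot the columns are a DatetimeIndex, which prints dates without the midnight time; once id is added as a column, the labels can no longer be held in a DatetimeIndex and the remaining ones are plain Timestamp objects, which print in full.

```python
import pandas as pd

# Reduced sample: two ids, two dates, already converted to datetimes
df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'value': [1, 2, 3, 4],
    'date': pd.to_datetime(['2020-01-01', '2020-02-01'] * 2),
})

pivoted = df.pivot(index='id', columns='date', values='value')
# Columns are still a dedicated datetime index at this point
print(type(pivoted.columns).__name__)

flat = pivoted.reset_index().rename_axis(None, axis=1)
# 'id' now sits next to the date labels, which are Timestamp objects
print([type(c).__name__ for c in flat.columns])
print([str(c) for c in flat.columns])
```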

To remove them, convert the datetime column labels to strings before resetting the index, i.e. apply this to the pivoted frame in place of the previous step:

df = (df.rename(columns = lambda x: x.strftime('%Y-%m-%d'))
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
   id  2020-01-01  2020-02-01  2020-03-01
0   1           1           2           3
1   2           3           4           5
2   3           6           7           8
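
The datetime variant can be sketched end to end as one chain, again assuming a recent pandas with keyword arguments for pivot and using the illustrative name wide. The labels are formatted while the columns hold only dates, so the lambda never sees the id label.

```python
import pandas as pd

# Sample data copied from the question, with date parsed to datetimes
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [1, 2, 3, 3, 4, 5, 6, 7, 8],
    'date': ['2020-01-01', '2020-02-01', '2020-03-01'] * 3,
})
df['date'] = pd.to_datetime(df['date'])
dates = ['2020-01-01', '2020-02-01', '2020-03-01']

wide = (df[df['date'].isin(pd.to_datetime(dates))]
          .pivot(index='id', columns='date', values='value')
          # Format the Timestamp labels as strings before id becomes a column
          .rename(columns=lambda x: x.strftime('%Y-%m-%d'))
          .reset_index()
          .rename_axis(None, axis=1))
print(wide)
```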

Upvotes: 2
