Ruslan

Reputation: 423

How to merge dataframe and date_range Series?

I have a dataframe with user transactions:

date       amount
2019-11-25 100
2019-11-25 40
2019-11-23 44
2019-10-30 1000

The date column has gaps, which makes time-series plotting a bit weird. To fill the gaps I've created a DataFrame holding every day in the range:

allthosedays = pd.DataFrame({
    'date': pd.date_range(
        start=pd.Timestamp(df.date.min()),
        end=pd.Timestamp(df.date.max()),
        freq='D'
    )
})

And then I got stuck.

How can I merge my DataFrame with this date range and fill the missing amount values with zeros?

Or am I going about this the wrong way, and the problem can be solved without creating the date range at all?

Upvotes: 1

Views: 761

Answers (1)

jezrael

Reputation: 862441

"This makes time-series plotting a bit weird."

I think one reason is the duplicated date 2019-11-25, which is likely to cause problems once date becomes the index.

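One possible solution is to aggregate the amounts per date so that each date is unique, e.g. with sum, and then, if necessary, add the missing dates with DataFrame.asfreq. This assumes the date column is already datetime64; if it holds strings, convert it first with pd.to_datetime:

# sum(level=0) was removed in pandas 2.0; groupby gives the same result, sorted by date
df1 = df.groupby('date').sum()
print(df1)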
            amount
date              
2019-10-30    1000
2019-11-23      44
2019-11-25     140

df2 = df.groupby('date').sum().asfreq('D', fill_value=0)
print(df2)
            amount
date              
2019-10-30    1000
2019-10-31       0
2019-11-01       0
2019-11-02       0
2019-11-03       0
2019-11-04       0
2019-11-05       0
2019-11-06       0
2019-11-07       0
2019-11-08       0
2019-11-09       0
2019-11-10       0
2019-11-11       0
2019-11-12       0
2019-11-13       0
2019-11-14       0
2019-11-15       0
2019-11-16       0
2019-11-17       0
2019-11-18       0
2019-11-19       0
2019-11-20       0
2019-11-21       0
2019-11-22       0
2019-11-23      44
2019-11-24       0
2019-11-25     140
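
For reference, here is a self-contained sketch of the aggregate-then-asfreq approach that rebuilds the sample data from the question. It assumes the dates start out as strings and need pd.to_datetime, and it uses groupby for the per-day sum; the names daily and filled are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'date': ['2019-11-25', '2019-11-25', '2019-11-23', '2019-10-30'],
    'amount': [100, 40, 44, 1000],
})
# asfreq needs a DatetimeIndex, so make sure the column is datetime64
# (assumption: the original column may hold strings)
df['date'] = pd.to_datetime(df['date'])

# One row per day: sum the amounts of transactions that share a date
daily = df.groupby('date')['amount'].sum()

# Insert the missing calendar days and give them an amount of 0
filled = daily.asfreq('D', fill_value=0)
print(filled.tail(3))
```

Because every day now appears exactly once, filled can be passed straight to a plotting call such as filled.plot().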

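Alternatively, to keep every original transaction row, use DataFrame.merge with a left join, replace the missing values with 0 and finally cast amount back to int. Note that the duplicated 2019-11-25 rows remain in the result:

df3 = allthosedays.merge(df, how='left').fillna({'amount':0}).astype({'amount':int})
print(df3)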
         date  amount
0  2019-10-30    1000
1  2019-10-31       0
2  2019-11-01       0
3  2019-11-02       0
4  2019-11-03       0
5  2019-11-04       0
6  2019-11-05       0
7  2019-11-06       0
8  2019-11-07       0
9  2019-11-08       0
10 2019-11-09       0
11 2019-11-10       0
12 2019-11-11       0
13 2019-11-12       0
14 2019-11-13       0
15 2019-11-14       0
16 2019-11-15       0
17 2019-11-16       0
18 2019-11-17       0
19 2019-11-18       0
20 2019-11-19       0
21 2019-11-20       0
22 2019-11-21       0
23 2019-11-22       0
24 2019-11-23      44
25 2019-11-24       0
26 2019-11-25     100
27 2019-11-25      40
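
The merge variant can be sketched end to end as follows. This is a minimal example under the same assumptions as the question (a date column and an amount column); the explicit on='date' and the final groupby step are additions for illustration, in case one row per day is wanted for plotting.

```python
import pandas as pd

# Sample data from the question, with dates parsed as datetime64
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-11-25', '2019-11-25', '2019-11-23', '2019-10-30']),
    'amount': [100, 40, 44, 1000],
})

# Every calendar day between the first and last transaction
allthosedays = pd.DataFrame({
    'date': pd.date_range(df['date'].min(), df['date'].max(), freq='D')
})

# Left join keeps every day; days without a transaction get NaN, then 0
merged = (
    allthosedays.merge(df, on='date', how='left')
    .fillna({'amount': 0})
    .astype({'amount': int})
)

# Optional: collapse the duplicated 2019-11-25 rows into one row per day
per_day = merged.groupby('date', as_index=False)['amount'].sum()
print(per_day.tail(3))
```

Both date columns need the same dtype for the join to match, which is why the sample dates are parsed with pd.to_datetime before merging.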

Upvotes: 1
