Sissi
Sissi

Reputation: 77

Filling missing dates within group with duplicate date pandas python

I am trying to fill missing dates by ID, however one of my indexed column has a duplicate date, so I tried this code but i keep getting this error "cannot reindex from a duplicate axis"

These are two codes I have tried:

FIRST:

udates=data['day'].unique()
filled_df = (data.set_index('day')
             .groupby('ID')
             .apply(lambda d: d.reindex(pd.date_range(min(data.day),
                                                      max(data.day),
                                                      freq='D')))
             .drop('ID', axis=1)
             .reset_index('ID')
             .fillna(0))
filled_df

SECOND ONE:

users = pd.unique(data.ID)
data.day= pd.to_datetime(data.day)
dates = pd.date_range(min(data.day), max(data.day))
data.set_index('day', inplace=True)

df = pd.DataFrame(index=dates)
for u in users:
    df[u] = data[data.ID==u].val

df = df.unstack().reset_index()
df.val.fillna(0, inplace=True)
df.val = df.val.astype(int)
df

this is an extract of the input data:

day ID val
01/26/2020 AA 100
01/28/2020 AA 200
01/26/2020 BB 100
01/27/2020 BB 100
01/29/2020 BB 40

This the output I want to get:

day ID val
01/26/2020 AA 100
01/27/2020 AA 0
01/28/2020 AA 200
01/26/2020 BB 100
01/27/2020 BB 100
01/28/2020 BB 0
01/29/2020 BB 40

Thank you

EDIT


df = pd.DataFrame({'ID': ['AA', 'AA', 'BB', 'BB','BB'], 
                   'day': ['1/26/2020', '1/28/2020', '1/26/2020', '1/27/2020','1/29/2020'], 'val': [100, 200, 100, 100, 40]})

Upvotes: 0

Views: 567

Answers (1)

Corralien
Corralien

Reputation: 120391

>>> df.set_index("day") \
      .groupby("ID")["val"] \
      .resample("D") \
      .first() \
      .fillna(0) \
      .reset_index()

   ID        day    val
0  AA 2020-01-26  100.0
1  AA 2020-01-27    0.0
2  AA 2020-01-28  200.0
3  BB 2020-01-26  100.0
4  BB 2020-01-27  100.0
5  BB 2020-01-28    0.0
6  BB 2020-01-29   40.0

Note: the function first() is useless. It's because Resampler.fillna() only works with the method keyword. You cannot pass a value unlike DataFrame.fillna().

Upvotes: 1

Related Questions