Reputation: 77
I am trying to fill missing dates by ID, however one of my indexed column has a duplicate date, so I tried this code but i keep getting this error "cannot reindex from a duplicate axis"
These are two codes I have tried:
FIRST:
udates=data['day'].unique()
filled_df = (data.set_index('day')
.groupby('ID')
.apply(lambda d: d.reindex(pd.date_range(min(data.day),
max(data.day),
freq='D')))
.drop('ID', axis=1)
.reset_index('ID')
.fillna(0))
filled_df
SECOND ONE:
users = pd.unique(data.ID)
data.day= pd.to_datetime(data.day)
dates = pd.date_range(min(data.day), max(data.day))
data.set_index('day', inplace=True)
df = pd.DataFrame(index=dates)
for u in users:
df[u] = data[data.ID==u].val
df = df.unstack().reset_index()
df.val.fillna(0, inplace=True)
df.val = df.val.astype(int)
df
this is an extract of the input data:
day | ID | val |
---|---|---|
01/26/2020 | AA | 100 |
01/28/2020 | AA | 200 |
01/26/2020 | BB | 100 |
01/27/2020 | BB | 100 |
01/29/2020 | BB | 40 |
This the output I want to get:
day | ID | val |
---|---|---|
01/26/2020 | AA | 100 |
01/27/2020 | AA | 0 |
01/28/2020 | AA | 200 |
01/26/2020 | BB | 100 |
01/27/2020 | BB | 100 |
01/28/2020 | BB | 0 |
01/29/2020 | BB | 40 |
Thank you
EDIT
df = pd.DataFrame({'ID': ['AA', 'AA', 'BB', 'BB','BB'],
'day': ['1/26/2020', '1/28/2020', '1/26/2020', '1/27/2020','1/29/2020'], 'val': [100, 200, 100, 100, 40]})
Upvotes: 0
Views: 567
Reputation: 120391
>>> df.set_index("day") \
.groupby("ID")["val"] \
.resample("D") \
.first() \
.fillna(0) \
.reset_index()
ID day val
0 AA 2020-01-26 100.0
1 AA 2020-01-27 0.0
2 AA 2020-01-28 200.0
3 BB 2020-01-26 100.0
4 BB 2020-01-27 100.0
5 BB 2020-01-28 0.0
6 BB 2020-01-29 40.0
Note: the function first()
is useless. It's because Resampler.fillna()
only works with the method
keyword. You cannot pass a value
unlike DataFrame.fillna()
.
Upvotes: 1