Reputation: 445
Following is the df.head()
of a DataSet with Date set as index.
Article_ID Country_Code Sold_Units
Date
2017-01-01 3576 AT 1
2017-01-02 1579 FR 1
2017-01-02 332 FI 2
2017-01-03 3576 AT 1
2017-01-03 332 SE 1
The Country Code has 4 values 'AT, FR, FI, SE'. I want to append these 4 country_codes to every single date if they are not present on those dates and impute their values to 0 in Article_ID and Sold_Units.
The Example output should basically look like this:
Article_ID Country_Code Sold_Units
Date
2017-01-01 3576 AT 1
2017-01-01 0 FR 0 # FR FI SE added with 0s.
2017-01-01 0 FI 0
2017-01-01 0 SE 0
2017-01-02 0 AT 0 # AT, SE added
2017-01-02 1579 FR 1
2017-01-02 332 FI 2
2017-01-02 0 SE 0
2017-01-03 3576 AT 1
2017-01-03 0 FR 0 # FR, FI added
2017-01-03 0 FI 0
2017-01-03 332 SE 1
How can I add such default values for every country code?
Upvotes: 1
Views: 117
Reputation: 22493
Another solution using pivot_table
:
import pandas as pd
df = pd.DataFrame({'Date': {0: '2017-01-01', 1: '2017-01-02', 2: '2017-01-02', 3: '2017-01-03', 4: '2017-01-03'},
'Article_ID': {0: 3576, 1: 1579, 2: 332, 3: 3576, 4: 332},
'Country_Code': {0: 'AT', 1: 'FR', 2: 'FI', 3: 'AT', 4: 'SE'},
'Sold_Units': {0: 1, 1: 1, 2: 2, 3: 1, 4: 1}})
df.set_index("Date",inplace=True)
result = (df.pivot_table(index="Date", columns="Country_Code", values=["Article_ID","Sold_Units"],aggfunc="mean")
.fillna("0").stack().reset_index(level=1))
print (result)
#
Country_Code Article_ID Sold_Units
Date
2017-01-01 AT 3576 1
2017-01-01 FI 0 0
2017-01-01 FR 0 0
2017-01-01 SE 0 0
2017-01-02 AT 0 0
2017-01-02 FI 332 2
2017-01-02 FR 1579 1
2017-01-02 SE 0 0
2017-01-03 AT 3576 1
2017-01-03 FI 0 0
2017-01-03 FR 0 0
2017-01-03 SE 332 1
Upvotes: 0
Reputation: 862521
Solutions for unique DatetimeIndex
with Country_Code
combinations:
Create MultiIndex
by added Country_Code
to DatetimeIndex
with all combinations of unique values of datetimes with codes with DataFrame.reindex
:
df = (df.set_index('Country_Code', append=True)
.reindex(pd.MultiIndex.from_product([df.index.unique(), df['Country_Code'].unique()],
names=['Date','Country_Code']), fill_value=0)
.reset_index(level=1))
print (df)
Country_Code Article_ID Sold_Units
Date
2017-01-01 AT 3576 1
2017-01-01 FR 0 0
2017-01-01 FI 0 0
2017-01-01 SE 0 0
2017-01-02 AT 0 0
2017-01-02 FR 1579 1
2017-01-02 FI 332 2
2017-01-02 SE 0 0
2017-01-03 AT 3576 1
2017-01-03 FR 0 0
2017-01-03 FI 0 0
2017-01-03 SE 332 1
Or use DataFrame.unstack
with
DataFrame.stack
:
df = (df.set_index('Country_Code', append=True)
.unstack(fill_value=0)
.stack()
.reset_index(level=1))
print (df)
Country_Code Article_ID Sold_Units
Date
2017-01-01 AT 3576 1
2017-01-01 FI 0 0
2017-01-01 FR 0 0
2017-01-01 SE 0 0
2017-01-02 AT 0 0
2017-01-02 FI 332 2
2017-01-02 FR 1579 1
2017-01-02 SE 0 0
2017-01-03 AT 3576 1
2017-01-03 FI 0 0
2017-01-03 FR 0 0
2017-01-03 SE 332 1
Solution for multiple values per datetimes with country codes:
Error means data are like:
print (df)
Article_ID Country_Code Sold_Units
Date
2017-01-01 3576 AT 1
2017-01-02 1579 FI 1 <-FI
2017-01-02 332 FI 2 <-FI
2017-01-03 3576 AT 1
2017-01-03 332 SE 1
df = (df.groupby(['Date','Country_Code'])
.sum()
.unstack(fill_value=0)
.stack()
.reset_index(level=1))
print (df)
Country_Code Article_ID Sold_Units
Date
2017-01-01 AT 3576 1
2017-01-01 FI 0 0
2017-01-01 SE 0 0
2017-01-02 AT 0 0
2017-01-02 FI 1911 3
2017-01-02 SE 0 0
2017-01-03 AT 3576 1
2017-01-03 FI 0 0
2017-01-03 SE 332 1
Upvotes: 1