Let's assume I have the following three dataframes:
Dataframe 1:
df1 = {'year': ['2010','2012','2014','2015'], 'count': [1,1,1,1]}
df1 = pd.DataFrame(data=df1)
df1 = df1.set_index('year')
df1
year count
2010 1
2012 1
2014 1
2015 1
Dataframe 2:
df2 = {'year': ['2010','2011','2016','2017'], 'count': [2,1,3,1]}
df2 = pd.DataFrame(data=df2)
df2 = df2.set_index('year')
df2
year count
2010 2
2011 1
2016 3
2017 1
Dataframe 3:
df3 = {'year': ['2010','2011','2012','2013','2014','2015','2017'], 'count': [4,2,5,4,4,1,1]}
df3 = pd.DataFrame(data=df3)
df3 = df3.set_index('year')
df3
year count
2010 4
2011 2
2012 5
2013 4
2014 4
2015 1
2017 1
Now I want all three dataframes to contain every year that appears in any of them. For example, df1 is missing the years 2011, 2013, 2016 and 2017, so these should be added to the index of df1 with a count of 0 for each newly added year.
So my output for df1 would be something like this:
year count
2010 1
2012 1
2014 1
2015 1
2011 0
2013 0
2016 0
2017 0
And similarly for df2 and df3 as well. Thanks.
Upvotes: 1
One can also use iteration:
# Find the missing years:
morelist = [
    j
    for j in map(str, range(2010, 2018))  # every year from 2010 to 2017, as strings
    if j not in df1.index                 # keep only those not already in the index
]
# Build a dataframe of zero counts for the missing years:
df2add = pd.DataFrame(data=[0] * len(morelist),
                      columns=['count'],
                      index=morelist)
# Append the new dataframe to the original:
df1 = pd.concat([df1, df2add])
print(df1)
Output:
count
2010 1
2012 1
2014 1
2015 1
2011 0
2013 0
2016 0
2017 0
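The same iteration idea can be wrapped in a small helper so that it also applies to df2 and df3. This is only a sketch: the helper name add_missing_years and the hard-coded 2010 to 2017 range are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

def add_missing_years(df, years):
    """Return df with a zero-count row appended for each year not in df.index.

    Hypothetical helper: assumes a single 'count' column and string year labels.
    """
    missing = [y for y in years if y not in df.index]
    if not missing:
        return df.copy()
    filler = pd.DataFrame({'count': [0] * len(missing)},
                          index=pd.Index(missing, name=df.index.name))
    return pd.concat([df, filler])

# Assumed full range of years, as strings to match the index labels.
all_years = [str(y) for y in range(2010, 2018)]

df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
result = add_missing_years(df1, all_years)
print(result)
```

As in the answer above, the new rows are appended at the end rather than sorted into place; calling result.sort_index() would order them by year.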
Upvotes: 0
I would go with union, but you can also use unique, i.e.:
import numpy as np
idx = pd.Series(np.concatenate([df1.index,df2.index,df3.index])).unique()
# or idx = set(np.concatenate([df1.index,df2.index,df3.index]))
df1.reindex(idx).fillna(0)
count
year
2010 1.0
2012 1.0
2014 1.0
2015 1.0
2011 0.0
2016 0.0
2017 0.0
2013 0.0
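The counts above come back as floats because reindex first inserts NaN for the new labels, which forces the integer column to float64 before fillna runs. A small sketch of the difference, using a shortened index chosen here purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
idx = ['2010', '2011', '2012']  # illustrative subset of years

# NaN is introduced first, so the column becomes float64.
via_fillna = df1.reindex(idx).fillna(0)

# The fill value is supplied up front, so the integer dtype survives.
via_fill_value = df1.reindex(idx, fill_value=0)

# Casting back afterwards also restores integers.
via_astype = df1.reindex(idx).fillna(0).astype('int64')

print(via_fillna['count'].dtype)
print(via_fill_value['count'].dtype)
print(via_astype['count'].dtype)
```

If integer counts matter, passing fill_value=0 to reindex is probably the simplest option.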
Upvotes: 1
Use reindex on all_years, like:
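In [257]: all_years = df1.index.union(df2.index).union(df3.index)  # df1.index | df2.index | df3.index in older pandas; recent versions no longer treat | as a set union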
In [258]: df1.reindex(all_years, fill_value=0)
Out[258]:
count
year
2010 1
2011 0
2012 1
2013 0
2014 1
2015 1
2016 0
2017 0
In [259]: df2.reindex(all_years, fill_value=0)
Out[259]:
count
year
2010 2
2011 1
2012 0
2013 0
2014 0
2015 0
2016 3
2017 1
Upvotes: 3
You can use union with reindex:
idx = df1.index.union(df2.index).union(df3.index)
print (idx)
Index(['2010', '2011', '2012', '2013',
'2014', '2015', '2016', '2017'], dtype='object', name='year')
Another solution:
from functools import reduce
import numpy as np
idx = reduce(np.union1d,[df1.index, df2.index, df3.index])
print (idx)
['2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017']
df1 = df1.reindex(idx, fill_value=0)
print (df1)
count
year
2010 1
2011 0
2012 1
2013 0
2014 1
2015 1
2016 0
2017 0
df2 = df2.reindex(idx, fill_value=0)
print (df2)
count
year
2010 2
2011 1
2012 0
2013 0
2014 0
2015 0
2016 3
2017 1
df3 = df3.reindex(idx, fill_value=0)
print (df3)
count
year
2010 4
2011 2
2012 5
2013 4
2014 4
2015 1
2016 0
2017 1
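When there are more than a few dataframes, the three reindex calls can be collapsed into one pass. The following is a sketch rather than the only way to do it; the dictionary frames and the name aligned are assumptions introduced here.

```python
from functools import reduce

import pandas as pd

def make(years, counts):
    """Build a dataframe indexed by year, as in the question."""
    return pd.DataFrame({'year': years, 'count': counts}).set_index('year')

# Hypothetical container holding the three dataframes from the question.
frames = {
    'df1': make(['2010', '2012', '2014', '2015'], [1, 1, 1, 1]),
    'df2': make(['2010', '2011', '2016', '2017'], [2, 1, 3, 1]),
    'df3': make(['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                [4, 2, 5, 4, 4, 1, 1]),
}

# Union of every index; Index.union sorts the result when the labels are comparable.
idx = reduce(lambda a, b: a.union(b), (f.index for f in frames.values()))

# Reindex each dataframe against the shared index, filling new rows with 0.
aligned = {name: f.reindex(idx, fill_value=0) for name, f in frames.items()}

for name, f in aligned.items():
    print(name)
    print(f)
```

Keeping the dataframes in a dictionary avoids repeating the reindex call per variable, at the cost of accessing them as aligned['df1'] instead of df1.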
Upvotes: 3