el323

Reputation: 2920

Map values of multiple dataframes and fill columns

Let's assume I have the following three dataframes:

Dataframe 1:

import pandas as pd

df1 = {'year': ['2010','2012','2014','2015'], 'count': [1,1,1,1]}
df1 = pd.DataFrame(data=df1)
df1 = df1.set_index('year')
df1

year    count
2010    1
2012    1
2014    1
2015    1

Dataframe 2:

df2 = {'year': ['2010','2011','2016','2017'], 'count': [2,1,3,1]}
df2 = pd.DataFrame(data=df2)
df2 = df2.set_index('year')
df2

year    count
2010    2
2011    1
2016    3
2017    1

Dataframe 3:

df3 = {'year': ['2010','2011','2012','2013','2014','2015','2017'], 'count': [4,2,5,4,4,1,1]}
df3 = pd.DataFrame(data=df3)
df3 = df3.set_index('year')
df3

year    count
2010    4
2011    2
2012    5
2013    4
2014    4
2015    1
2017    1

Now I want all three dataframes to contain every year that appears in any of them. For example, df1 is missing the years 2011, 2013, 2016 and 2017, so these should be added to the index of df1, each with a count of 0.

So my output would be something like this for df1:

year    count
2010    1
2012    1
2014    1
2015    1
2011    0
2013    0
2016    0
2017    0

The same applies to df2 and df3. Thanks.

Upvotes: 1

Views: 115

Answers (4)

rnso

Reputation: 24593

One can also use iteration:

# find missing years:
morelist = [year
            # every year from 2010 to 2017, converted to a string:
            for year in map(str, range(2010, 2018))
            # keep only the years not already in the index:
            if year not in df1.index]

# create a dataframe to be added:
df2add = pd.DataFrame(data=[0]*len(morelist),   
                      columns=['count'], 
                      index=morelist)

# add new dataframe to original:
df1 = pd.concat([df1, df2add]) 

print(df1)

Output:

      count
2010      1
2012      1
2014      1
2015      1
2011      0
2013      0
2016      0
2017      0
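The year range above is hard-coded. As a minimal sketch of the same append-the-missing-rows idea, the missing years could instead be derived from the other dataframes with Index.union and Index.difference. The names all_years, missing, filler and df1_full are only illustrative, and the sample frames are rebuilt from the question so the block runs on its own:

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

# Every year that occurs in at least one frame.
all_years = df1.index.union(df2.index).union(df3.index)

# Years that occur elsewhere but are absent from df1.
missing = all_years.difference(df1.index)

# Zero-count rows for the missing years, appended below the original rows.
filler = pd.DataFrame({'count': 0}, index=missing)
df1_full = pd.concat([df1, filler])

print(df1_full)
```

As in the answer above, the new rows end up at the bottom; calling sort_index() on the result would put the years in order.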

Upvotes: 0

Bharath M Shetty

Reputation: 30605

I would go with union, but you can also use unique, i.e.:

import numpy as np

idx = pd.Series(np.concatenate([df1.index,df2.index,df3.index])).unique()
# or idx = set(np.concatenate([df1.index,df2.index,df3.index])) 
df1.reindex(idx).fillna(0)

      count
year       
2010    1.0
2012    1.0
2014    1.0
2015    1.0
2011    0.0
2016    0.0
2017    0.0
2013    0.0
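The counts in this output are floats because reindex first inserts NaN for the new years, which forces a floating-point column before fillna runs. A small sketch of the difference, assuming the full list of years is already known (idx is written out by hand here, and the variable names are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')

# Assumed full set of years, written out by hand for this example.
idx = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017']

# NaN is inserted first, so the column becomes floating point.
via_fillna = df1.reindex(idx).fillna(0)

# The fill value is applied during reindexing, so the integer dtype survives.
via_fill_value = df1.reindex(idx, fill_value=0)

print(via_fillna['count'].dtype)
print(via_fill_value['count'].dtype)

# If fillna is preferred, the column can be cast back afterwards.
as_int = via_fillna.astype({'count': 'int64'})
print(as_int['count'].tolist())
```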

Upvotes: 1

Zero

Reputation: 76947

Use reindex on all_years, like this:

In [257]: all_years = df1.index.union(df2.index).union(df3.index)

In [258]: df1.reindex(all_years, fill_value=0)
Out[258]:
      count
year
2010      1
2011      0
2012      1
2013      0
2014      1
2015      1
2016      0
2017      0

In [259]: df2.reindex(all_years, fill_value=0)
Out[259]:
      count
year
2010      2
2011      1
2012      0
2013      0
2014      0
2015      0
2016      3
2017      1

Upvotes: 3

jezrael

Reputation: 863226

You can use union with reindex:

idx = df1.index.union(df2.index).union(df3.index)
print (idx)
Index(['2010', '2011', '2012', '2013', 
       '2014', '2015', '2016', '2017'], dtype='object', name='year')

Another solution:

from functools import reduce
import numpy as np
idx = reduce(np.union1d,[df1.index, df2.index, df3.index])
print (idx)

['2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017']

df1 = df1.reindex(idx, fill_value=0)
print (df1)
      count
year       
2010      1
2011      0
2012      1
2013      0
2014      1
2015      1
2016      0
2017      0

df2 = df2.reindex(idx, fill_value=0)
print (df2)
      count
year       
2010      2
2011      1
2012      0
2013      0
2014      0
2015      0
2016      3
2017      1

df3 = df3.reindex(idx, fill_value=0)
print (df3)
      count
year       
2010      4
2011      2
2012      5
2013      4
2014      4
2015      1
2016      0
2017      1
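The three reindex calls are identical apart from the dataframe, so they could also be written as a loop. The following is a rough sketch under the assumption that the frames are kept in a dict; the names frames and aligned are made up for this example:

```python
from functools import reduce

import pandas as pd

# Sample frames rebuilt from the question, keyed by name (illustrative layout).
frames = {
    'df1': pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                         'count': [1, 1, 1, 1]}).set_index('year'),
    'df2': pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                         'count': [2, 1, 3, 1]}).set_index('year'),
    'df3': pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                         'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year'),
}

# Union of every index; this works for any number of frames.
idx = reduce(lambda left, right: left.union(right),
             (frame.index for frame in frames.values()))

# Reindex each frame against the shared index, filling new years with 0.
aligned = {name: frame.reindex(idx, fill_value=0)
           for name, frame in frames.items()}

for name, frame in aligned.items():
    print(name, frame['count'].tolist())
```

Every frame in aligned then shares the same eight-year index, which keeps later comparisons or arithmetic between them lined up row by row.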

Upvotes: 3
