Lorcán

Reputation: 555

Using multiple dictionaries to populate a pandas dataframe

I currently have a dataframe and wish to input values using pairs of dictionaries.

# create count dataframe
range_of_years = range(2012, 2017)
topics = ['ecology','evolution','mathematics','biogeography','neutral theory']
topic_count_timeline = pandas.DataFrame(index = topics, columns = range_of_years)


# dictionary pair
count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

I would like to iterate through the dictionary keys, select the dataframe cell whose index and column correspond to the keys, and add the product of the dictionary values to that cell, so that the resulting dataframe is:

               2012 2013 2014 2015 2016
ecology           7   14  8.4  5.6  6.3
evolution       NaN  NaN  NaN  NaN  NaN
mathematics     NaN  NaN  NaN  NaN  NaN
biogeography    NaN  NaN  NaN  NaN  NaN
neutral theory    3    6  3.6  2.4  2.7

I intend to update the topic_count_timeline dataframe using many pairs of dictionaries like count_dict and paper_topics_dict, such that new inputs are summed with the existing value of the cell instead of overwriting it.

For example, if a further pair were used to update the dataframe...

# Additional dictionaries
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

The dataframe would then look like this:

               2012 2013 2014 2015 2016
ecology           7   14  8.4  5.6  6.3
evolution       NaN  NaN  NaN  NaN  NaN
mathematics     1.8  1.2    9  9.6  7.8
biogeography    NaN  NaN  NaN  NaN  NaN
neutral theory  4.2  6.8  9.6  8.8  7.9

Thank you.

Upvotes: 1

Views: 1261

Answers (3)

jezrael

Reputation: 862511

I believe you need:

for k, v in paper_topics_dict.items():
    topic_count_timeline.loc[k] = v

for k, v in count_dict.items():
    topic_count_timeline[k] *= v

print (topic_count_timeline)
               2012 2013 2014 2015 2016
ecology           7   14  8.4  5.6  6.3
evolution       NaN  NaN  NaN  NaN  NaN
mathematics     NaN  NaN  NaN  NaN  NaN
biogeography    NaN  NaN  NaN  NaN  NaN
neutral theory    3    6  3.6  2.4  2.7

But if you are working with many dictionary pairs, it is better to accumulate each pair's products in a defaultdict, convert it to a Series, call unstack to get a DataFrame, and reindex to add the missing index and column values:

from collections import defaultdict

import pandas as pd

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

L = [(count_dict, paper_topics_dict), (count_dict2, paper_topics_dict2)]

d = defaultdict(float)
for a, b in L:
    for k, v in b.items():
        for k2, v2 in a.items():
            d[(k, k2)] += v*v2

df = pd.Series(d).unstack().reindex(index=topics, columns=range_of_years)
print (df)
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      1.8   1.2   9.0   9.6   7.8
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory   4.2   6.8   9.6   8.8   7.9
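
As a quick sanity check of one accumulated cell, 'neutral theory' in 2016 receives 0.3 × 9 from the first pair and 0.4 × 13 from the second. The variable names below are only for illustration and are not part of the code above:

```python
# contribution of the first pair: paper_topics_dict['neutral theory'] * count_dict[2016]
first = 0.3 * 9
# contribution of the second pair: paper_topics_dict2['neutral theory'] * count_dict2[2016]
second = 0.4 * 13

# the cell holds the sum of both contributions
cell = first + second
print(round(cell, 1))  # 7.9
```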

Upvotes: 2

BENY

Reputation: 323226

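You can use combine_first and create a new DataFrame from your dicts:

import numpy as np
import pandas as pd

# rows are topics, columns are years; [:, None] turns the topic weights into a column vector
topic_count_timeline.combine_first(
    pd.DataFrame(
        data=np.array(list(count_dict.values())) * np.array(list(paper_topics_dict.values()))[:, None],
        columns=count_dict.keys(),
        index=paper_topics_dict.keys(),
    )
)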
Out[683]: 
                2012  2013  2014  2015  2016
biogeography     NaN   NaN   NaN   NaN   NaN
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
neutral theory   3.0   6.0   3.6   2.4   2.7

More info: the intermediate DataFrame on its own

pd.DataFrame(
    data=np.array(list(count_dict.values())) * np.array(list(paper_topics_dict.values()))[:, None],
    columns=count_dict.keys(),
    index=paper_topics_dict.keys(),
)
Out[684]: 
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
neutral theory   3.0   6.0   3.6   2.4   2.7
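
Note that combine_first only fills cells that are still NaN, so on its own it would not sum overlapping cells from a later pair (for instance 'neutral theory'). A minimal sketch of an accumulating variant, assuming DataFrame.add with fill_value=0 suits your case; the helper name pair_frame is hypothetical, and np.outer is swapped in for the broadcasting expression above:

```python
import numpy as np
import pandas as pd

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
range_of_years = range(2012, 2017)

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}


def pair_frame(counts, weights):
    # hypothetical helper: outer product of topic weights (rows) and yearly counts (columns)
    return pd.DataFrame(
        np.outer(list(weights.values()), list(counts.values())),
        index=list(weights),
        columns=list(counts),
    )


total = pair_frame(count_dict, paper_topics_dict)
# fill_value=0 treats a cell present on only one side as 0 on the other side
total = total.add(pair_frame(count_dict2, paper_topics_dict2), fill_value=0)

# restore the full set of topics and years; untouched topics stay NaN
result = total.reindex(index=topics, columns=list(range_of_years))
print(result)
```

Further pairs can be folded in the same way, one add call per pair, before the final reindex.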

Upvotes: 2

jpp

Reputation: 164623

I would use a function for this and pd.DataFrame.pipe.

You can then use the pipe syntax for subsequent dictionaries.

def update_data(df, counts, topics):
    for k, v in topics.items():
        for k2, v2 in counts.items():
            df.loc[k, k2] = v*v2
    return df

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}

# start from the question's dataframe; df was otherwise undefined here
df = topic_count_timeline.pipe(update_data, count_dict, paper_topics_dict)

print(df)

#                2012 2013 2014 2015 2016
# ecology           7   14  8.4  5.6  6.3
# evolution       NaN  NaN  NaN  NaN  NaN
# mathematics     NaN  NaN  NaN  NaN  NaN
# biogeography    NaN  NaN  NaN  NaN  NaN
# neutral theory    3    6  3.6  2.4  2.7
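
As written, update_data overwrites a cell on each call. If contributions from later pairs should be summed instead, one possible variant treats an empty cell as 0 before adding. This is only a sketch: the name accumulate_data and the float-typed starting frame are assumptions, not part of the original code.

```python
import pandas as pd


def accumulate_data(df, counts, topics):
    # hypothetical variant of update_data: add to the cell rather than overwrite it
    for k, v in topics.items():
        for k2, v2 in counts.items():
            current = df.loc[k, k2]
            base = 0.0 if pd.isna(current) else current
            df.loc[k, k2] = base + v * v2
    return df


topic_list = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
# assumption: start from an all-NaN float frame instead of an object-dtype one
df = pd.DataFrame(index=topic_list, columns=range(2012, 2017), dtype=float)

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}

# chain one pipe call per dictionary pair
df = (df.pipe(accumulate_data, count_dict, paper_topics_dict)
        .pipe(accumulate_data, count_dict2, paper_topics_dict2))
print(df)
```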

Upvotes: 1
