Reputation: 555
I currently have a dataframe and wish to input values using pairs of dictionaries.
import pandas as pd

# create count dataframe
range_of_years = range(2012, 2017)
topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
topic_count_timeline = pd.DataFrame(index=topics, columns=range_of_years)
# dictionary pair
count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
I would like to iterate through the dictionary keys, select the dataframe cell whose column and index correspond to the keys, and then add the product of the dictionary values to that cell, so that the resulting dataframe would be:
                2012  2013  2014  2015  2016
ecology            7    14   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory     3     6   3.6   2.4   2.7
I intend to update the topic_count_timeline dataframe using many pairs of dictionaries like count_dict and paper_topics_dict, such that new inputs are summed with the value already in the cell instead of overwriting it.
For example, if a further pair were used to update the dataframe...
# Additional dictionaries
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}
The dataframe would then look like this:
                2012  2013  2014  2015  2016
ecology            7    14   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      1.8   1.2     9   9.6   7.8
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory   4.2   6.8   9.6   8.8   7.9
Thank you.
Upvotes: 1
Views: 1261
Reputation: 862511
I believe you need:
# fill each topic row with its weight
for k, v in paper_topics_dict.items():
    topic_count_timeline.loc[k] = v
# scale each year column by its count
for k, v in count_dict.items():
    topic_count_timeline[k] *= v
print(topic_count_timeline)
                2012  2013  2014  2015  2016
ecology            7    14   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory     3     6   3.6   2.4   2.7
But if you are working with many dictionary pairs, it is better to update a defaultdict for each pair, then convert it to a Series, call unstack to get a DataFrame, and use reindex to add the missing columns and index values:
from collections import defaultdict
import pandas as pd

count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
count_dict2 = {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13}
paper_topics_dict2 = {'mathematics': 0.6, 'neutral theory': 0.4}
L = [(count_dict, paper_topics_dict), (count_dict2, paper_topics_dict2)]
# accumulate weight * count under the key (topic, year)
d = defaultdict(float)
for a, b in L:
    for k, v in b.items():
        for k2, v2 in a.items():
            d[(k, k2)] += v*v2
# topics and range_of_years are the ones defined in the question
df = pd.Series(d).unstack().reindex(index=topics, columns=range_of_years)
print(df)
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      1.8   1.2   9.0   9.6   7.8
biogeography     NaN   NaN   NaN   NaN   NaN
neutral theory   4.2   6.8   9.6   8.8   7.9
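The same accumulation can also be written without the nested loops. The following is only a sketch of one possible variation: each pair becomes a small DataFrame via an outer product, the frames are stacked, and rows with the same topic are summed. The names pairs and pair_to_frame are illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
range_of_years = range(2012, 2017)

# (counts, weights) pairs, same data as in the question
pairs = [
    ({2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9},
     {'ecology': 0.7, 'neutral theory': 0.3}),
    ({2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13},
     {'mathematics': 0.6, 'neutral theory': 0.4}),
]

def pair_to_frame(counts, weights):
    """Outer product of topic weights (rows) and yearly counts (columns)."""
    w = pd.Series(weights, dtype=float)
    c = pd.Series(counts, dtype=float)
    return pd.DataFrame(np.outer(w, c), index=w.index, columns=c.index)

frames = [pair_to_frame(counts, weights) for counts, weights in pairs]
# Stack all frames, then sum the rows that share a topic label
total = pd.concat(frames).groupby(level=0).sum()
# Restore the full topic list; topics never seen stay NaN
result = total.reindex(index=topics, columns=list(range_of_years))
print(result)
```

Topics that appear in no pair remain NaN after the reindex, which matches the table requested in the question.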
Upvotes: 2
Reputation: 323226
You can use combine_first with a new DataFrame created from your dictionaries:
import numpy as np
import pandas as pd

# rows: topic weights, columns: yearly counts
new_df = pd.DataFrame(
    data=np.array(list(count_dict.values())) * np.array(list(paper_topics_dict.values()))[:, None],
    columns=count_dict.keys(),
    index=paper_topics_dict.keys(),
)
topic_count_timeline.combine_first(new_df)
Out[683]:
                2012  2013  2014  2015  2016
biogeography     NaN   NaN   NaN   NaN   NaN
ecology          7.0  14.0   8.4   5.6   6.3
evolution        NaN   NaN   NaN   NaN   NaN
mathematics      NaN   NaN   NaN   NaN   NaN
neutral theory   3.0   6.0   3.6   2.4   2.7
More info: the DataFrame built from the dictionaries looks like this on its own:
pd.DataFrame(data=np.array(list(count_dict.values())) * np.array(list(paper_topics_dict.values()))[:, None],
             columns=count_dict.keys(), index=paper_topics_dict.keys())
Out[684]:
                2012  2013  2014  2015  2016
ecology          7.0  14.0   8.4   5.6   6.3
neutral theory   3.0   6.0   3.6   2.4   2.7
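One caveat when a second pair of dictionaries is applied: combine_first only fills cells that are NaN in the calling DataFrame, so a cell that already holds a value is kept rather than summed. A small sketch of the difference, using DataFrame.add with fill_value=0 as a possible alternative (the two tiny frames here are made up for illustration and cover only the 2012 column):

```python
import pandas as pd

# Contributions of two dictionary pairs for a single year
first = pd.DataFrame({2012: [7.0, 3.0]}, index=['ecology', 'neutral theory'])
second = pd.DataFrame({2012: [1.8, 1.2]}, index=['mathematics', 'neutral theory'])

# combine_first keeps the caller's value wherever it is not NaN
kept = first.combine_first(second)
print(kept.loc['neutral theory', 2012])

# add with fill_value=0 sums the overlapping cell instead
summed = first.add(second, fill_value=0)
print(summed.loc['neutral theory', 2012])
```

With add, a cell present in only one frame is treated as 0 on the other side, so non-overlapping topics pass through unchanged.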
Upvotes: 2
Reputation: 164623
I would use a function for this together with pd.DataFrame.pipe. You can then use the same pipe syntax for subsequent dictionaries.
def update_data(df, counts, topics):
    # assign weight * count to each (topic, year) cell
    for k, v in topics.items():
        for k2, v2 in counts.items():
            df.loc[k, k2] = v*v2
    return df
count_dict = {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9}
paper_topics_dict = {'ecology': 0.7, 'neutral theory': 0.3}
# topic_count_timeline is the empty dataframe from the question
df = topic_count_timeline.pipe(update_data, count_dict, paper_topics_dict)
print(df)
#                 2012  2013  2014  2015  2016
# ecology            7    14   8.4   5.6   6.3
# evolution        NaN   NaN   NaN   NaN   NaN
# mathematics      NaN   NaN   NaN   NaN   NaN
# biogeography     NaN   NaN   NaN   NaN   NaN
# neutral theory     3     6   3.6   2.4   2.7
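Note that update_data assigns each product, so piping a second pair through it would overwrite the neutral theory row instead of adding to it. If summing is what you want, one possible adjustment is sketched below. The names add_pair and timeline are hypothetical, and treating an empty (NaN) cell as 0 is my assumption about the intended behaviour.

```python
import pandas as pd

def add_pair(df, counts, weights):
    """Add weight * count to each matching cell, treating NaN as 0."""
    for topic, weight in weights.items():
        for year, count in counts.items():
            current = df.loc[topic, year]
            base = 0.0 if pd.isna(current) else current
            df.loc[topic, year] = base + weight * count
    return df

topics = ['ecology', 'evolution', 'mathematics', 'biogeography', 'neutral theory']
# Start from an all-NaN float frame, as in the question
timeline = pd.DataFrame(index=topics, columns=range(2012, 2017), dtype=float)

timeline = (
    timeline
    .pipe(add_pair, {2012: 10, 2013: 20, 2014: 12, 2015: 8, 2016: 9},
          {'ecology': 0.7, 'neutral theory': 0.3})
    .pipe(add_pair, {2012: 3, 2013: 2, 2014: 15, 2015: 16, 2016: 13},
          {'mathematics': 0.6, 'neutral theory': 0.4})
)
print(timeline)
```

The function modifies the frame in place and returns it, which is what allows the pipe calls to be chained.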
Upvotes: 1