Reputation: 2333
I have two dataframes:
df:
Conference Year SampleCitations Percent
0 CIKM 1995 373 0.027153
1 CIKM 1996 242 0.017617
2 CIKM 1997 314 0.022858
3 CIKM 1998 427 0.031084
And another dataframe that returns the total number of citations:
allcitations= pd.read_sql("Select Sum(Citations) as ActualCitations from publications "
I want to simply multiply the Percent column in dataframe df by the constant value ActualCitations.
I tried the following:
df['ActualCitations']=df['Percent'].multiply(allcitations['ActualCitations'])
and
df['ActualCitations']=df['Percent']* allcitations['ActualCitations']
But both only do it for the first row, and the rest is NaN, as shown below:
Conference Year SampleCitations Percent ActualCitations
0 CIKM 1995 373 0.027153 1485.374682
1 CIKM 1996 242 0.017617 NaN
2 CIKM 1997 314 0.022858 NaN
3 CIKM 1998 427 0.031084 NaN
Upvotes: 3
Views: 6752
Reputation: 30424
The problem in this case is pandas's automatic index alignment (usually a good thing). Because your 'constant' is actually stored in a dataframe, pandas lines the two Series up by index label: it builds row 0 from row 0 of each, row 1 from row 1 of each, and so on. There is no row 1 (or 2, or 3) in the second dataframe, so you get NaN from there forward.
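To see the alignment in isolation, here is a minimal sketch. It rebuilds the question's df from the printed values and uses a one-row DataFrame as a stand-in for the read_sql result; the total is the value this answer uses in its own example, not a real query result.

```python
import pandas as pd

# Rebuild the question's df from the values printed in the question
df = pd.DataFrame({
    "Conference": ["CIKM"] * 4,
    "Year": [1995, 1996, 1997, 1998],
    "SampleCitations": [373, 242, 314, 427],
    "Percent": [0.027153, 0.017617, 0.022858, 0.031084],
})

# Stand-in for the one-row read_sql result (assumed value, taken from this answer)
allcitations = pd.DataFrame({"ActualCitations": [54703.888410120424]})

# Series * Series aligns on index labels: only label 0 exists in both,
# so labels 1..3 have no partner and come out as NaN
aligned = df["Percent"] * allcitations["ActualCitations"]
print(aligned)
```

The result has index 0..3 (the union of both indexes), with a number only at label 0, which reproduces the output in the question.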
So what you need to do is intentionally break the dataframe aspect of the second dataframe, so that pandas will 'broadcast' the constant to ALL rows. One way to do this is with .values, which in this case essentially just drops the index, leaving a numpy array with one element (really a scalar, but technically contained in a numpy array). .to_list() will also accomplish the same thing, since a one-element list is broadcast the same way.
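A small sketch comparing the two index-dropping options mentioned above, using the same assumed total as the example in this answer. Both produce a fully populated result because neither operand on the right carries an index to align on.

```python
import pandas as pd

df = pd.DataFrame({"Percent": [0.027153, 0.017617, 0.022858, 0.031084]})
# Assumed one-row stand-in for the read_sql result
allcitations = pd.DataFrame({"ActualCitations": [54703.888410120424]})

# .values drops the index: a 1-element NumPy array broadcasts across every row
via_values = df["Percent"] * allcitations["ActualCitations"].values

# .to_list() also drops the index; the 1-element list broadcasts the same way
via_list = df["Percent"] * allcitations["ActualCitations"].to_list()

print(via_values)
```

In both cases the result keeps df's index, so it can be assigned straight back with df['ActualCitations'] = via_values.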
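import pandas as pd
allcitations = pd.DataFrame({'ActualCitations': [54703.888410120424]})
# .values strips the index, so the single value is broadcast to every row of df
df['Percent'] * allcitations['ActualCitations'].values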
0 1485.374682
1 963.718402
2 1250.421481
3 1700.415667
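Another option, not part of the original answer, is to pull the total out as a true scalar with .iloc[0] before multiplying. This is a sketch that assumes the query always returns exactly one row; the total is again the assumed value from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [1995, 1996, 1997, 1998],
    "Percent": [0.027153, 0.017617, 0.022858, 0.031084],
})
# Assumed one-row stand-in for the read_sql result
allcitations = pd.DataFrame({"ActualCitations": [54703.888410120424]})

# Extract the single value as a scalar (assumes exactly one result row)
total = allcitations["ActualCitations"].iloc[0]

# A scalar has no index, so pandas multiplies every row by it
df["ActualCitations"] = df["Percent"] * total
print(df)
```

One design point in favor of this: if the query ever returns no rows, .iloc[0] raises an IndexError instead of silently filling the column with NaN.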
Upvotes: 1