BKS

Reputation: 2333

Multiply a pandas dataframe column by a constant

I have two dataframes:

df:

  Conference  Year  SampleCitations   Percent  
0        CIKM  1995              373  0.027153     
1        CIKM  1996              242  0.017617        
2        CIKM  1997              314  0.022858        
3        CIKM  1998              427  0.031084        

And another dataframe that holds the total number of citations:

allcitations = pd.read_sql("Select Sum(Citations) as ActualCitations from publications", conn)  # conn: database connection, not shown in the original post

I want to simply multiply the Percent column in dataframe df with the constant value ActualCitations.

I tried the following:

df['ActualCitations']=df['Percent'].multiply(allcitations['ActualCitations'])

and

df['ActualCitations']=df['Percent']* allcitations['ActualCitations']

But both only compute the result for the first row; the rest are NaN, as shown below:

   Conference  Year  SampleCitations   Percent  ActualCitations
0        CIKM  1995              373  0.027153      1485.374682
1        CIKM  1996              242  0.017617              NaN
2        CIKM  1997              314  0.022858              NaN
3        CIKM  1998              427  0.031084              NaN

Upvotes: 3

Views: 6752

Answers (1)

JohnE

Reputation: 30424

The problem in this case is pandas' automatic index alignment (usually a good thing). Because your 'constant' is actually stored in a dataframe, pandas tries to build row 0 of the result from row 0 of each operand, then row 1 from row 1 of each, and so on. There is no row 1 in the second dataframe, so you get NaN from there onward.
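A minimal sketch of that alignment, using made-up stand-in values (not the question's real data) so the arithmetic is easy to follow:

```python
import pandas as pd

# Hypothetical stand-in data: four percentages and a one-row frame holding the "constant".
percent = pd.Series([0.25, 0.5, 0.125, 0.125])        # index labels 0..3
total = pd.DataFrame({"ActualCitations": [1000.0]})   # index label 0 only

# Series * Series matches rows by index label; only label 0 exists on both sides,
# so labels 1..3 have nothing to multiply with and come out as NaN.
aligned = percent * total["ActualCitations"]
print(aligned)
```

Only the row whose label appears in both objects gets a number; the other rows are NaN, which is the pattern seen in the question.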

So what you need to do is intentionally break the dataframe aspect of the second dataframe so that pandas will 'broadcast' the constant to ALL rows. One way to do this is with .values, which in this case essentially just drops the index so that you are left with a numpy array with one element (really a scalar, but technically contained in a numpy array). to_list() will also accomplish the same thing.

allcitations=pd.DataFrame({ 'ActualCitations':[54703.888410120424] })

df['Percent'] * allcitations['ActualCitations'].values

0    1485.374682
1     963.718402
2    1250.421481
3    1700.415667
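If the query is known to return exactly one row, another option is to pull the value out as a plain scalar before multiplying. This is only a sketch under that assumption: the dataframe below is rebuilt by hand from the question's table, and the name total is just an illustrative local variable.

```python
import pandas as pd

# Data copied from the question; the total is the value used in the answer.
df = pd.DataFrame({
    "Conference": ["CIKM"] * 4,
    "Year": [1995, 1996, 1997, 1998],
    "SampleCitations": [373, 242, 314, 427],
    "Percent": [0.027153, 0.017617, 0.022858, 0.031084],
})
allcitations = pd.DataFrame({"ActualCitations": [54703.888410120424]})

# Take the single cell as a plain float, so no index alignment is involved
# (assumes the query returned exactly one row).
total = allcitations["ActualCitations"].iloc[0]

# Multiplying a column by a scalar applies to every row.
df["ActualCitations"] = df["Percent"] * total
print(df)
```

Because total is a scalar rather than a Series, every row of the new column gets a value.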

Upvotes: 1
