sunandstars

Reputation: 47

Summing a column in a Python dataframe

This table from Wikipedia shows the 10 biggest box office hits. I can't seem to get the total of the 'worldwide_gross' column. Can someone help? Thank you.

import pandas as pd
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]

films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)

films.worldwide_gross.sum(axis=0)

This is the output I get when I try calculating the total global earnings: [image not included]

Upvotes: 1

Views: 74

Answers (4)

norie

Reputation: 9857

Here's one way you can do it.

This code converts the values in the worldwide_gross column to integers and then sums the column to get the total gross.

import pandas as pd

def get_gross(gross_text):
  pos = gross_text.index('$')
  return int(gross_text[pos+1:].replace(',', ''))
  
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]

films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)

films['gross_numeric'] = films['worldwide_gross'].apply(get_gross)

total_gross = films['gross_numeric'].sum()

print(f'Total gross: ${total_gross}')
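
To try the helper without fetching the live page, here is a minimal sketch on made-up strings. The `sample` frame and its values are assumptions for illustration, not data from the Wikipedia table; the prefixed second value imitates the kind of leading text that the `index('$')` lookup is there to skip.

```python
import pandas as pd

def get_gross(gross_text):
    # Take everything after the first '$' and drop the thousands separators.
    pos = gross_text.index('$')
    return int(gross_text[pos + 1:].replace(',', ''))

# Hypothetical sample values, not taken from the live table.
sample = pd.DataFrame(
    {'worldwide_gross': ['$1,000,000', 'T$2,500,000', '$300']},
    dtype=object,
)

sample['gross_numeric'] = sample['worldwide_gross'].apply(get_gross)

# Convert to a plain Python int so the thousands-separator format is unambiguous.
total_gross = int(sample['gross_numeric'].sum())

print(f'Total gross: ${total_gross:,}')
```

Note that `index('$')` raises a `ValueError` if a cell has no dollar sign, so real data may need a check for that case.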

Upvotes: 0

Hamza usman ghani

Reputation: 2243

You need to keep only the digits in the worldwide_gross column using a regex, and then convert the column to float with Series.astype('float').

Add:

films.worldwide_gross = films.worldwide_gross.str.replace(r'\D', '', regex=True).astype(float)

Complete Code:

import pandas as pd
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]

films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)
films.worldwide_gross = films.worldwide_gross.str.replace(r'\D', '', regex=True).astype(float)
films.worldwide_gross.sum(axis=0)
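
To see why summing the raw column does not give a number, and what the regex step changes, here is a small offline sketch. The sample strings are made up for illustration. On a column of strings, `sum()` joins the strings together; stripping every non-digit character first gives values that can actually be added.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped column.
sample = pd.Series(['$1,000', '$2,500', '$300'], name='worldwide_gross', dtype=object)

# On an object column of strings, sum() concatenates the strings.
concatenated = sample.sum()

# Remove every non-digit character, then convert to float before summing.
cleaned = sample.str.replace(r'\D', '', regex=True).astype(float)
total = cleaned.sum()

print(concatenated)
print(total)
```

Because `\D` removes everything that is not a digit, any digits inside footnote markers or other annotations in a cell would be kept and merged into the number, so it is worth inspecting the raw values first.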

Upvotes: 0

Pavan Yeddanapudi

Reputation: 61

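# astype returns a new DataFrame, so assign the result back.
# This assumes the column already holds digit-only values.
films = films.astype({"worldwide_gross": int})
Total = films['worldwide_gross'].sum()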
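
Two details are worth checking with this approach: `astype` returns a new DataFrame rather than changing `films` in place, and an integer cast only succeeds on strings that contain nothing but digits. A minimal sketch on made-up values (the `films_sample` frame is hypothetical):

```python
import pandas as pd

# Hypothetical digit-only strings; values with '$' or ',' would raise ValueError.
films_sample = pd.DataFrame({'worldwide_gross': ['1000', '2500', '300']}, dtype=object)

# astype returns a converted copy; the original frame keeps its dtype.
converted = films_sample.astype({'worldwide_gross': int})

original_dtype = films_sample['worldwide_gross'].dtype
total = int(converted['worldwide_gross'].sum())

print(original_dtype)
print(total)
```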

Upvotes: 1

Nk03

Reputation: 14949

# Int64 rather than Int32: grosses above 2,147,483,647 do not fit in 32 bits.
Total = films['worldwide_gross'].astype('Int64').sum()

Or convert the data types first:

films = films.convert_dtypes()
Total = films['worldwide_gross'].sum()

Upvotes: 1
