Reputation: 47
This table from Wikipedia shows the 10 biggest box office hits. I can't seem to get the total of the 'worldwide_gross' column. Can someone help? Thank you.
import pandas as pd
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]
films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)
films.worldwide_gross.sum(axis=0)
This is the output I get when I try calculating the total global earnings:
Upvotes: 1
Views: 74
Reputation: 9857
Here's one way you can do it.
This code converts the values in the worldwide_gross column
to integers and then sums the column to get the total gross.
import pandas as pd
def get_gross(gross_text):
    # Keep everything after the '$' and drop the thousands separators.
    pos = gross_text.index('$')
    return int(gross_text[pos+1:].replace(',', ''))
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]
films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)
films['gross_numeric'] = films['worldwide_gross'].apply(get_gross)
total_gross = films['gross_numeric'].sum()
print(f'Total gross: ${total_gross}')
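To try the parsing step without fetching the Wikipedia page, here is a minimal sketch on a small hand-made frame. The values in `sample` are made up for illustration (they are not real box office figures), and the `T` prefix on the last one is a hypothetical marker in front of the dollar sign, included only to show that everything before the `$` is ignored.

```python
import pandas as pd

def get_gross(gross_text):
    # Keep everything after the first '$' and drop the thousands separators.
    pos = gross_text.index('$')
    return int(gross_text[pos + 1:].replace(',', ''))

# Made-up sample values, not real box office figures.
sample = pd.DataFrame({'worldwide_gross': ['$1,000', '$2,500', 'T$3,000']})

sample['gross_numeric'] = sample['worldwide_gross'].apply(get_gross)
total_gross = sample['gross_numeric'].sum()
print(f'Total gross: ${total_gross:,}')
```

Note that `int()` will raise a `ValueError` if anything other than digits and commas follows the `$`, so a cell with a trailing footnote marker would need extra cleaning first.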
Upvotes: 0
Reputation: 2243
You need to keep only the digits in the worldwide_gross column using a regex,
and then convert the column to float using Series.astype('float').
Add:
films.worldwide_gross = films.worldwide_gross.str.replace(r'\D', "", regex=True).astype(float)
Complete Code:
import pandas as pd
boxoffice_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_highest-grossing_films')
films = boxoffice_df[1]
films.rename(columns = {'Worldwide gross(2020 $)':'worldwide_gross'}, inplace = True)
films.worldwide_gross = films.worldwide_gross.str.replace(r'\D', "", regex=True).astype(float)
films.worldwide_gross.sum(axis=0)
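A quick way to check the regex step in isolation is to run it on a tiny hand-made column. This is only a sketch: the three values are made up for illustration, and `cleaned` and `total` are names introduced here.

```python
import pandas as pd

# Made-up sample values, not real box office figures.
films = pd.DataFrame({'worldwide_gross': ['$1,000', '$2,500', '$3,000']})

# r'\D' matches any non-digit character; the raw string keeps Python from
# treating the backslash as the start of a string escape.
cleaned = films['worldwide_gross'].str.replace(r'\D', '', regex=True).astype(float)

total = cleaned.sum()
print(total)
```

Keep in mind that stripping every non-digit also removes decimal points, and any digits inside a footnote marker would be glued onto the number, so it is worth eyeballing the cleaned column before trusting the sum.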
Upvotes: 0
Reputation: 61
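# astype returns a new DataFrame, so assign the result back.
# This cast only works once the column holds plain digits (no '$' or ',').
films = films.astype({"worldwide_gross": int})
Total = films['worldwide_gross'].sum()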
Upvotes: 1
Reputation: 14949
# Use Int64 rather than Int32: grosses in the billions exceed the 32-bit integer range.
Total = films['worldwide_gross'].astype('Int64').sum()
Or convert the data types first:
films = films.convert_dtypes()
Total = films['worldwide_gross'].sum()
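Either way, the column has to hold numbers rather than text like '$1,000' before the sum means anything. As a rough sketch of one way to get there, the snippet below strips the symbols and parses the result with `pd.to_numeric`, which is a swapped-in technique and not part of the snippets above. The two values are made up for illustration and are deliberately larger than 2,147,483,647, the largest value a 32-bit signed integer can hold, which is why the cast targets `Int64`.

```python
import pandas as pd

# Made-up sample values, not real box office figures.
films = pd.DataFrame({'worldwide_gross': ['$3,000,000,000', '$2,500,000,000']})

# Strip '$' and ',' with plain (non-regex) replacements, then parse the digits.
digits = (
    films['worldwide_gross']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
)
films['worldwide_gross'] = pd.to_numeric(digits).astype('Int64')

Total = films['worldwide_gross'].sum()
print(Total)
```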
Upvotes: 1