Reputation: 651
This is my first time using Python and I can't figure this out. I'm scraping data from a website, and pandas reads the columns as object dtype even though the values are numbers. I've tried all the approaches described here but keep getting errors. I want the precip column to be numeric, but I keep getting the following error: ValueError: invalid literal for int() with base 10: '4.364.36'
Script that scrapes the data from the website:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import requests
from bs4 import BeautifulSoup
# Get URL where data we want is located
URL = "https://climate.rutgers.edu/stateclim_v1/nclimdiv/"
# Scrape data from website
result = requests.get(URL)
soup = BeautifulSoup(result.content, 'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
df = pd.concat(df)  # Converts list to dataframe
# Reshape data from wide to long
df = pd.melt(df, id_vars='Year', var_name='Month', value_name="precip")
# Get rid of missing data
df.dropna(subset=["precip", "Year"], inplace=True)
# Filter dataframe to clean up for plotting
df = df[df["precip"].str.contains("M") == False]
df = df[df["Year"].str.contains("Max|Min|Count|Median|Normal|POR") == False]
Upvotes: 0
Views: 615
Reputation: 3096
The tables at your URL are oddly structured, which is why the parser is struggling. You can copy the upper table to your clipboard and read it with this:
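import pandas as pd

# Read the copied table from the clipboard without treating any row as a header
df = pd.read_clipboard(header=None)
# Keep the first 128 rows (one per year) and drop the first column
df = df.iloc[0:128, 1:]
df.columns = ['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Annual']
# 'M' marks a missing value; note that this replaces it with 0
df = df.replace('M', 0)
# Convert every column to a numeric dtype
for c in df.columns:
    df[c] = pd.to_numeric(df[c])
print(df)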
     Year   Jan   Feb   Mar   Apr   May   Jun    Jul   Aug   Sep   Oct   Nov   Dec  Annual
0    1895  4.36  1.24  3.28  5.08  3.13  3.09   4.15  2.06  1.06  3.56  3.07  2.78   36.86
1    1896  1.61  6.88  5.65  1.35  3.54  5.49   5.38  1.68  4.25  2.41  3.12  1.21   42.57
2    1897  2.65  3.67  2.74  3.92  5.37  3.37  11.37  4.89  1.76  2.26  4.87  4.48   51.35
3    1898  4.10  3.45  3.15  3.58  6.77  2.07   4.63  5.45  2.05  5.51  6.60  3.63   50.99
4    1899  3.75  5.71  6.32  1.67  1.94  2.57   5.74  3.91  5.40  2.44  2.29  2.07   43.81
      ...   ...   ...   ...   ...   ...   ...    ...   ...   ...   ...   ...   ...     ...
123  2018  2.72  6.08  4.64  4.17  5.80  3.30   5.91  5.56  7.57  4.46  8.65  5.90   64.76
124  2019  4.49  3.26  3.84  3.97  6.75  5.15   6.14  3.73  1.25  5.71  1.94  5.32   51.55
125  2020  2.29  2.79  3.61  3.98  2.47  3.05   6.69  6.09  4.41  5.03  4.09  5.35   49.85
126  2021  1.86  4.72  3.82  2.35  3.84  3.37   7.62  6.59  6.45  5.06  0.98  1.28   47.94
127  2022  3.45  0.00  0.00  0.00  0.00  0.00   0.00  0.00  0.00  0.00  0.00  0.00    0.00
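If you would rather stay inside the script than go through the clipboard, here is a rough sketch of one way to clean the scraped strings. It is based only on the error message in the question, where '4.364.36' looks like '4.36' repeated twice, so it assumes every precipitation cell arrives as the value written twice back to back. The sample frame and the undouble helper are made up for illustration and are not part of pandas or of the site's data. It also uses errors="coerce" so that 'M' becomes NaN rather than 0, which keeps missing months out of any averages.

```python
import pandas as pd


def undouble(text):
    """Return the first half of a string that is one token repeated twice.

    Hypothetical helper: anything that is not an exact doubled string is
    returned unchanged.
    """
    if isinstance(text, str) and len(text) % 2 == 0:
        half = len(text) // 2
        if text[:half] == text[half:]:
            return text[:half]
    return text


# Made-up sample that mimics the assumed shape of the scraped cells
raw = pd.DataFrame({
    "Year": ["1895", "1896"],
    "Jan": ["4.364.36", "1.611.61"],
    "Feb": ["1.241.24", "MM"],
})

months = ["Jan", "Feb"]
clean = raw.copy()

# Only touch the precipitation columns: a year such as "1919" would be
# wrongly halved to "19" by undouble
clean[months] = clean[months].apply(lambda col: col.map(undouble))

# Anything that still is not a number (for example 'M') becomes NaN
clean = clean.apply(pd.to_numeric, errors="coerce")

print(clean.dtypes)
print(clean)
```

If the real cells turn out not to be doubled this way, the helper leaves them alone and pd.to_numeric with errors="coerce" will turn whatever cannot be parsed into NaN, so you can check df.isna().sum() to see how much was lost.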
Upvotes: 1