Reputation: 11
import pandas as pd
url = 'https://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/SistemaPregao_excel1.asp?Data=&Mercadoria=DI1'
df_list = pd.read_html(url)
data_raw = df_list[6].copy().drop([0])
vencto_col = data_raw[0]
ajuste_col = data_raw[13]
ajuste_col.info()
ajuste_col
If we run this in a Jupyter notebook, the output is:
<class 'pandas.core.series.Series'>
RangeIndex: 40 entries, 1 to 40
Series name: 13
Non-Null Count Dtype
-------------- -----
39 non-null object
dtypes: object(1)
memory usage: 452.0+ bytes
1 AJUSTE
2 99.544,07
3 98.486,64
4 97.485,03
5 96.492,84
6 95.411,60
7 94.337,31
8 93.469,97
9 92.381,59
10 91.537,57
11 90.516,53
12 89.588,95
So info() tells me that these values are objects, but when I print the column they look like numbers in a DataFrame. What am I missing, and how can I get numbers (float64) and a real DataFrame?
Upvotes: 0
Views: 36
Reputation: 5802
object is just a generic type, often indicating that the column contains strings (and nothing more specific like int, float, datetime, etc.). You need to set the thousands and decimal parameters when calling read_html so pandas can correctly parse the data, i.e.,
df_list = pd.read_html(url, thousands='.', decimal=',')
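If the column has already been loaded as strings, or if the header label (AJUSTE) sitting in the data rows may still keep the column as object, another option is to convert it after the fact. Note also that selecting a single column gives a Series (as the info() output in the question shows), not a DataFrame. The following is only a minimal sketch: it assumes a small hand-made sample in place of the live page, and the names values, cleaned, ajuste and ajuste_df are made up for illustration.

```python
import pandas as pd

# Hand-made sample mimicking the column from the question:
# a header label followed by Brazilian-formatted numbers (assumption, not live data).
ajuste_col = pd.Series(
    ["AJUSTE", "99.544,07", "98.486,64", "97.485,03"],
    index=[1, 2, 3, 4],
    name=13,
)

# Skip the header label in the first row.
values = ajuste_col.iloc[1:]

# Remove the '.' thousands separators, then turn the ',' decimal mark into '.'.
cleaned = values.str.replace(".", "", regex=False).str.replace(",", ".", regex=False)

# errors="coerce" turns anything that still cannot be parsed into NaN instead of raising.
ajuste = pd.to_numeric(cleaned, errors="coerce")

# A Series becomes a one-column DataFrame with to_frame().
ajuste_df = ajuste.to_frame(name="AJUSTE")

print(ajuste.dtype)
print(type(ajuste_df))
```

The regex=False arguments matter here, since '.' would otherwise be treated as a regular-expression wildcard in some pandas versions.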
Upvotes: 1