Pandas histograms

Question

I have emissions data from the 1990s up to 2017 and want to plot a histogram after separating the training and test sets.

The issue (I think) is that the years column is an object data type (entries for this column look like this: 1995JJ00) and I want to switch it to an int data type so I can plot a histogram with matplotlib.

I want to see the years on a plot to make sure that the split included a reasonable spread of years and didn't accidentally include a lot of results from similar years. Maybe this isn't even the best way to check that, but I'm down this rabbit hole and would like to see it through.

First I removed the unwanted trailing letters and digits with:

trainsetcopy['Perioden'] = trainsetcopy['Perioden'].map(lambda x: str(x)[:-4])

The data is from the Netherlands so 'Perioden' is years. Now I want to change the datatype of the column to enable it for plotting on a histogram. For this I tried:

trainsetcopy['Perioden'].astype(str).astype(np.int64)

and ended up with:

trainsetcopy.dtypes

ID            int64
Bronnen      object 
Perioden     object 
CO2_1         int64 
CH4_2       float64 
N2O_3       float64 
dtype: object

Which hasn't changed the datatype. How can I fix this?

UpstatePedro · Accepted Answer

I think you just need to assign the output of those dtype changes back to your DataFrame, since astype returns a new Series rather than modifying the column in place:

trainsetcopy['Perioden'] = trainsetcopy['Perioden'].astype(str).astype(np.int64)
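For reference, here is a minimal end-to-end sketch of the same fix. The small DataFrame is made-up sample data standing in for the real emissions set, and using `.str[:4]` to keep the year is just one alternative to the `map`/`lambda` step in the question, not a required change:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame has more rows and columns
trainsetcopy = pd.DataFrame({
    "Perioden": ["1995JJ00", "1996JJ00", "2017JJ00"],
    "CO2_1": [10, 12, 9],
})

# Keep only the leading four characters (the year) of each entry
years = trainsetcopy["Perioden"].str[:4]

# astype returns a new Series, so the result must be assigned back
trainsetcopy["Perioden"] = years.astype(np.int64)

# Perioden should now be reported as int64 instead of object
print(trainsetcopy.dtypes)
```

Once the column is an integer type, something like `trainsetcopy['Perioden'].hist()` should plot the spread of years, assuming matplotlib is installed.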
