Reputation: 107
I have emissions data from the 1990s up to 2017 and want to plot a histogram after separating the training and test sets.
The issue (I think) is that the years column has the object data type (entries in this column look like this: 1995JJ00) and I want to convert it to an int data type so I can plot a histogram with matplotlib.
The reason I want to plot the years is to check that the split included a reasonable spread of years and didn't accidentally draw a lot of results from similar years. Maybe this isn't even the best way to check that, but I'm down this rabbit hole and would like to see it through.
First I removed the unwanted letters and numbers at the end with:
trainsetcopy['Perioden'] = trainsetcopy['Perioden'].map(lambda x: str(x)[:-4])
The data is from the Netherlands, so 'Perioden' holds the years. Now I want to change the data type of the column so that it can be plotted as a histogram. For this I tried:
trainsetcopy['Perioden'].astype(str).astype(np.int64)
and ended up with:
trainsetcopy.dtypes
ID            int64
Bronnen      object
Perioden     object
CO2_1         int64
CH4_2       float64
N2O_3       float64
dtype: object
The data type of 'Perioden' hasn't changed. How can I fix this?
Upvotes: 0
Views: 196
Reputation: 460
I think you just need to assign the result of those dtype conversions back to your DataFrame, since astype returns a new Series instead of modifying the column in place:
trainsetcopy['Perioden'] = trainsetcopy['Perioden'].astype(str).astype(np.int64)
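To see the difference, here is a minimal sketch on made-up data. The three sample rows are an assumption and only mimic the 1995JJ00 format from the question, and .str[:4] is just one way to keep the year that should match dropping the last four characters for values of this shape.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows that mimic the 'Perioden' format from the question.
trainsetcopy = pd.DataFrame({"Perioden": ["1995JJ00", "1996JJ00", "2017JJ00"]})

# Keep only the four-digit year at the start of each entry.
trainsetcopy["Perioden"] = trainsetcopy["Perioden"].str[:4]

# astype returns a new Series; without assignment the column is left untouched.
trainsetcopy["Perioden"].astype(np.int64)
dtype_without_assignment = trainsetcopy["Perioden"].dtype

# Assigning the result back replaces the column with the converted values.
trainsetcopy["Perioden"] = trainsetcopy["Perioden"].astype(np.int64)
dtype_with_assignment = trainsetcopy["Perioden"].dtype

print(dtype_without_assignment, dtype_with_assignment)
```

Once the column is an integer type, passing trainsetcopy['Perioden'] to matplotlib's hist should work as intended.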
Upvotes: 1