Reputation: 23
I am using the Meteorite Landings dataset, which can be found here: https://www.kaggle.com/nasa/meteorite-landings#meteorite-landings.csv
A snap of the data: https://i.sstatic.net/G3lpQ.jpg
The dataset has a 'year' column, which I renamed to 'year1':
data = data.rename(columns = {"year":"year1"})
The year1 column looks like this:
0 01/01/1880 12:00:00 AM
1 1/1/1951 0:00
2 1/1/1952 0:00
3 1/1/1976 0:00
4 1/1/1902 0:00
...
45711 1/1/1990 0:00
45712 1/1/1999 0:00
45713 1/1/1939 0:00
45714 1/1/2003 0:00
45715 1/1/1976 0:00
Name: year1, Length: 45716, dtype: object
I want to convert this column to datetime format so that I can keep only the year. The date and time parts are repeated values that are of no use, and the column is named 'year' anyway.
I used this:
data['year1'] = pd.to_datetime(data['year1'])
This raises an error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1583-01-01 00:00:00
To work around this, I tried:
data['year1'] = pd.to_datetime(data['year1'],errors='coerce')
But even after doing so, the year1 column is still not in datetime format.
What can I do to convert it into datetime format?
name id nametype recclass mass fall year reclat reclong GeoLocation
Aachen 1 Valid L5 21.0 Fell 1880.0 50.77500 6.08333 (50.775000, 6.083330)
Aarhus 2 Valid H6 720.0 Fell 1951.0 56.18333 10.23333 (56.183330, 10.233330)
Abee 6 Valid EH4 107000.0 Fell 1952.0 54.21667 -113.00000 (54.216670, -113.000000)
Acapulco 10 Valid Acapulcoite 1914.0 Fell 1976.0 16.88333 -99.90000 (16.883330, -99.900000)
Achiras 370 Valid L6 780.0 Fell 1902.0 -33.16667 -64.95000 (-33.166670, -64.950000)
Adhi Kot 379 Valid EH4 4239.0 Fell 1919.0 32.10000 71.80000 (32.100000, 71.800000)
Adzhi-Bogdo (stone) 390 Valid LL3-6 910.0 Fell 1949.0 44.83333 95.16667 (44.833330, 95.166670)
Agen 392 Valid H5 30000.0 Fell 1814.0 44.21667 0.61667 (44.216670, 0.616670)
Aguada 398 Valid L6 1620.0 Fell 1930.0 -31.60000 -65.23333 (-31.600000, -65.233330)
Aguila Blanca 417 Valid L 1440.0 Fell 1920.0 -30.86667 -64.55000 (-30.866670, -64.550000)
Upvotes: 0
Views: 254
Reputation: 249394
With its default nanosecond resolution, pandas cannot represent datetimes earlier than 1677-09-21, which is why a year such as 1583 raises OutOfBoundsDatetime. But no matter, because your input CSV file has the year
column as exactly that: the year alone. So just stop doing whatever you're doing that converts the year
column into datetimes, and load it as a plain integer column.
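If the column does reach you as strings like the ones shown in the question, one possible approach is to pull the year out as text and never build a datetime at all. The following is only a minimal sketch: the small DataFrame is made-up sample data standing in for the real file, and the regular expression assumes every value follows the month/day/year layout seen in the question.

```python
import pandas as pd

# Hypothetical sample mimicking the question's 'year1' strings, including a
# pre-1677 year (which breaks pd.to_datetime) and a missing value.
data = pd.DataFrame({
    "year1": [
        "01/01/1880 12:00:00 AM",
        "1/1/1951 0:00",
        "1/1/1583 0:00",
        None,
    ]
})

# Grab the four digits that follow a slash; only the year part has four digits.
years = data["year1"].str.extract(r"/(\d{4})", expand=False)

# Convert to numbers, then to pandas' nullable integer dtype so that
# missing years stay missing instead of forcing the column to float.
data["year1"] = pd.to_numeric(years).astype("Int64")

print(data["year1"])
```

Because no timestamp is ever constructed, years before 1677 go through unchanged, and rows without a year end up as `<NA>`.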
Upvotes: 1