Reputation: 397
What kind of date format is 1037112:00:00.00
, and how do I parse it in SPSS into something recognizable?
For reference, here's a screenshot. We're dealing with V2 and V4 (and, to an extent, V3 and V5):
Edit: When I change the format of V2 in SPPS from Date to Numeric, the actual number of the highlighted cell is 3733603200
:
The file format from which I imported the data into SPSS is Excel — and although the date renders correctly in Excel (as "23-Apr"), I'm doubtful that Excel was the native format in which this data set was created. The data was exported from an online survey; I wish I had more information about it.
Here's the V2 column's values rendered in Excel (with the cells formatted as "General"):
Upvotes: 0
Views: 516
Reputation: 3672
Excel stores dates as a decimal number. The whole number part is the number of days since 1 Jan 1900
, while the fractional part is the fraction of a day since midnight (0.5 would be noon). If you were to format the cells to include the year, you’d find the highlighted cell converted to 23 Apr 2018 00:00:00
- perfect!
If you look at the raw number, it’s 3,733,603,200
. Divide this by 60
to convert seconds into minutes, then again by 60
to convert minutes into hours - and you get the exact number in your question: 1,037,112
hours, 00
minutes, and 00.0
seconds.
If you then divide this number by 24
to convert hours into days, you get (exactly) 43,213
. If you ask Excel to format a cell with this number as a full date, you’ll get the previous 23 Apr 2018
- there are 43,213 days from 1 Jan 1900 to 23 Apr 2018.
SPSS date variables are stored internally as the number of seconds since 14 Oct 1582 00:00:00
, so you’d (normally) have to do some complicated maths! Luckily, you simply need to add the number of days between the two “epochs” - 115,859
- but don’t forget to convert this to seconds! Multiply it by 24 * 60 * 60
first (10,010,217,600
).
Note that V3 and V5 are NOT 24-hour time. Some are marked as “am
”, so the 18
must be the date of the month, not the hours - also note the lack of colon!
I couldn’t help myself: I had to add this extra bit.
Why did IBM choose such a weird start epoch? 14 Oct 1582
happens to be the first day of the (then) new Gregorian calendar, instituted by Pope Gregory II. The day before was 5 Oct 1582
- accounting for those missing days makes for some horrible mathematics, so IBM just avoided it.
The missing days were to make up for all the mistaken extra days they’d added in previous centuries for Leap Years that shouldn’t have been (those years that could divide by 100, but not 400).
Upvotes: 1