Louise Sørensen
Louise Sørensen

Reputation: 257

Calculate age from personal number when birth year only has two numbers

I know there is A LOT of questions resembling this. BUT I'm not asking the same!

My problem is that all of the questions which I've looked at has birthdays with the whole year, fx 04/05/1971 (format: %d/%m/%Y).

The birthdays in my data is Danish CPR-numbers (personal identification numbers), and they look like this:

   ID
1901912222
0110841111
0404143333
1602032444

NB: Those dates are examples. I have thousands of rows and it's people in all ages, also above 100 (but most often no more than 17).

1st and 2nd number: day of birth 3rd and 4th number: month of birth 5th and 6th number: year of birth The last four = Sequential number.

So this gives me the birthdays (and ages):

   ID         birthdate      age
1901912222    19/09/91        26
0110841111    01/10/84        33
0404143333    04/04/14        103
1602024444    16/02/02        15

So the format is: %d%m%y[the sequential number of 4 digits]

So the last four digits (the sequential number) also has some information. They tell if the person is fx 3 or 103 years old (now that I don't have the year). See the image for description:

Birth year and sequential number

I don't know if it's any help, but I have the Excel code:

=YEAR(NOW())-1-IF(DATE(YEAR(NOW());MID(D12;3;2);LEFT(D12;2))<=NOW();MID(D12;5;2)+IF(LEFT(RIGHT(D12;4);1)*1<=3;1900;IF(AND(LEFT(RIGHT(D12;4);1)*1=4;MID(D12;5;2)*1<=36);2000;IF(AND(LEFT(RIGHT(D12;4);1)*1=4;MID(D12;5;2)*1>=37);1900;IF(AND(LEFT(RIGHT(D12;4);1)*1>=5;LEFT(RIGHT(D12;4);1)*1<=8;MID(D12;5;2)*1<=57);2000;IF(AND(LEFT(RIGHT(D12;4);1)*1>=5;LEFT(RIGHT(D12;4);1)*1<=8;MID(D12;5;2)*1>=58);1800;IF(AND(LEFT(RIGHT(D12;4);1)*1=9;MID(D12;5;2)*1<=36);2000+MID(D12;5;2);1900))))))-1;MID(D12;5;2)+IF(LEFT(RIGHT(D12;4);1)*1<=3;1900;IF(AND(LEFT(RIGHT(D12;4);1)*1=4;MID(D12;5;2)*1<=36);2000;IF(AND(LEFT(RIGHT(D12;4);1)*1=4;MID(D12;5;2)*1>=37);1900;IF(AND(LEFT(RIGHT(D12;4);1)*1>=5;LEFT(RIGHT(D12;4);1)*1<=8;MID(D12;5;2)*1<=57);2000;IF(AND(LEFT(RIGHT(D12;4);1)*1>=5;LEFT(RIGHT(D12;4);1)*1<=8;MID(D12;5;2)*1>=58);1800;IF(AND(LEFT(RIGHT(D12;4);1)*1=9;MID(D12;5;2)*1<=36);2000+MID(D12;5;2);1900)))))))

I really do hope you can help me with this problem!

Upvotes: 5

Views: 1446

Answers (1)

John Coleman
John Coleman

Reputation: 52008

The hard part is extracting the actual birth date from the id. The following function does so by creating three arrays to look up either "19" or "20" depending on whether the the year is 00-36, 37-57, or 58-99. It returns the dates in the standard format "yyyy-mm-dd":

A <- c(rep("19",4),rep("20",6))
B <- c(rep("19",5),rep("20",4),"19")
C <- c(rep("19",5),rep("18",4),"19")
birthday <- function(code){
  day <- substr(code,1,2)
  month <- substr(code,3,4)
  year <- substr(code,5,6)
  snum <- 1+as.numeric(substr(code,7,7))
  prefix <- ifelse(as.numeric(year) <= 36,A[snum],ifelse(as.numeric(year)<=57,B[snum],C[snum]))
  year <- paste0(prefix,year)
  paste(year,month,day,sep = "-")
}

For example:

df <- data.frame(ID = c("1901912222","0110841111","0404143333","1602024444"))
df$BD <- birthday(df$ID)

Yielding:

          ID         BD
1 1901912222 1991-01-19
2 0110841111 1984-10-01
3 0404143333 1914-04-04
4 1602024444 2002-02-16

Once you have the birthday in standard 4-digit year format, it is easy enough to e.g. calculate age. See this question.

Upvotes: 4

Related Questions