R data.table: set row equal to previous row within range

Question

I am trying to clean up a dataset of crime data by giving it better categorical labels. A sample of the table looks like this:

d <- as.data.table(read.csv('[filepath]'))
print(d)

Classifications ucr_ncic_code
SOVEREIGNTY            NA
Treason                101
Treason Misprison      102
Espionage              103
Sabotage               104
Sedition               105
Selective Service      106
Sovereignty            199
MILITARY               NA
Military Desertion     201
Military               299 
IMMIGRATION            NA
Illegal Entry          301
False Citizenship      302
Smuggling Aliens       303
Immigration            399
HOMICIDE               NA
Homicide Family-Gun    901
Homicide Family-Weapon 902
Homicide Nonfam-Gun    903

As you can see, in the original dataset the broader categories of crime classifications are formatted as all-caps headers with an NA code (e.g SOVEREIGNTY NA). What I would like to do is reformat the data so that these headers are their own categorial column in the table.

Here is my initial solution, which I am almost sure is not the best approach, but produces the desired result:

d[,row.num := .I,]
d.categs <- d[toupper(substr(Classifications,1,3))==substr(Classifications,1,3)] 
#the substring is for some edge cases that I don't show here

setnames(d.categs, "Classifications", "Category")
d <- merge(d,d.categs[,row.num,list(Category)],'row.num', all.x=TRUE)
d <- d[order(row.num)]

prev.row <- NA
for (i in seq(1,d[,.N])) {
  current.row <- d$Category[i]  
  if (is.na(current.row) & !(is.na(prev.row))){
    d$Category[i] <- prev.row
  } 
  prev.row <- d$Category[i]
}

#clean up
d <- d[!(is.na(ucr_ncic_code))]
d[,row.num := NULL,]

print(d)

Classifications   ucr_ncic_code   Category
Treason                 101       SOVEREIGNTY
Treason Misprison       102       SOVEREIGNTY
Espionage               103       SOVEREIGNTY
Sabotage                104       SOVEREIGNTY
Sedition                105       SOVEREIGNTY
Selective Service       106       SOVEREIGNTY
Sovereignty             199       SOVEREIGNTY
Military Desertion      201       MILITARY
Military                299       MILITARY
Illegal Entry           301       IMMIGRATION
False Citizenship       302       IMMIGRATION
Smuggling Aliens        303       IMMIGRATION
Immigration             399       IMMIGRATION
Homicide Family-Gun     901       HOMICIDE
Homicide Family-Weapon  902       HOMICIDE
Homicide Nonfam-Gun     903       HOMICIDE

What would be a better way to utilize the data.table package to make this formatting change? I'm guessing there's a better way to copy cells down than the for-loop that I designed.

R data.table: set row equal to previous row within range

Answers (1)

Related Questions