IVy

Reputation: 119

Set attributes to multiple data.table columns in R

This should be something easy but I cannot seem to get it right.

I have a data.table with N columns (say N = 40K) and two character vectors of length N, labelvector and unitvector. I would like to set the attributes "label" and "units" on each column of the data.table to the values given by the corresponding elements of those vectors.

Both vectors are also named, using the data.table column names.

My attempts revolved around calling setattr on all columns, or using lapply with the .SD notation (my main workhorse for large tables), but without any real success.

The latter failed because, from within lapply, I could not access the name of the column being passed to the function, which I need in order to pick the right values and set the attributes by reference.

I can either write a function that sets the attributes by reference (with a data.table := call in the function body) or use an *apply/for loop that sets them, but both take a lot of time.

Do you think that this can be done faster or more elegantly?

Edit:

Example:

The table has 4 columns: Age, Hgt, Wgt and S.

labelvector has 4 values: "Age", "Height", "Weight" and "Sex".

unitvector also has 4 values: "Years", "cm", "kg", NA.

Both labelvector and unitvector are named with the table's column names.

So the goal is to set, for the data.table:

Column Age, label: "Age", units "Years".

Column Hgt, label: "Height", units "cm".

Column Wgt, label: "Weight", units "kg".

Column S, label: "Sex", units NA.

This has to be generalized to a data.table of tens of thousands of columns.
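
A minimal sketch of this example in R, in case it helps to reproduce the setup. The object name dt and the data values are made up for illustration; only the column names, labels and units come from the description above.

```r
library(data.table)

# Toy table matching the example; the values themselves are invented.
dt <- data.table(
  Age = c(34, 51, 28),
  Hgt = c(172, 165, 180),
  Wgt = c(70, 62, 85),
  S   = c("F", "M", "F")
)

# Named character vectors, keyed by the table's column names.
labelvector <- c(Age = "Age", Hgt = "Height", Wgt = "Weight", S = "Sex")
unitvector  <- c(Age = "Years", Hgt = "cm", Wgt = "kg", S = NA)  # NA is coerced to NA_character_
```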

Upvotes: 3

Views: 2764

Answers (3)

JWilliman

Reputation: 3883

Using for and setattr

library(data.table)  

dat <- data.table(
  Age = rnorm(10, 40, 10),
  Hgt = rnorm(10, 150, 20),
  Wgt = rnorm(10, 30, 5),
  S   = sample(c("F", "M"), size = 10,  replace = TRUE)
)

str(dat)
#> Classes 'data.table' and 'data.frame':   10 obs. of  4 variables:
#>  $ Age: num  29.2 41.2 23.3 24.6 22.9 ...
#>  $ Hgt: num  161 151 148 141 159 ...
#>  $ Wgt: num  27.8 29 37.8 33 34.2 ...
#>  $ S  : chr  "F" "M" "M" "F" ...
#>  - attr(*, ".internal.selfref")=<externalptr>

# Lookup table: one row per column, with its label and units.
# The units of S are a real missing value, not the string "NA".
col_attr <- data.frame(
  var = c("Age", "Hgt", "Wgt", "S"),
  lab = c("Age", "Height", "Weight", "Sex"),
  unit = c("Years", "cm", "kg", NA)
)

# Look each column up by name, so the row order of col_attr
# does not have to match the column order of dat.
for (i in seq_along(col_attr$var)) {
  v <- col_attr$var[[i]]
  setattr(dat[[v]], name = "label", value = col_attr$lab[[i]])
  setattr(dat[[v]], name = "units", value = col_attr$unit[[i]])
}

str(dat)
#> Classes 'data.table' and 'data.frame':   10 obs. of  4 variables:
#>  $ Age: num  29.2 41.2 23.3 24.6 22.9 ...
#>   ..- attr(*, "label")= chr "Age"
#>   ..- attr(*, "units")= chr "Years"
#>  $ Hgt: num  161 151 148 141 159 ...
#>   ..- attr(*, "label")= chr "Height"
#>   ..- attr(*, "units")= chr "cm"
#>  $ Wgt: num  27.8 29 37.8 33 34.2 ...
#>   ..- attr(*, "label")= chr "Weight"
#>   ..- attr(*, "units")= chr "kg"
#>  $ S  : chr  "F" "M" "M" "F" ...
#>   ..- attr(*, "label")= chr "Sex"
#>   ..- attr(*, "units")= chr NA
#>  - attr(*, ".internal.selfref")=<externalptr>

Created on 2023-06-30 with reprex v2.0.2
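
The same idea can be written against the two named vectors from the question directly, without building a lookup data frame. This is only a sketch: dt and its values are invented, and labelvector is deliberately shuffled to show that matching happens by name rather than by position.

```r
library(data.table)

# Toy data; the values are invented for illustration.
dt <- data.table(
  Age = c(34, 51),
  Hgt = c(172, 165),
  Wgt = c(70, 62),
  S   = c("F", "M")
)

# Named vectors as described in the question (labelvector shuffled on purpose).
labelvector <- c(S = "Sex", Age = "Age", Wgt = "Weight", Hgt = "Height")
unitvector  <- c(Age = "Years", Hgt = "cm", Wgt = "kg", S = NA)

# Iterate over column names, so each name is available for the lookups.
# setattr() modifies the column in place, without copying the table.
for (col in names(dt)) {
  setattr(dt[[col]], "label", labelvector[[col]])
  setattr(dt[[col]], "units", unitvector[[col]])
}

# Inspect the result for one column and across all columns.
attr(dt$Hgt, "label")
sapply(dt, attr, "units")
```

Note that labelvector[[col]] raises an error if a column has no entry in the vector, which is arguably preferable to silently setting a wrong label on a table with tens of thousands of columns.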

Upvotes: 0

George Sotiropoulos

Reputation: 2133

I believe this is what you are looking for:

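# Set "label" on every column, matching labelvector by column name; repeat with "units" and unitvector
invisible(mapply(setattr, x = temp_data, name = "label", value = labelvector[names(temp_data)], SIMPLIFY = FALSE))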

Upvotes: 0

Juan David

Reputation: 2797

This is going to fix your issue

  attr(temp_data, "names") <- c("label", "units")

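Where temp_data is your data frame. Note that this assigns to the "names" attribute of the table itself, i.e. its column names, rather than adding "label" and "units" attributes to each column.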

Upvotes: 2
