Reputation: 119
This should be something easy but I cannot seem to get it right.
I have a data.table with N columns (say N = 40K) and two character vectors of that same length, labelvector and unitvector. I would like to set the attributes "label" and "units" on each column of the data.table to the values given by the corresponding elements of those vectors.
Both vectors are also named, using the data.table column names.
My attempts revolved around calling setattr on all columns, or using lapply with the .SD notation (my main workhorse for large tables), but without any real success.
The latter failed because, from within lapply, I could not access the name of the column being passed to the function call in order to set the attributes by reference.
I can either write a function that sets the attributes by reference (with a := data.table call in the function body) or use an *apply/for loop that sets them, but both take a lot of time.
Do you think that this can be done faster or more elegantly?
Edit:
Example:
The table has 4 columns: Age, Hgt, Wgt and S.
labelvector has 4 values: "Age", "Height", "Weight" and "Sex".
unitvector also has 4 values: "Years", "cm", "kg", NA.
Both labelvector and unitvector are named with the table's column names.
So the goal is to set for data table:
Column Age, label: "Age", units "Years".
Column Hgt, label: "Height", units "cm".
Column Wgt, label: "Weight", units "kg".
Column S, label: "Sex", units NA.
This has to be generalized to a data.table of tens of thousands of columns.
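The setup described above can be sketched as follows. The numeric and character values in the table are made up for illustration, and dt is just a placeholder name; only the column names and the two named vectors come from the description.

```r
library(data.table)

# Toy table with the four columns from the example (values are illustrative)
dt <- data.table(
  Age = c(34, 51, 27),
  Hgt = c(172, 165, 180),
  Wgt = c(70, 62, 85),
  S   = c("F", "M", "M")
)

# Named character vectors; the names match the column names of dt
labelvector <- c(Age = "Age", Hgt = "Height", Wgt = "Weight", S = "Sex")
unitvector  <- c(Age = "Years", Hgt = "cm", Wgt = "kg", S = NA_character_)

# Sanity check: both vectors cover exactly the columns of dt
stopifnot(
  setequal(names(labelvector), names(dt)),
  setequal(names(unitvector), names(dt))
)
```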
Upvotes: 3
Views: 2764
Reputation: 3883
Using for and setattr:
library(data.table)
dat <- data.table(
Age = rnorm(10, 40, 10),
  Hgt = rnorm(10, 150, 20),
  Wgt = rnorm(10, 30, 5),
  S = sample(c("F", "M"), size = 10, replace = TRUE)
)
str(dat)
#> Classes 'data.table' and 'data.frame': 10 obs. of 4 variables:
#> $ Age: num 29.2 41.2 23.3 24.6 22.9 ...
#> $ Hgt: num 161 151 148 141 159 ...
#> $ Wgt: num 27.8 29 37.8 33 34.2 ...
#> $ S : chr "F" "M" "M" "F" ...
#> - attr(*, ".internal.selfref")=<externalptr>
# Lookup table: one row per column, giving its label and units
col_attr <- data.frame(
  var = c("Age", "Hgt", "Wgt", "S"),
  lab = c("Age", "Height", "Weight", "Sex"),
  unit = c("Years", "cm", "kg", NA) # a real NA, not the string "NA"
)
# Look each column up by name so the result does not depend on column order
for (i in seq_along(col_attr$var)) {
  col <- col_attr$var[[i]]
  setattr(dat[[col]], name = "label", value = col_attr$lab[[i]])
  setattr(dat[[col]], name = "units", value = col_attr$unit[[i]])
}
str(dat)
#> Classes 'data.table' and 'data.frame': 10 obs. of 4 variables:
#> $ Age: num 29.2 41.2 23.3 24.6 22.9 ...
#> ..- attr(*, "label")= chr "Age"
#> ..- attr(*, "units")= chr "Years"
#> $ Hgt: num 161 151 148 141 159 ...
#> ..- attr(*, "label")= chr "Height"
#> ..- attr(*, "units")= chr "cm"
#> $ Wgt: num 27.8 29 37.8 33 34.2 ...
#> ..- attr(*, "label")= chr "Weight"
#> ..- attr(*, "units")= chr "kg"
#> $ S : chr "F" "M" "M" "F" ...
#> ..- attr(*, "label")= chr "Sex"
#> ..- attr(*, "units")= chr NA
#> - attr(*, ".internal.selfref")=<externalptr>
Created on 2023-06-30 with reprex v2.0.2
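Since the question already has labelvector and unitvector named by column, the same loop can be driven directly by those names. The following is a minimal sketch under that assumption; set_col_attrs is a hypothetical helper name, not part of data.table, and the toy values are illustrative. The vectors are deliberately given in a different order than the columns to show that matching is by name.

```r
library(data.table)

# Hypothetical helper: set "label" and "units" on every column of dt that
# has an entry in `labels`. Assumes `units` carries the same names.
set_col_attrs <- function(dt, labels, units) {
  for (nm in intersect(names(dt), names(labels))) {
    # dt[[nm]] returns the column itself, so setattr modifies it in place
    setattr(dt[[nm]], "label", labels[[nm]])
    setattr(dt[[nm]], "units", units[[nm]])
  }
  invisible(dt)
}

dt <- data.table(Age = c(34, 51), Hgt = c(172, 165), Wgt = c(70, 62), S = c("F", "M"))
labelvector <- c(S = "Sex", Wgt = "Weight", Hgt = "Height", Age = "Age")
unitvector  <- c(S = NA_character_, Wgt = "kg", Hgt = "cm", Age = "Years")

set_col_attrs(dt, labelvector, unitvector)
attr(dt$Hgt, "label")
attr(dt$S, "units")
```

Each iteration only touches attributes and does not copy column data, so the cost should grow with the number of columns rather than the number of rows; whether that is fast enough for tens of thousands of columns would need to be timed on the real table.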
Upvotes: 0
Reputation: 2133
I believe this is what you are looking for:
mapply(setattr, x = temp_data, name = "label", value = labelvector[names(temp_data)], SIMPLIFY = FALSE)
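An alternative sketch iterates over the column names rather than the columns, so each call knows which column it is handling, which is the piece the question reports missing inside lapply. It assumes labelvector is named by column as in the question; the toy table and labels_back are illustrative names only.

```r
library(data.table)

temp_data <- data.table(Age = c(34, 51), S = c("F", "M"))
labelvector <- c(Age = "Age", S = "Sex") # assumed to be named by column

# Loop over names; temp_data[[nm]] is the column itself, so setattr
# changes its attributes by reference
invisible(lapply(names(temp_data), function(nm) {
  setattr(temp_data[[nm]], "label", labelvector[[nm]])
}))

# Read the labels back as a named character vector to check the result
labels_back <- vapply(temp_data, attr, character(1), which = "label")
```

The same pattern can be repeated for the "units" attribute, as long as every column has a non-NULL entry in the corresponding vector.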
Upvotes: 0
Reputation: 2797
This should fix your issue:
attr(temp_data, "names") <- c("label", "units")
where temp_data is your data frame.
Upvotes: 2