Reputation: 33
I have a question about referencing objects by retrieving the name of the object from a variable.
SETUP
library(data.table)
library(stringr) # str_split(), str_to_title(), str_remove() and %>% are used below
object <- c("one", "two", "three")
attributes <- c("green, blue, red", "red", "blue, orange")
DT <- data.table(object,attributes) ; DT
object attributes
1: one green, blue, red
2: two red
3: three blue, orange
This is my base setup (simplified data). Each object has a name and a set of attributes. In the original dataset the attributes arrive as a comma-delimited string in a single cell of the table, and they come from a finite, known list; in this example I use colors. I need to be able to find objects by attribute, for example to subset all the objects that have "red" as an attribute (the real data has about 20k objects and ~200 attributes). After receiving the raw data and creating a data.table, I want to create a flag column for every possible attribute to make searching and subsetting easier. So this…
DT[, isRed := FALSE]
DT[, isGreen := FALSE]
DT[, isBlue := FALSE]
DT[, isOrange := FALSE]
DT
object attributes isRed isGreen isBlue isOrange
1: one green, blue, red FALSE FALSE FALSE FALSE
2: two red FALSE FALSE FALSE FALSE
3: three blue, orange FALSE FALSE FALSE FALSE
This creates my baseline data.table: all the flag columns are in place and set to FALSE prior to processing.
The processing takes the attribute string, parses out the individual attributes, and sets the flags accordingly. This is what I am doing…
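With ~200 attributes, writing one `:=` line per flag does not scale. A minimal sketch of creating every flag column in one call, assuming the finite attribute list is already available as a character vector (the name `all_attr` is a placeholder for illustration, not something from the original data):

```r
library(data.table)

DT <- data.table(
  object     = c("one", "two", "three"),
  attributes = c("green, blue, red", "red", "blue, orange")
)

# Assumption: the known attribute list is held in a character vector.
all_attr   <- c("Red", "Green", "Blue", "Orange")
flag_names <- paste0("is", all_attr)

# Parentheses on the left of := tell data.table to use the *contents* of
# flag_names as column names, not to create a column called "flag_names".
DT[, (flag_names) := FALSE]

names(DT)
```

The single `FALSE` on the right-hand side is recycled across all the new columns and all rows.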
# take the first object, parse the attributes into a data.table
split.attributes <-
str_split(DT[object == "one", attributes], ",", n = Inf) %>%
transpose() %>%
data.table()
split.attributes
.
1: green
2: blue
3: red
# format the attributes with initial Uppercase, and update the data.table
# ignore the extra string manipulation (like "\\s"); in the real-world data
# the attributes are sometimes two-word strings that become a
# single flag name, e.g., "blue green" -> "BlueGreen"
split.attributes <- split.attributes[,.] %>%
str_to_title() %>%
str_remove("\\s") %>%
as.list() %>%
data.table()
split.attributes
.
1: Green
2: Blue
3: Red
I already have all the flag column names in the form "is" followed by the attribute name, e.g., "isRed", so convert the data.table…
# paste "is" in front of the attribute and change the column name to avoid referring to "." later
split.attributes[, col.names := paste0("is",.)]
split.attributes
. col.names
1: Green isGreen
2: Blue isBlue
3: Red isRed
# then remove the extraneous column
split.attributes[, . := NULL]
split.attributes
col.names
1: isGreen
2: isBlue
3: isRed
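The name-building steps above (split, title-case, strip whitespace, prepend "is") can also be done on a plain character vector, which avoids the intermediate data.table and its "." column. A rough sketch in base R; `to_flag_name` is a hypothetical helper written for this illustration, not a function from any package:

```r
# Hypothetical helper: turn raw attribute strings into flag column names.
to_flag_name <- function(x) {
  x <- tolower(trimws(x))                            # drop padding around each piece
  x <- gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)  # capitalize each word
  x <- gsub("\\s+", "", x)                           # "Blue Green" -> "BlueGreen"
  paste0("is", x)
}

# Split one comma-delimited attribute string into its pieces.
attrs <- strsplit("green, blue, red", ",", fixed = TRUE)[[1]]

to_flag_name(attrs)
# [1] "isGreen" "isBlue"  "isRed"

to_flag_name("blue green")
# [1] "isBlueGreen"
```

The result is an ordinary character vector of column names, which is the form data.table expects when a column is referred to by name.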
I now have a set of flag names (that match the actual column names) for the first object in my original data table and I want to assign new values (TRUE) to those flags. What I want to do is call the value from split.attributes[1] and use it as the name of a column in DT. I know one way to do this…
i <- 1 # row of split.attributes holding the flag name to set
eval(parse( text = (paste0("DT[1, ", eval(split.attributes[i]), " := TRUE]"))))
DT
object attributes isRed isGreen isBlue isOrange
1: one green, blue, red FALSE TRUE FALSE FALSE
2: two red FALSE FALSE FALSE FALSE
3: three blue, orange FALSE FALSE FALSE FALSE
And my one flag is now TRUE, so we know object::one is "isGreen"::TRUE. Of course, with looping I can set all the necessary flags for all objects. I have seen a lot of specific solutions, but they all follow the same basic idea: turn the variable into a string, concatenate that string with the other strings needed to build the expression, and then evaluate the full string as an expression.
QUESTION
Is there a better way than, "eval(parse( text = (paste0("DT[1, ", eval(split.attributes[i]), " := TRUE]"))))"?
In my mind this is a common problem (or maybe my personal project is unique in this regard), so I feel like you should be able to do something like:
DT[1, ", get_the_variable_value_and_add_as_part_of_a_function ( split.attributes[i] ), " := TRUE]
Which would then create the underlying expression you want, and what gets sent to R is:
DT[1, isGreen := TRUE] (as an expression to be evaluated)
Nice and neat, no fuss, no layered functions.
NOTE: I realize I could make my own function for this, but what I'm asking is "does one already exist and I just haven't found it?". I'm just trying to see if anyone knows something I don't that would make my life easier. THANKS.
Upvotes: 1
Views: 73
Reputation: 33488
Here is one alternative:
DT[, attributes := strsplit(attributes, ", ")] # Convert to a list column
all_attr <- unique(unlist(DT$attributes))
DT[,
paste0("is_", all_attr) := lapply(all_attr, `%chin%`, attributes[[1]]),
by = object]
object attributes is_green is_blue is_red is_orange
1: one green,blue,red TRUE TRUE TRUE FALSE
2: two red FALSE FALSE TRUE FALSE
3: three blue,orange FALSE TRUE FALSE TRUE
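Once the flag columns exist, finding objects by attribute is an ordinary logical subset. A short sketch that rebuilds the example data, applies the approach above, and then subsets; the variable `wanted` is only an illustrative name:

```r
library(data.table)

DT <- data.table(
  object     = c("one", "two", "three"),
  attributes = c("green, blue, red", "red", "blue, orange")
)

# Same steps as above: list column, then one logical flag per attribute.
DT[, attributes := strsplit(attributes, ", ")]
all_attr <- unique(unlist(DT$attributes))
DT[, paste0("is_", all_attr) := lapply(all_attr, `%chin%`, attributes[[1]]),
   by = object]

# Subset by a flag written out literally.
red_objects <- DT[is_red == TRUE, object]

# Subset by a flag whose name is held in a variable.
wanted <- "is_blue"
blue_objects <- DT[DT[[wanted]], object]

red_objects   # "one" "two"
blue_objects  # "one" "three"
```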
Another alternative, starting again from the original DT (before attributes was converted to a list column):
DT[, lapply(.SD, function(x) strsplit(x, ", ")[[1]]), by = object
][, x := TRUE
][, dcast(.SD, object ~ paste0("is_", attributes), value.var = "x", fill = FALSE)]
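As for the literal question (using a column name stored in a variable without eval(parse())), data.table supports this directly. A minimal sketch, assuming the flag columns were created as in the question; the variable names `col` and `nm` are illustrative only:

```r
library(data.table)

DT <- data.table(
  object     = c("one", "two", "three"),
  attributes = c("green, blue, red", "red", "blue, orange")
)
flag_names <- c("isRed", "isGreen", "isBlue", "isOrange")
DT[, (flag_names) := FALSE]

# 1. Wrap the variable in parentheses on the left of := so its value,
#    not its name, is used as the column to assign.
col <- "isGreen"
DT[1, (col) := TRUE]

# 2. set() takes the column name as a string, so no wrapping is needed.
set(DT, i = 1L, j = "isRed", value = TRUE)

# 3. The same call works inside a loop over several names.
for (nm in c("isBlue")) {
  set(DT, i = 1L, j = nm, value = TRUE)
}

DT[1]
```

Both forms modify DT by reference; set() is generally the cheaper choice inside a loop because it skips the overhead of the `[.data.table` call.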
Upvotes: 1