Reputation: 248
I want the elements referenced in my data frame to be replaced with the name of the argument I pass to the function. At the moment, however, they are replaced with the parameter name I used when defining the function. (I'm finding it hard to explain; hopefully my code and pictures will clarify this a bit!)
Project_assign <- function(prjct) {
Truth_vector <- is.element((giraffe[,1]),(prjct[,1]))
giraffe[which(Truth_vector),5] <- 'prjct'
assign('giraffe' , giraffe , envir= .GlobalEnv)
}
Project_assign(spine_hlfs)
This mostly works; however, the elements get replaced with prjct instead of spine_hlfs: https://i.sstatic.net/uuPnv.png
If I can get this to work as intended, I will next create a vector with all the project names and use lapply with this function, saving me a lot of manual work every few months. I am relatively new to R, so any explanations are much appreciated.
Upvotes: 1
Views: 200
Reputation: 42544
As far as I understand the OP's intentions from the many comments, he wants to update the giraffe data frame with the names of many other data frames wherever runkey matches.
This can be achieved by combining the other data frames into one data.table object, treating the data frame names as data, and finally updating giraffe in a join.
According to the OP, giraffe consists of 500 rows and 5 columns, including runkey and project. Here, project is initialized as a character column for the subsequent join with the data frame names.
set.seed(123L) # required for reproducible data
giraffe <- data.frame(runkey = 1:500,
                      X2 = sample.int(99L, 500L, TRUE),
                      X3 = sample.int(99L, 500L, TRUE),
                      X4 = sample.int(99L, 500L, TRUE),
                      project = "",
                      stringsAsFactors = FALSE)
Then there are a number of data frames which contain only one column, runkey. According to the OP, the runkey sets are disjoint, i.e., the combined set of all runkey values does not contain any duplicates.
spine_hlfs <- data.frame(runkey = c(1L, 498L, 5L))
ir_dia <- data.frame(runkey = c(3L, 499L, 47L, 327L))
# specify names of data frames
df_names <- c("spine_hlfs", "ir_dia")
# create named list of data frames
df_list <- mget(df_names)
# update on join
library(data.table)
setDT(giraffe)[rbindlist(df_list, idcol = "df.name"), on = "runkey", project := df.name][]
     runkey X2 X3 X4    project
  1:      1  2 44 63 spine_hlfs
  2:      2 73 99 77
  3:      3 43 20 18     ir_dia
  4:      4 73 12 40
  5:      5  2 25 96 spine_hlfs
 ---
496:    496 75 45 84
497:    497 24 63 43
498:    498 33 53 81 spine_hlfs
499:    499  1 33 16     ir_dia
500:    500 99 77 41
setDT() coerces giraffe to data.table. rbindlist(df_list, idcol = "df.name") creates a combined data.table from the list of data frames, filling the df.name column with the names of the list elements:
      df.name runkey
1: spine_hlfs      1
2: spine_hlfs    498
3: spine_hlfs      5
4:     ir_dia      3
5:     ir_dia    499
6:     ir_dia     47
7:     ir_dia    327
This intermediate result is joined on runkey with giraffe. The project column is updated with the contents of df.name only for matching rows.
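The join relies on the runkey sets being disjoint. If a key occurred in more than one data frame, the same giraffe row would be matched more than once, and which name ends up in project would depend on row order. A minimal base R sketch of checking that assumption before joining is given here; the variable names all_keys and keys_are_disjoint are illustrative and not part of the OP's code:

```r
# Small stand-ins for the OP's data frames (same values as in the example)
spine_hlfs <- data.frame(runkey = c(1L, 498L, 5L))
ir_dia     <- data.frame(runkey = c(3L, 499L, 47L, 327L))
df_list    <- list(spine_hlfs = spine_hlfs, ir_dia = ir_dia)

# Collect every runkey from every data frame into one vector
all_keys <- unlist(lapply(df_list, `[[`, "runkey"), use.names = FALSE)

# anyDuplicated() returns 0 when no value occurs more than once
keys_are_disjoint <- anyDuplicated(all_keys) == 0L

# Keys that occur in more than one data frame (empty if disjoint)
shared_keys <- unique(all_keys[duplicated(all_keys)])
```

If keys_are_disjoint is FALSE, shared_keys lists the offending values, which can then be resolved before the update.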
Alternatively, one can loop over df_names and perform repeated joins which update giraffe in place:
setDT(giraffe)
for (x in df_names) giraffe[get(x), on = "runkey", project := x]
giraffe[]
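As for the function in the question: 'prjct' in quotes is a literal string, so that string is what gets written to the column. deparse(substitute(prjct)) recovers the name of the argument when the function is called directly, but inside lapply() it would typically see an internal expression rather than the data frame's name. A possible base R sketch, therefore, passes the name as a string and looks the data frame up with get(). The helper names arg_name and tag_project are hypothetical, and the function returns the updated data frame instead of assigning into the global environment:

```r
# Recover the name of an argument as a string (works for direct calls)
arg_name <- function(x) deparse(substitute(x))

# Hypothetical helper: tag the rows of `target` whose runkey occurs in the
# data frame named `df_name`, which is looked up starting from `env`
tag_project <- function(target, df_name, env = parent.frame()) {
  keys <- get(df_name, envir = env)$runkey
  target$project[target$runkey %in% keys] <- df_name
  target
}

# Small illustrative data, not the OP's real data
giraffe    <- data.frame(runkey = 1:10, project = "", stringsAsFactors = FALSE)
spine_hlfs <- data.frame(runkey = c(1L, 5L))
ir_dia     <- data.frame(runkey = c(3L, 7L))

# Apply the helper once per data frame name
for (nm in c("spine_hlfs", "ir_dia")) giraffe <- tag_project(giraffe, nm)
```

Returning the modified data frame keeps the function free of side effects; the caller decides where the result is stored.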
Upvotes: 0
Reputation: 50678
Sounds like a simple replace based on matching entries between a (list of) query dataframes and a subject dataframe.
Here is an example based on some simulated data.
I first simulate data for the subject dataframe:
# Sample data
giraffe <- data.frame(
    runkeys = 1:500,
    col1 = runif(500),
    col2 = runif(500),
    col3 = runif(500),
    col4 = runif(500))
I then simulate runkeys data for 2 query dataframes:
spine_hlfs <- data.frame(
    runkeys = c(44, 260, 478))
ir_dia <- data.frame(
    runkeys = c(10, 20, 30))
The query dataframes are stored in a list:
lst.runkeys <- list(
    spine_hlfs = spine_hlfs,
    ir_dia = ir_dia)
To flag runkeys entries present in any of the query dataframes, we can use a for loop to match runkeys entries from every query dataframe:
# This is the critical part: it loops through the list of query dataframes
# and flags runkeys in giraffe with the name of the query dataframe
for (i in seq_along(lst.runkeys)) {
    giraffe[match(lst.runkeys[[i]]$runkeys, giraffe$runkeys), 5] <- names(lst.runkeys)[i]
}
This is the output of the subject dataframe after matching runkeys entries. I'm only showing rows where entries in column 5 were replaced.
giraffe[grep("(spine_hlfs|ir_dia)", giraffe[, 5]), ]
10 10 0.7401977 0.005703928 0.6778921 ir_dia
20 20 0.7954076 0.331462567 0.7637870 ir_dia
30 30 0.5772808 0.183716142 0.6984193 ir_dia
44 44 0.9701355 0.655736489 0.4917452 spine_hlfs
260 260 0.1893012 0.600140166 0.0390346 spine_hlfs
478 478 0.7655976 0.910946623 0.9779205 spine_hlfs
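One caveat with the loop above: match() returns NA for any query key that is absent from giraffe, and assignment into a data frame does not accept NA row indices. A sketch of a variant that should tolerate such keys uses %in% to build a logical row selector instead. The flag column and the extra key 9999 are assumptions made for illustration only:

```r
# Subject dataframe with a dedicated character column for the flag
giraffe <- data.frame(
    runkeys = 1:500,
    flag = NA_character_,
    stringsAsFactors = FALSE)

# Query dataframes; 9999 deliberately does not occur in giraffe
lst.runkeys <- list(
    spine_hlfs = data.frame(runkeys = c(44, 260, 478)),
    ir_dia     = data.frame(runkeys = c(10, 20, 30, 9999)))

# %in% yields TRUE/FALSE for every row of giraffe, never NA,
# so keys missing from giraffe are simply ignored
for (nm in names(lst.runkeys)) {
    hit <- giraffe$runkeys %in% lst.runkeys[[nm]]$runkeys
    giraffe$flag[hit] <- nm
}

# Rows that received a flag
flagged <- giraffe[!is.na(giraffe$flag), ]
```

Writing to a separate flag column also avoids coercing an existing numeric column to character.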
Upvotes: 1