Reputation: 1326
I have a dataset of the type 900,000 x 500, but the following shows a toy dataset apt for the question.
library(data.table)
df1 <- data.table(x = c(1,2,4,0), y = c(0,0,10,15), z = c(1,1,1,0))
I would like to do the following:
I am very new to data.table. Looking at examples of questions here at stackoverflow, I couldn't find a similar question, except this: How to replace NA values in a table *for selected columns*? data.frame, data.table
My own attempt is as follows, but this does not work:
for (col in c("x", "y")) df1[(get(col)) == 0, (col) := max(col) + 1)
Obviously, I haven't gotten accustomed to data.table
, so I'm banging my head against the wall at the moment...
If anybody could provide a dplyr
solution in addition to data.table
, I would be thankful.
Upvotes: 3
Views: 347
Reputation: 6522
The dplyr version is pretty simple (I think)
> library(dplyr)
# indented for clarity
> mutate(df1,
y= ifelse(y>0, y, max(y)+1),
z= ifelse(z>0, z, max(z)+1))
x y z
1 1 16 1
2 2 16 1
3 4 10 1
4 0 15 2
EDIT As noted by David Arenburg in comments this is helpful for the toy example but not for the data mentione dwith 500 columns. He suggests something similar to:
df1 %>% mutate_each(funs(ifelse(. > 0, ., max(.) + 1)), -1)
where -1
specifies all but the first column
Upvotes: 2
Reputation: 20080
As an alternative, ifelse(test, yes, no)
might be useful
Along the lines
library(data.table)
dt <- data.table(x = c(1,2,4,0), y = c(0,0,10,15), z = c(1,1,1,0))
print(dt)
dt[, y := ifelse(!y, max(y) + 1, y)]
print(dt)
Upvotes: 1
Reputation: 887118
We can use set
and assign the rows where the value is 0 with the max
of that column +1.
for(j in c("y", "z")){
set(df1, i= which(!df1[[j]]), j=j, value= max(df1[[j]])+1)
}
df1
# x y z
#1: 1 16 1
#2: 2 16 1
#3: 4 10 1
#4: 0 15 2
NOTE: The set
method will be very efficient as the overhead of [.data.table
is avoided
Or a less efficient method would be to specify the columns of interest in .SDcols
, loop through the columns (lapply(..
), replace
the value based on the logical index, and assign (:=
) the output back to the columns.
df1[, c('y', 'z') := lapply(.SD, function(x)
replace(x, !x, max(x)+1)), .SDcols= y:z]
Upvotes: 6