Reputation: 1437
There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:
x[is.na(x)]<-0
But, what if I want to restrict it to only certain columns? Let's me show you an example.
First, let's start with a dataset.
set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
b=sample(c(1,2,NA), 10, replace=T),
c=sample(c(1:5,NA), 10, replace=T))
Which gives:
a b c
1 1 NA 2
2 2 2 2
3 2 1 1
4 2 NA 1
5 NA 1 2
6 2 NA 5
7 1 1 4
8 1 1 NA
9 2 1 5
10 2 1 1
Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:
x[is.na(x), 1:2]<-0
and:
x[is.na(x[1:2])]<-0
Which does not work.
My data.table attempt, where y<-data.table(x)
, was obviously never going to work:
y[is.na(y[,list(a,b)]), ]
I want to pass columns inside the is.na argument but that obviously wouldn't work.
I would like to do this in a data.frame and a data.table. My end goal is to recode the 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns so I don't want to do it one by one. And, I'd just like to know how to do this.
Do you have any suggestions?
Upvotes: 108
Views: 198881
Reputation: 1258
This needed a bit extra for dealing with NA's in factors.
Found a useful function here, which you can then use with mutate_at or mutate_if:
replace_factor_na <- function(x){
x <- as.character(x)
x <- if_else(is.na(x), 'NONE', x)
x <- as.factor(x)
}
df <- df %>%
mutate_at(
vars(vector_of_column_names),
replace_factor_na
)
Or apply to all factor columns:
df <- df %>%
mutate_if(is.factor, replace_factor_na)
Upvotes: 1
Reputation: 1121
it's quite handy with data.table
and stringr
library(data.table)
library(stringr)
x[, lapply(.SD, function(xx) {str_replace_na(xx, 0)})]
FYI
Upvotes: 0
Reputation: 1291
For completeness, built upon @sbha's answer, here is the tidyverse version with the across()
function that's available in dplyr since version 1.0 (which supersedes the *_at()
variants, and others):
# random data
set.seed(1234)
x <- data.frame(a = sample(c(1, 2, NA), 10, replace = T),
b = sample(c(1, 2, NA), 10, replace = T),
c = sample(c(1:5, NA), 10, replace = T))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# with the magrittr pipe
x %>% mutate(across(1:2, ~ replace_na(.x, 0)))
#> a b c
#> 1 2 2 5
#> 2 2 2 2
#> 3 1 0 5
#> 4 0 2 2
#> 5 1 2 NA
#> 6 1 2 3
#> 7 2 2 4
#> 8 2 1 4
#> 9 0 0 3
#> 10 2 0 1
# with the native pipe (since R 4.1)
x |> mutate(across(1:2, ~ replace_na(.x, 0)))
#> a b c
#> 1 2 2 5
#> 2 2 2 2
#> 3 1 0 5
#> 4 0 2 2
#> 5 1 2 NA
#> 6 1 2 3
#> 7 2 2 4
#> 8 2 1 4
#> 9 0 0 3
#> 10 2 0 1
Created on 2021-12-08 by the reprex package (v2.0.1)
Upvotes: 0
Reputation: 1340
This is now trivial in tidyr
with replace_na()
. The function appears to work for data.tables as well as data.frames:
tidyr::replace_na(x, list(a=0, b=0))
Upvotes: 21
Reputation: 49448
Since data.table
1.12.4 (Oct 2019), data.table
gains two functions to facilitate this: nafill
and setnafill
.
nafill
operates on columns:
cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]
setnafill
operates on tables (the replacements happen by-reference/in-place)
setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]
This will also be more efficient than the other options; see ?nafill
for more, the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA
imputation for time series.
This will work for your data.table
version:
for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]
Alternatively, as David Arenburg points out below, you can use set
(side benefit - you can use it either on data.frame
or data.table
):
for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
Upvotes: 44
Reputation: 519
Starting from the data.table y, you can just write:
y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]
Don't forget to library(data.table)
before creating y
and running this command.
Upvotes: 1
Reputation: 27
We can solve it in data.table
way with tidyr::repalce_na
function and lapply
library(data.table)
library(tidyr)
setDT(df)
df[,c("a","b","c"):=lapply(.SD,function(x) replace_na(x,0)),.SDcols=c("a","b","c")]
In this way, we can also solve paste columns with NA
string. First, we replace_na(x,"")
,then we can use stringr::str_c
to combine columns!
Upvotes: 1
Reputation: 10422
Building on @Robert McDonald's tidyr::replace_na()
answer, here are some dplyr
options for controlling which columns the NA
s are replaced:
library(tidyverse)
# by column type:
x %>%
mutate_if(is.numeric, ~replace_na(., 0))
# select columns defined in vars(col1, col2, ...):
x %>%
mutate_at(vars(a, b, c), ~replace_na(., 0))
# all columns:
x %>%
mutate_all(~replace_na(., 0))
Upvotes: 52
Reputation: 379
For a specific column, there is an alternative with sapply
DF <- data.frame(A = letters[1:5],
B = letters[6:10],
C = c(2, 5, NA, 8, NA))
DF_NEW <- sapply(seq(1, nrow(DF)),
function(i) ifelse(is.na(DF[i,3]) ==
TRUE,
0,
DF[i,3]))
DF[,3] <- DF_NEW
DF
Upvotes: 0
Reputation: 1
this works fine for me
DataTable DT = new DataTable();
DT = DT.AsEnumerable().Select(R =>
{
R["Campo1"] = valor;
return (R);
}).ToArray().CopyToDataTable();
Upvotes: -4
Reputation: 1095
Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:
update.mat <- function(dt, cols, criteria) {
require(data.table)
x <- as.data.frame(which(criteria==TRUE, arr.ind = TRUE))
y <- as.matrix(subset(x, x$col %in% which((names(dt) %in% cols), arr.ind = TRUE)))
y
}
To apply it:
y[update.mat(y, c("a", "b"), is.na(y))] <- 0
The function creates a matrix of the selected columns and rows (cell coordinates) that meet the input criteria (in this case is.na == TRUE).
Upvotes: 3
Reputation: 89057
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2
or c("a", "b")
can be replaced by a pre-defined vector.
Upvotes: 145