arrrrRgh

Reputation: 307

Replace portions of dataframe not containing certain characters

I have a dataframe (specifically a correlation matrix). I'd like to replace with NA any values in the matrix that do not have either an "*" or a "'" (i.e., omitting cells that are not statistically significant or marginally significant).

Data is something like this:

out <- data.frame(V1=c(NA,"-0.28**","-0.18'","-0.11"),
                  V2=c(NA,NA,"0.01","0.05"),
                  V3=c(NA,NA,NA,"0.30**"))
rownames(out) <- c("V1","V2","V3","V4")

Returning:

> out
        V1   V2     V3
V1    <NA> <NA>   <NA>
V2 -0.28** <NA>   <NA>
V3  -0.18' 0.01   <NA>
V4   -0.11 0.05 0.30**

What I'd like is the same dataframe, with every association that is neither significant nor marginally significant replaced with NA.

Like this:

> out
        V1   V2     V3
V1    <NA> <NA>   <NA>
V2 -0.28** <NA>   <NA>
V3  -0.18' <NA>   <NA>
V4    <NA> <NA> 0.30**

Upvotes: 0

Views: 59

Answers (4)

Sven Hohenstein

Reputation: 81743

out[] <- lapply(out, function(x) "is.na<-"(x, grep("^[^*']+$", x)))
#         V1   V2     V3
# V1    <NA> <NA>   <NA>
# V2 -0.28** <NA>   <NA>
# V3  -0.18' <NA>   <NA>
# V4    <NA> <NA> 0.30**

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

My "SOfun" package has a function called makemeNA that can be used for this.

Usage in this case would be:

makemeNA(out, "^[0-9.-]+$", fixed = FALSE)
#         V1 V2     V3
# V1    <NA> NA   <NA>
# V2 -0.28** NA   <NA>
# V3  -0.18' NA   <NA>
# V4    <NA> NA 0.30**

This basically says to replace with NA anything that is just a number (positive or negative), i.e. any value made up only of digits, dots and minus signs.
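To see which values that pattern picks out, here is a minimal base R sketch. It only illustrates the regular expression and is not a claim about how makemeNA works internally; the vector x is a hypothetical sample built from the values in the question.

```r
# Hypothetical sample values taken from the question's data.
x <- c("-0.28**", "-0.18'", "-0.11", "0.01", NA)

# TRUE only where the whole string consists of digits, dots and minus signs.
plain <- grepl("^[0-9.-]+$", x)
plain
# [1] FALSE FALSE  TRUE  TRUE FALSE

# Blank out the plain numbers; flagged values and existing NAs are unchanged.
x[plain] <- NA
```

Values carrying a "*" or "'" fail the match because those characters are not in the character class, so they are left alone.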

Install the package with:

library(devtools)
install_github("mrdwab/SOfun")

Upvotes: 0

IRTFM

Reputation: 263481

Negate a grepl() call. You need sapply() because grepl() has no data.frame method. The pattern is an OR construct of two character classes. See ?regex:

> out[ !sapply( out, grepl, pattern="[']|[*]") ] <- NA
> out
        V1   V2     V3
V1    <NA> <NA>   <NA>
V2 -0.28** <NA>   <NA>
V3  -0.18' <NA>   <NA>
V4    <NA> <NA> 0.30**
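The same idea can be broken into steps to show the intermediate logical matrix. This is only a sketch: the variable name keep is made up for illustration, the two character classes are merged into the single class "[*']", and stringsAsFactors = FALSE is an assumption so the columns are plain character vectors.

```r
# Rebuild the example data from the question (character columns assumed).
out <- data.frame(V1 = c(NA, "-0.28**", "-0.18'", "-0.11"),
                  V2 = c(NA, NA, "0.01", "0.05"),
                  V3 = c(NA, NA, NA, "0.30**"),
                  stringsAsFactors = FALSE)
rownames(out) <- c("V1", "V2", "V3", "V4")

# sapply() runs grepl() on each column and binds the results into a
# logical matrix: TRUE where the cell contains "*" or "'".
keep <- sapply(out, grepl, pattern = "[*']")

# Cells without a marker become NA; cells that were already NA stay NA.
out[!keep] <- NA
```

grepl() returns FALSE for NA input, which is why the existing NA cells do not need special handling.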

Upvotes: 1

Rich Scriven

Reputation: 99371

You could also do

out[] <- lapply(out, function(x) { is.na(x) <- !grepl("[*']", x); x })
out
#         V1   V2     V3
# V1    <NA> <NA>   <NA>
# V2 -0.28** <NA>   <NA>
# V3  -0.18' <NA>   <NA>
# V4    <NA> <NA> 0.30**

Upvotes: 0
