Reputation: 75
New user to R, so please go easy on me.
I have a data frame like:
df = data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                Confidence = c("ZLow", "High", "Med"),
                Coverage = c("sub", "sub", "super"),
                Aspect = c("ZPos", "ZUnd", "Neg"))
The actual file is much larger and is output from old hardware. For some reason some entries have "Z" put in front of them. How do I remove it from the entire dataset?
I tried df = gsub("Z", " ", df)
but it just gives me nonsense. This darn thing!
[1] "1:3" "c(3, 1, 2)" "c(1, 1, 2)" "c(2, 3, 1)"
I looked here on Stack Overflow and tried the stringr package, but could not get that to work either. Does anyone know what to do?
Upvotes: 2
Views: 2493
Reputation: 2796
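Your approach with gsub() is not working because that function operates on vectors, not data frames. Given a data frame, gsub() first coerces it with as.character(), which turns each whole column into a single string; that is the "nonsense" you are seeing. However, you can apply gsub() over each column of your data frame to get what you want: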
df[] <- lapply(df, function (x) {gsub("Z", "", x)})
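To see the difference between the two calls, here is a minimal sketch using the example data from the question. It assumes character columns (stringsAsFactors = FALSE, which is the default from R 4.0.0 onward), and the name cleaned is only there for illustration.

```r
df <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                 Confidence = c("ZLow", "High", "Med"),
                 Coverage = c("sub", "sub", "super"),
                 Aspect = c("ZPos", "ZUnd", "Neg"),
                 stringsAsFactors = FALSE)

# gsub() coerces its input with as.character(); on a data frame that
# yields one string per column rather than the individual cell values.
whole_df_result <- gsub("Z", "", df)
length(whole_df_result)  # one element per column

# Applying gsub() column by column keeps the values, and assigning
# into cleaned[] keeps the data frame structure and column names.
cleaned <- df
cleaned[] <- lapply(cleaned, function(x) gsub("Z", "", x))
cleaned$Mineral
```

Note that an unanchored "Z" pattern removes every capital Z in a value, not only a leading one.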
For a stringr solution (that also uses dplyr), try:
library(tidyverse)
# funs() is deprecated as of dplyr 0.8.0; across() requires dplyr >= 1.0.0
df <- mutate(df, across(everything(), ~ str_replace_all(.x, "Z", "")))
P.S. I recommend using df <- instead of df = in the future. Good luck!
EDIT: corrected typo - thanks @thelatemail
Upvotes: 4
Reputation: 33940
You asked how to do it with the stringr (or stringi) package while avoiding the unwanted vector of indices you got:
> as.data.frame(apply(df, 2,
function(col) stringr::str_replace_all(col, '^Z', '')))
> as.data.frame(apply(df, 2,
function(col) stringi::stri_replace_first_regex(col, '^Z', '')))
Mineral Confidence Coverage Aspect
1 feldspar Low sub Pos
2 granite High sub Und
3 Silica Med super Neg
(where the as.data.frame() call is needed to turn the output array back into a data frame; see the question "R: apply-like function that returns a data frame?")
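A quick way to check that point is sketched below. It swaps in base sub() for the stringr/stringi calls so that it runs without extra packages (that swap is an assumption made for this sketch, not part of the answer above), and it assumes character columns.

```r
df <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                 Confidence = c("ZLow", "High", "Med"),
                 Coverage = c("sub", "sub", "super"),
                 Aspect = c("ZPos", "ZUnd", "Neg"),
                 stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix, runs the function on
# each column (MARGIN = 2), and returns a character matrix.
m <- apply(df, 2, function(col) sub("^Z", "", col))
is.matrix(m)

# as.data.frame() turns the matrix back into a data frame, keeping
# the column names that apply() carried over.
out <- as.data.frame(m, stringsAsFactors = FALSE)
is.data.frame(out)
```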
As for figuring out exactly how to call a str*_replace function over an entire data frame, I tried:
stri_replace_first_regex(df, '^Z', '')
stri_replace_first_regex(df[1,], '^Z', '')
stri_replace_first_regex(df[,1], '^Z', '')
Only the last one works properly. Admittedly this is a design flaw in the str*_replace functions: they should at minimum recognize an invalid object and produce a useful error message, instead of spewing out indices.
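The kind of check described here can be sketched as a small wrapper. strip_leading_z is a hypothetical helper name, not a function from stringr or stringi, and it uses base sub() so that it runs without extra packages.

```r
# Hypothetical helper (not part of stringr or stringi): refuse anything
# that is not an atomic vector instead of silently coercing it.
strip_leading_z <- function(x) {
  if (!is.atomic(x)) {
    stop("expected an atomic vector, got an object of class ",
         paste(class(x), collapse = "/"), call. = FALSE)
  }
  # as.character() also converts factors to their labels
  sub("^Z", "", as.character(x))
}

ok <- strip_leading_z(c("Zfeldspar", "High"))

# A data frame is a list, so it is rejected with a readable message.
msg <- tryCatch(strip_leading_z(data.frame(a = "Zx")),
                error = function(e) conditionMessage(e))
```

Passing a single column, such as df[, 1] or df$Mineral, satisfies the check; passing the whole data frame stops with an error that names the offending class.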
Upvotes: 0
Reputation: 626748
You may use a simple ^Z regex in the following way:
df = data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                Confidence = c("ZLow", "High", "Med"),
                Coverage = c("sub", "sub", "super"),
                Aspect = c("ZPos", "ZUnd", "Neg"))
df[] <- lapply(df, sub, pattern = '^Z', replacement ="")
> df
Mineral Confidence Coverage Aspect
1 feldspar Low sub Pos
2 granite High sub Und
3 Silica Med super Neg
The ^Z pattern matches the start of the string with the ^ anchor, and then Z is matched and removed using sub (as there is only one possible match in each string, there is no point in using gsub).
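The effect of the anchor is easiest to see on values that contain a Z somewhere other than the front. The values below are made up for illustration and do not come from the question's data.

```r
# Hypothetical values: a doubled leading Z, a trailing Z, and no Z at all
x <- c("ZZircon", "QuartZ", "High")

# Anchored sub(): removes a single Z, and only at the start of the string
anchored <- sub("^Z", "", x)

# Unanchored gsub(): removes every capital Z wherever it occurs
unanchored <- gsub("Z", "", x)

anchored
unanchored
```

If genuine values in the real file can contain a capital Z, the anchored form is the safer choice.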
Upvotes: 1
Reputation: 6132
You could do:
as.data.frame(sapply(df, function(x) {gsub("Z", "", x)}))
Upvotes: 0
Reputation: 4220
You are close. If you want to go with base gsub:
data$Mineral = gsub("Z", "", data$Mineral)
You can do this for all columns, or use a combination of apply strategies (see other answers!)
PS. Naming your data data is not a good idea. At least use my_data
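Repeating the single-column call for every column can be sketched with a plain loop. This assumes character columns and uses the my_data name suggested above, with the example values from the question.

```r
my_data <- data.frame(Mineral = c("Zfeldspar", "Zgranite", "ZSilica"),
                      Confidence = c("ZLow", "High", "Med"),
                      Coverage = c("sub", "sub", "super"),
                      Aspect = c("ZPos", "ZUnd", "Neg"),
                      stringsAsFactors = FALSE)

# Same call as for my_data$Mineral, applied to each column in turn
for (col in names(my_data)) {
  my_data[[col]] <- gsub("Z", "", my_data[[col]])
}

my_data
```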
Upvotes: 0