Reputation: 3477
I have a csv file that looks like this:
And in some portions the data in the columns is like this:
so as you can see, and because the "=" sign is present it wants to convert it into a formula, but what I need is the word in this case "rama...
I have extracted this term from a spam file and with R converted into a sparse matrix. So the question that I have is how can I get rid of the non-alphanumeric characters from this header in R, and then convert it again into a csv file?
Thanks
Upvotes: 1
Views: 888
Reputation: 521794
If you want a literal answer, you could try using gsub
to replace any entry having one or more non alphanumeric characters:
df <- data.frame(v1=c(1,2,3), v2=c("#NAME?", "two", "#NAME?"),
stringsAsFactors=FALSE)
df <- data.frame(sapply(df, function(x) gsub(".*[^A-Za-z0-9].*", "", x)))
df
v1 v2
1 1
2 2 two
3 3
But the best/easiest thing to do here is probably to just fix your Excel formulas such that you catch these errors, and just display empty string, or some other sensible message. From what I can see, this is basically an Excel, not R, problem.
Upvotes: 2
Reputation: 1867
You can use gsub
for that:
## A dummy matrix
example <- matrix(paste0("=", letters[1:9]),3,3)
# [,1] [,2] [,3]
#[1,] "= a" "= d" "= g"
#[2,] "= b" "= e" "= h"
#[3,] "= c" "= f" "= i"
You can remove the "="
by replacing it by ""
in gsub
## Replacing the "=" by "" (nothing)
gsub("=", "", example)
# [,1] [,2] [,3]
#[1,] "a" "d" "g"
#[2,] "b" "e" "h"
#[3,] "c" "f" "i"
Or only in the first row (or in the column name, etc.)
## Removing the "=" in the first row
example <- gsub("=", "", example[,1])
# [,1] [,2] [,3]
#[1,] "a" "d" "g"
#[2,] "=b" "=e" "=h"
#[3,] "=c" "=f" "=i"
Upvotes: 0