Little

Reputation: 3477

How to remove non-alphabetic characters and columns from a CSV file

I have a CSV file that looks like this:

enter image description here

And in some portions the data in the columns is like this:

enter image description here

As you can see, because the "=" sign is present, the entry gets interpreted as a formula, but what I need is the word itself, in this case "rama...

I extracted these terms from a spam file and used R to convert them into a sparse matrix. My question is: how can I get rid of the non-alphanumeric characters in this header in R, and then write the result back to a CSV file?

Thanks

Upvotes: 1

Views: 888

Answers (2)

Tim Biegeleisen

Reputation: 521794

If you want a literal answer, you could try using gsub to blank out any entry containing one or more non-alphanumeric characters:

df <- data.frame(v1=c(1,2,3), v2=c("#NAME?", "two", "#NAME?"),
    stringsAsFactors=FALSE)
df <- data.frame(sapply(df, function(x) gsub(".*[^A-Za-z0-9].*", "", x)))
df

  v1  v2
1  1    
2  2 two
3  3    
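If the goal is to keep the word and drop only the offending characters, a variant along the following lines might work. This is a minimal sketch, not part of the original answer: the "=three" entry is a hypothetical value added for illustration, and the helper variable is_chr is likewise an assumption. It strips individual characters instead of blanking the whole entry, and only touches character columns so that numeric columns keep their type.

```r
## Same toy data as above, plus a hypothetical "=three" entry
df <- data.frame(v1 = c(1, 2, 3), v2 = c("#NAME?", "two", "=three"),
    stringsAsFactors = FALSE)

## Find the character columns; numeric columns are left untouched
is_chr <- vapply(df, is.character, logical(1))

## Remove every character that is not a letter or a digit
df[is_chr] <- lapply(df[is_chr], function(x) gsub("[^A-Za-z0-9]", "", x))
df
```

Note that this turns "#NAME?" into "NAME" rather than an empty string, so Excel error markers would still need separate handling.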


But the best/easiest thing to do here is probably to just fix your Excel formulas so that you catch these errors and display an empty string, or some other sensible message. From what I can see, this is basically an Excel problem, not an R problem.

Upvotes: 2

Thomas Guillerme

Reputation: 1867

You can use gsub for that:

## A dummy matrix
example <- matrix(paste0("=", letters[1:9]),3,3)
#     [,1] [,2] [,3]
#[1,] "=a" "=d" "=g"
#[2,] "=b" "=e" "=h"
#[3,] "=c" "=f" "=i"

You can remove the "=" by replacing it with "" in gsub:

## Replacing the "=" by "" (nothing)
gsub("=", "", example)
#     [,1] [,2] [,3]
#[1,] "a"  "d"  "g" 
#[2,] "b"  "e"  "h" 
#[3,] "c"  "f"  "i" 

Or only in the first row (or in the column names, etc.):

## Removing the "=" in the first row only
example[1, ] <- gsub("=", "", example[1, ])
example
#     [,1] [,2] [,3]
#[1,] "a"  "d"  "g" 
#[2,] "=b" "=e" "=h"
#[3,] "=c" "=f" "=i"
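For the case in the question (cleaning the header of a CSV file and writing it back out), the same gsub idea can be applied to the column names. The following is a minimal sketch under stated assumptions: the file contents, the header names ("=free", "+win!", "offer") and the temporary file paths are all hypothetical stand-ins for the real data. It also assumes the file is read with check.names = FALSE, since read.csv otherwise rewrites syntactically invalid names on import.

```r
## Hypothetical input file, standing in for the real CSV
in_file <- tempfile(fileext = ".csv")
writeLines(c("=free,+win!,offer", "1,0,2", "0,3,1"), in_file)

## check.names = FALSE keeps the header exactly as written in the file
dat <- read.csv(in_file, check.names = FALSE)

## Drop everything except letters and digits from the column names
colnames(dat) <- gsub("[^[:alnum:]]", "", colnames(dat))

## Write the cleaned table back to CSV (a temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, row.names = FALSE)

## Inspect the cleaned header
colnames(dat)
```

If two headers collapse to the same name after cleaning, make.unique() on the column names may be one way to keep them distinct.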

Upvotes: 0
