Reputation: 1143
I am aware that there are similar questions on this site, however, none of them seem to answer my question sufficiently.
This is what I have done so far:
I have a csv file which I open in excel. I manipulate the columns algebraically to obtain a new column "A". I import the file into R using read.csv()
and the entries in column A are stored as factors - I want them to be stored as numeric. I find this question on the topic:
Imported a csv-dataset to R but the values becomes factors
Following the advice, I include stringsAsFactors = FALSE
as an argument in read.csv()
, however, as Hong Ooi suggested in the page linked above, this doesn't cause the entries in column A to be stored as numeric values.
A possible solution is to use the advice given in the following page:
How to convert a factor to an integer\numeric without a loss of information?
however, I would like a cleaner solution i.e. a way to import the file so that the entries of column entries are stored as numeric values.
Cheers for any help!
Upvotes: 48
Views: 210856
Reputation: 1
Hello @Shawn Hemelstrand here are the steps in detail below: example matrix file.csv having 'Filtered' word in it
I opened the file.csv in linux command terminal
vi file.csv then press "Esc shift:" and type the following command at the bottom "%s/Filtered/NA/g" press enter then press "Esc shift:" write "wq" at the bottom (this save the file and quit vim editor)
then in R script I read the file
data<- read.csv("file.csv", sep = ',', header = TRUE)
str(data)
All columns were num type which were earlier char type.
In case you need more help, it would be easier to share your txt or csv file.
Upvotes: 0
Reputation: 1
So, I had the similar situation here in my data file when I readin as a csv. All the numeric value were turned into char. But in my file there was a value with a word "Filtered" instead of NA. I converted "Filtered" to NA in vim editor of linux terminal with a command <%s/Filtered/NA/g> and saved this file and later used it and read it in R, all the values were num type and not char type any more. Looks like character value "Filtered" was inducing all values to be char format.
Charu
Upvotes: 0
Reputation: 1
I had a similar problem. Based on Joshua's premise that excel was the problem I looked at it and found that the numbers were formatted with commas between every third digit. Reformatting without commas fixed the problem.
Upvotes: 0
Reputation: 367
Including this in the read.csv
command worked for me: strip.white = TRUE
(I found this solution here.)
Upvotes: 3
Reputation: 36
version for data.table based on code from dmanuge :
convNumValues<-function(ds){
ds<-data.table(ds)
dsnum<-data.table(data.matrix(ds))
num_cols <- sapply(dsnum,function(x){mean(as.numeric(is.na(x)))<0.5})
nds <- data.table( dsnum[, .SD, .SDcols=attributes(num_cols)$names[which(num_cols)]]
,ds[, .SD, .SDcols=attributes(num_cols)$names[which(!num_cols)]] )
return(nds)
}
Upvotes: 1
Reputation: 593
In read.table
(and its relatives) it is the na.strings
argument which specifies which strings are to be interpreted as missing values NA
. The default value is na.strings = "NA"
If missing values in an otherwise numeric variable column are coded as something else than "NA
", e.g. ".
" or "N/A
", these rows will be interpreted as character
, and then the whole column is converted to character
.
Thus, if your missing values are some else than "NA
", you need to specify them in na.strings
.
Upvotes: 14
Reputation: 206
If you're dealing with large datasets (i.e. datasets with a high number of columns), the solution noted above can be manually cumbersome, and requires you to know which columns are numeric a priori.
Try this instead.
char_data <- read.csv(input_filename, stringsAsFactors = F)
num_data <- data.frame(data.matrix(char_data))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns], char_data[,!numeric_columns])
The code does the following:
This essentially automates the import of your .csv file by preserving the data types of the original columns (as character and numeric).
Upvotes: 8
Reputation: 55340
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: Read the raw file (before any excel manipulation) into R using read.csv(... stringsAsFactors=FALSE)
. [If that does not work, please take a look at ?read.table
(which read.csv
wraps), however there may be some other underlying issue].
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric columns is column 4
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
Upvotes: 40