Reputation: 3587
I have a series of CSV files where numbers are formatted in the European style using commas instead of decimal points, i.e. 0,5 instead of 0.5.
There are too many of these files to edit them before importing into R. I was hoping there is an easy parameter for the read.csv() function, or a method to apply to the imported dataset, so that R treats the data as numbers rather than strings.
Upvotes: 41
Views: 118341
Reputation: 618
You can pass the decimal character as a parameter (dec = ","):
# Semicolon as separator and comma as decimal point by default
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
fill = TRUE, comment.char = "", encoding = "unknown", ...)
More info on https://r-coder.com/read-csv-r/
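For instance, a minimal usage sketch (the file name here is just a placeholder):

# hypothetical file using ";" as separator and "," as decimal mark
df <- read.csv2("data.csv")
str(df)  # the comma-decimal columns should now be numeric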
Upvotes: 1
Reputation: 31
The dec = "," argument can be used with read.table as follows:
mydata <- read.table(fileIn, dec=",")
input file (fileIn):
D:\TEST>more input2.txt
06-05-2014 09:19:38 3,182534 0
06-05-2014 09:19:51 4,2311 0
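To confirm the conversion worked, a quick check (sketch):

str(mydata)  # the decimal column (V3 above) should be numeric, not character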
Upvotes: 3
Reputation: 36080
From ?read.table:

dec: the character used in the file for decimal points.

And yes, you can use that argument with read.csv as well.
Alternatively, you can also use read.csv2, which assumes a "," decimal separator and a ";" column separator.
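Both routes look like this in practice (the file name is a placeholder):

# explicit arguments with read.csv
d1 <- read.csv("data.csv", sep = ";", dec = ",")
# or the wrapper with those defaults
d2 <- read.csv2("data.csv")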
Upvotes: 13
Reputation: 2832
Just to add to Brandon's answer above, which worked well for me (I don't have enough rep to comment):
If you're using

d$amount <- sub(",",".",d$amount)
d$amount <- as.numeric(d$amount)

don't forget that you may also need sub("[.]", "", d$amount, perl=TRUE) first, to strip out any "." used as a thousands separator (e.g. "1.234,56").
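For example, a small sketch with made-up values showing both substitutions (use gsub() instead of sub() if a value can contain more than one thousands separator):

d <- data.frame(amount = c("1.234,56", "0,5"), stringsAsFactors = FALSE)
d$amount <- sub("[.]", "", d$amount)   # strip "." thousands separator: "1234,56", "0,5"
d$amount <- sub(",", ".", d$amount)    # decimal comma to point: "1234.56", "0.5"
d$amount <- as.numeric(d$amount)       # 1234.56, 0.5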
Upvotes: 0
Reputation: 21
The problem may also be solved by indicating how your missing values are represented (na.strings = ...). For example, V1 and V2 here have the same format (decimals separated by "," in the csv file), but because V1 contains missing values (coded as "---" here), it is interpreted as a factor:
dat <- read.csv2("...csv", header=TRUE)
head(dat)
> ID x time V1 V2
> 1 1 0:01:00 0,237 0.621
> 2 1 0:02:00 0,242 0.675
> 3 1 0:03:00 0,232 0.398
dat <- read.csv2("...csv", header=TRUE, na.strings="---")
head(dat)
> ID x time V1 V2
> 1 1 0:01:00 0.237 0.621
> 2 1 0:02:00 0.242 0.675
> 3 1 0:03:00 0.232 0.398
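A quick way to verify the effect (sketch):

class(dat$V1)  # should now be "numeric" rather than "factor"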
Upvotes: 2
Reputation: 44638
read.csv(... , sep=";")
Supposing the imported field is called "amount", you can fix its type this way if your numbers are being read in as character:
d$amount <- sub(",",".",d$amount)
d$amount <- as.numeric(d$amount)
This happens to me frequently, along with a bunch of other little annoyances, when importing from Excel or Excel-generated csv. Since there seems to be no consistent way to ensure you get what you expect when importing into R, post-hoc fixes seem to be the best method. By that I mean: LOOK at what you imported, make sure it's what you expected, and fix it if it's not.
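A minimal inspection step, assuming the data frame is called d as above:

str(d)             # which columns came in as character/factor?
summary(d$amount)  # sanity-check the values after the conversion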
Upvotes: 4
Reputation: 14450
When you check ?read.table
you will probably find all the answer that you need.
There are two issues with (continental) European csv files:

1. What does the "c" in csv stand for? For standard csv this is a ",", for European csv this is a ";". sep is the corresponding argument in read.table.

2. What is the character for the decimal point? For standard csv this is a ".", for European csv this is a ",". dec is the corresponding argument in read.table.
To read standard csv use read.csv, to read European csv use read.csv2. These two functions are just wrappers to read.table that set the appropriate arguments.
If your file does not follow either of these standards set the arguments manually.
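A short sketch of the three cases (file names and the tab separator are placeholders):

d1 <- read.csv("standard.csv")    # sep = ",", dec = "."
d2 <- read.csv2("european.csv")   # sep = ";", dec = ","
d3 <- read.table("other.txt", header = TRUE, sep = "\t", dec = ",")  # set args manually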
Upvotes: 55
Reputation: 4052
Maybe as.is = TRUE; this also prevents character columns from being converted into factors.
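For example (the file and column names are placeholders):

d <- read.csv("data.csv", sep = ";", as.is = TRUE)  # keep character columns as character
d$amount <- as.numeric(sub(",", ".", d$amount))     # then convert the decimal comma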
Upvotes: 1