Reputation: 147
I need to import files that were created with SQL Server 2005 into R. I need R to read the current format or else I need a method for my data provider so that my colleague can save in a format that R can read, with csv being the first choice.
A colleague is sending me quite a few large files that have been saved with MS SQL Server 2005 on a server. I am using R 2.15.1 on Windows 7.
Using R I am trying to read in the files using standard techniques. Although each file has a csv extension, when I go to Excel or WordPad and do SAVE AS I see that it is Unicode Text. Notepad indicates that the encoding is Unicode. Right now I have to do a few things from within Excel (such as Text to Columns. Each row is entirely in Column A) and eventually save as a true csv file before I can read it into R and then use it.
Is there way to solve this from within R? I am also open to easy SQL Server 2005 solutions.
I tried the following from within R.
testDF = read.table("Info06.csv", header = TRUE, sep = ",")
testDF2 = iconv(x = testDF, from = "Unicode", to = "")
Error in iconv(x = testDF, from = "Unicode", to = "") :
unsupported conversion from 'Unicode' to '' in codepage 1252
# The next line did not produce an error message
testDF3 = iconv(x = testDF, from = "UTF-8" , to = "")
testDF3[1:6, 1:3]
Error in testDF3[1:6, 1:3] : incorrect number of dimensions
# The next line did not produce an error message
testDF4 = iconv(x = testDF, from = "macroman" , to = "")
testDF4[1:6, 1:3]
Error in testDF4[1:6, 1:3] : incorrect number of dimensions
Encoding(testDF3)
[1] "unknown"
Encoding(testDF4)
[1] "unknown"
This is the first few lines from WordPad
Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
2006-01-03 00:00:00.000,@Stock1 ,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
2006-01-03 00:00:00.000,@Stock2 ,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
Upvotes: 0
Views: 494
Reputation: 50704
It depends on your locale settings, but following works for me:
read.table("Info06.csv", header = TRUE, sep = ",", fileEncoding = "UCS-2LE")
If it won't work for you I recommend using Notepad++ to detect encoding. Open file with it and under "Encoding" menu current encoding should be marked with a dot.
Also check question about detecting encoding.
Upvotes: 1