I have the same problem as explained here; the only difference is that my CSV file contains non-English strings, and I couldn't find any solution for it. When I read the CSV file without specifying an encoding, I get no error, but the data comes out changed:
network=read.csv("graph1.csv",header=TRUE)
اشپیل(60*4)
And if I run read.csv with fileEncoding, I get these warnings:
network=read.csv("graph1.csv",fileEncoding="UTF-8",header=TRUE)
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'graph1.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'graph1.csv'
network[1]
[1] X.
<0 rows> (or 0-length row.names)
System info:
Windows Server 2008
R 3.1.2
Sample file:
node1,node2,weight
ورق800*750*6,ورق 1350*1230*6mm,0.600000024
ورق900*1200*6,ورق 1350*1230*6mm,0.600000024
ورق76*173,ورق 1350*1230*6mm,0.600000024
ورق76*345,ورق 1350*1230*6mm,0.600000024
ورق800*200*4,ورق 1350*1230*6mm,0.600000024
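To reproduce this without the original file, here is a minimal sketch that writes the sample rows above as raw UTF-8 bytes. It assumes the real file is UTF-8 (which the answers below suggest); the path under tempdir() is a hypothetical choice so that a real graph1.csv is not overwritten, and the Persian prefix is spelled with \u escapes so the script itself stays plain ASCII.

```r
# Sketch: rebuild the sample file from the question as raw UTF-8 bytes.
# The "\u..." escapes spell the Persian prefix shared by every node label.
sheet = "\u0648\u0631\u0642"
rows = c(
  "node1,node2,weight",
  paste0(sheet, c("800*750*6", "900*1200*6", "76*173", "76*345", "800*200*4"),
         ",", sheet, " 1350*1230*6mm,0.600000024")
)

# Hypothetical location; tempdir() keeps any real graph1.csv untouched.
path = file.path(tempdir(), "graph1.csv")
con = file(path, open = "wb")                     # binary mode: bytes go out unchanged
writeLines(enc2utf8(rows), con, useBytes = TRUE)  # useBytes: no re-encoding on write
close(con)
```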
The following should work; mind you, I can’t test it since I don’t have Windows (and Windows, Unicode and R simply do not mix):
x = read.csv('graph1.csv', fileEncoding = '', stringsAsFactors = FALSE)  # read the bytes as-is, no re-encoding
At this point, x is gibberish, since it was read as-is, without parsing the byte data into an encoding. We should be able to verify this:
Encoding(x[1, 1])
# [1] "unknown"
Now we tell R to treat it as UTF-8:
chr = vapply(x, is.character, logical(1))  # only the text columns; weight stays numeric
x[chr] = lapply(x[chr], iconv, from = 'UTF-8', to = 'UTF-8')
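Since the bytes on disk are already UTF-8, the main effect of converting "from UTF-8 to UTF-8" is that iconv marks its results as UTF-8. A minimal sketch of the same idea with `Encoding<-`, which only declares the encoding and never touches the bytes. The one-row data frame is a hypothetical stand-in for the `x` read above, with the UTF-8 bytes of the first sample label written out by hand.

```r
# Hypothetical stand-in for x after read.csv(..., fileEncoding = ''): the UTF-8
# bytes of the first sample label, with no declared encoding attached.
label_bytes = c(as.raw(c(0xd9, 0x88, 0xd8, 0xb1, 0xd9, 0x82)), charToRaw("800*750*6"))
x = data.frame(node1 = rawToChar(label_bytes), weight = 0.600000024,
               stringsAsFactors = FALSE)
Encoding(x$node1)   # "unknown": R has not been told how to interpret the bytes

# Declare (not convert) UTF-8 on every character column.
chr = vapply(x, is.character, logical(1))
x[chr] = lapply(x[chr], function(col) { Encoding(col) = "UTF-8"; col })
Encoding(x$node1)   # "UTF-8"; the numeric weight column is left alone
```

Unlike iconv, this does not check that the bytes are valid UTF-8; it simply trusts the declaration.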
These two steps can be compressed into one by using encoding instead of fileEncoding as the argument to read.csv:
x = read.csv('graph1.csv', encoding = 'UTF-8', stringsAsFactors = FALSE)
In either case, roughly the same process takes place.
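A self-contained sketch of the one-step version, run against a throwaway one-row copy of the sample (the temporary file is an assumption of this sketch). Per the read.table documentation, `encoding` only marks the strings as UTF-8 and does not re-encode them, whereas `fileEncoding` re-encodes the input into the session's native encoding; that difference is a plausible reason the question's `fileEncoding = "UTF-8"` call warned on Windows.

```r
# Write one sample row as raw UTF-8 bytes to a temporary file.
path = tempfile(fileext = ".csv")
con = file(path, open = "wb")
writeLines(enc2utf8(c("node1,node2,weight",
  "\u0648\u0631\u0642800*750*6,\u0648\u0631\u0642 1350*1230*6mm,0.600000024")),
  con, useBytes = TRUE)
close(con)

# encoding = 'UTF-8' marks the strings; it does not translate the bytes.
x = read.csv(path, encoding = 'UTF-8', stringsAsFactors = FALSE)
str(x)                   # node1/node2 are character, weight is numeric
utf8ToInt(x[1, 1])[1:3]  # 1608 1585 1602, i.e. U+0648 U+0631 U+0642
```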
At this point, x still appears as gibberish, since your terminal on Windows presumably does not support a Unicode code page which R understands. In fact, when running the code with a non-UTF-8 code page on Mac, I get the following output now:
x[1, 1]
# [1] "<U+0648><U+0631><U+0642>800*750*6"
However, at least the encoding is now correctly set, and the bytes are parsed:
Encoding(x[1, 1])
# [1] "UTF-8"
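The `<U+0648>` escapes in the printed output above are simply Unicode code points, so a label can be checked independently of what the console manages to display. A small sketch, using a literal built from \u escapes as a hypothetical stand-in for `x[1, 1]`:

```r
label = "\u0648\u0631\u0642800*750*6"  # stands in for x[1, 1]
cp = utf8ToInt(label)                  # code points, read from the UTF-8 bytes
sprintf("U+%04X", cp[1:3])             # "U+0648" "U+0631" "U+0642"
nchar(label)                           # 12: three letters plus "800*750*6"
charToRaw(label)[1:2]                  # d9 88: two UTF-8 bytes for the first letter
```

If these match, the data is intact and only the display is at fault.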
And if you pass the data to a device or program which speaks UTF-8, it should appear correctly. For instance, using the data as labels in a plot command should work:
plot.new()
# one label per row of the first column, spread evenly from bottom (0) to top (1)
text(0.5, seq(0, 1, along.with = x[, 1]), x[, 1])
I tried this with your input:
> read.csv("graph1.csv", encoding="UTF-8")
X.U.FEFF.node1 node2 weight
1 <U+0648><U+0631><U+0642>800*750*6 <U+0648><U+0631><U+0642> 1350*1230*6mm 0.6
2 <U+0648><U+0631><U+0642>900*1200*6 <U+0648><U+0631><U+0642> 1350*1230*6mm 0.6
3 <U+0648><U+0631><U+0642>76*173 <U+0648><U+0631><U+0642> 1350*1230*6mm 0.6
4 <U+0648><U+0631><U+0642>76*345 <U+0648><U+0631><U+0642> 1350*1230*6mm 0.6
5 <U+0648><U+0631><U+0642>800*200*4 <U+0648><U+0631><U+0642> 1350*1230*6mm 0.6
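The `X.U.FEFF.node1` header suggests the file starts with a UTF-8 byte order mark (U+FEFF) that ends up glued to the first column name. R's file connections accept `fileEncoding = "UTF-8-BOM"`, which removes a leading BOM when reading. A sketch under the assumption that the file is BOM-prefixed UTF-8 (the temporary file below is deliberately written that way). Note that `fileEncoding` re-encodes into the session's native encoding, so on a Windows session whose code page lacks these letters it may hit the same warning as the question; there, reading with `encoding = "UTF-8"` and renaming the first column is the simpler fallback.

```r
# Write a BOM-prefixed, one-row copy of the sample to a temporary file.
path = tempfile(fileext = ".csv")
con = file(path, open = "wb")
writeBin(as.raw(c(0xef, 0xbb, 0xbf)), con)  # UTF-8 encoding of U+FEFF (the BOM)
writeLines(enc2utf8(c("node1,node2,weight",
  "\u0648\u0631\u0642800*750*6,\u0648\u0631\u0642 1350*1230*6mm,0.600000024")),
  con, useBytes = TRUE)
close(con)

# "UTF-8-BOM" drops the leading mark, so the header is parsed cleanly.
net = read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
names(net)   # "node1" "node2" "weight"
```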