majom

Reputation: 8021

R script file encoding (RStudio)

Which file encoding do I need to use to save this vector (from "Matching complex URLs within text blocks (R)") correctly in an R script? The special characters and Chinese characters seem to complicate things.

x <-   c("http://foo.com/blah_blah",
         "http://foo.com/blah_blah/",
         "(Something like http://foo.com/blah_blah)",
         "http://foo.com/blah_blah_(wikipedia)",
         "http://foo.com/more_(than)_one_(parens)",
         "(Something like http://foo.com/blah_blah_(wikipedia))",
         "http://foo.com/blah_(wikipedia)#cite-1",
         "http://foo.com/blah_(wikipedia)_blah#cite-1",
         "http://foo.com/unicode_(✪)_in_parens",
         "http://foo.com/(something)?after=parens",
         "http://foo.com/blah_blah.",
         "http://foo.com/blah_blah/.",
         "<http://foo.com/blah_blah>",
         "<http://foo.com/blah_blah/>",
         "http://foo.com/blah_blah,",
         "http://www.extinguishedscholar.com/wpglob/?p=364.",
         "http://✪df.ws/1234",
         "rdar://1234",
         "rdar:/1234",
         "x-yojimbo-item://6303E4C1-6A6E-45A6-AB9D-3A908F59AE0E",
         "message://%[email protected]%3e",
         "http://➡.ws/䨹",
         "www.c.ws/䨹",
         "<tag>http://example.com</tag>",
         "Just a www.example.com link.",
         "http://example.com/something?with,commas,in,url, but not at end",
         "What about <mailto:[email protected]?subject=TEST> (including brokets).",
         "mailto:[email protected]",
         "bit.ly/foo",
         "“is.gd/foo/”",
         "WWW.EXAMPLE.COM",
         "http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))/Web_ENG/View_DetailPhoto.aspx?PicId=752",
         "http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))",
         "http://lcweb2.loc.gov/cgi-bin/query/h?pp/horyd:@field(NUMBER+@band(thc+5a46634))")

I appreciate any help.

Upvotes: 1

Views: 5833

Answers (1)

user1981275

Reputation: 13372

Running your example with

source('file.R', encoding="unknown")

works fine, and saving x as an R object and reloading it works as well:

save(x, file='kk.Rd')
load('kk.Rd')
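
To check that such a round trip really preserves the non-ASCII characters, a comparison along these lines could be used. This is only a minimal sketch: the two sample strings, the temporary file and the names original, rdata_file and restored_env are made up for illustration, and the characters are written as \u escapes so the snippet itself stays ASCII-only.

```r
# Hypothetical sample data; \u escapes keep this snippet pure ASCII
# no matter which encoding the script file itself is saved in.
original <- c("http://foo.com/unicode_(\u272a)_in_parens",
              "http://\u27a1.ws/\u4a39")

# save() writes the object in R's binary format, not as script text.
rdata_file <- tempfile(fileext = ".RData")
save(original, file = rdata_file)

# Load into a separate environment so `original` is not overwritten.
restored_env <- new.env()
load(rdata_file, envir = restored_env)

# TRUE if the reloaded vector is the same as the one that was saved.
round_trip_ok <- identical(restored_env$original, original)
print(round_trip_ok)

# Shows how R has marked the encoding of each reloaded string.
print(Encoding(restored_env$original))
```

Loading into a separate environment is just a convenience here, so that the comparison is against the untouched original rather than a freshly overwritten variable.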

You can list all available encodings with iconvlist() and try each one, for example:

vals <- lapply(iconvlist(), function(enc)
                      tryCatch(source('file.R', encoding=enc),
                               error=function(e) NULL))

with file.R being your script, and then

iconvlist()[!sapply(vals, is.null)]

returns every encoding for which sourcing the script raised no error.
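
An encoding that decodes without an error is not necessarily the right one, since some encodings accept almost any byte sequence. The sketch below illustrates this on a small made-up script: it writes known text as UTF-8, decodes the same bytes under a few candidate encodings with iconv(), and compares each result with the original. The script text, the file and the three candidate names are assumptions chosen for illustration; the encoding names that iconv() accepts vary by platform.

```r
# Hypothetical one-line script with non-ASCII characters (\u escapes keep
# this snippet itself ASCII-only).
script_text <- "x <- c('http://\u27a1.ws/\u4a39')\n"

# Write the exact UTF-8 bytes to a temporary file.
script_file <- tempfile(fileext = ".R")
writeBin(charToRaw(enc2utf8(script_text)), script_file)

# Read the raw bytes back without any implicit conversion.
raw_text <- rawToChar(readBin(script_file, what = "raw",
                              n = file.size(script_file)))

# Decode the same bytes under a few candidate encodings.
candidates <- c("UTF-8", "latin1", "ASCII")
decoded <- vapply(candidates,
                  function(enc) iconv(raw_text, from = enc, to = "UTF-8"),
                  character(1))

# NA means the bytes are not valid in that encoding ...
print(is.na(decoded))

# ... but a non-NA result is only correct if it matches the known text.
print(!is.na(decoded) & decoded == script_text)
```

The same idea could be applied to the real script by comparing the sourced x against a copy of the vector that is known to be correct, for example one built from \u escapes.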

Does this help?

Upvotes: 1
