I encountered an unexpected issue when trying to schedule an R script with launchd: when run from the Rgui or with Rscript in the terminal (Mac OS X 10.7.5), the script runs without problems, but when it is run through launchd it seems to have an encoding issue.
As an example, take this script, which creates a word cloud from the RSS feed of the newspaper Le Monde:
#!/usr/bin/Rscript
require(wordcloud)
require(tm)
require(XML)
titles <- xpathSApply(htmlParse("http://www.lemonde.fr/rss/une.xml"),"//item/title",xmlValue)
titles <- gsub("[[:punct:]]"," ",titles)
rss <- Corpus(VectorSource(titles),readerControl=list(language="fr"))
rss <- tm_map(rss, stripWhitespace)
rss <- tm_map(rss, function(x)removeWords(x,stopwords("fr")))
tdm <- TermDocumentMatrix(rss)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
png("/path/to/wordcloud.png",w=5,h=5,units="in",res=100)
par(mar=c(0,0,0,0))
wordcloud(d$word,d$freq,scale=c(3,.1),min.freq=2)
dev.off()
After making the script executable with chmod +x, if I run it through the Rgui or through the terminal I get something like this:
But if I create a LaunchAgent to schedule this script to be run at a given time interval with a plist file like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>test</string>
<key>ProgramArguments</key>
<array>
<string>/path/to/test.R</string>
</array>
</dict>
</plist>
And then load it and kickstart it:
launchctl load ~/Library/LaunchAgents/test.plist
launchctl start test
Here is what I get:
So I guess my questions are:
- Why is that?
- How can I get around it?
Edit
Following @hrbrmstr's comment, I inserted the line writeLines(capture.output(Sys.getenv()), con="/tmp/launchenv.txt") into the code.
The main difference between the three outputs of Sys.getenv() is that the one from the Rgui contained a different R_PLATFORM from the other two, as well as R_LIBS, while the other two had DYLD_LIBRARY_PATH and R_DEFAULT_PACKAGES.
The only thing common to the Rgui and the terminal but absent from the launchd output is that PATH additionally contained /usr/local/bin (a folder that, as a matter of fact, doesn't exist on my computer).
I nonetheless tried running the script with these two lines added to the code:
Sys.setenv(LANG='en') #language of my GUI, just in case
Sys.setenv(PATH='/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin')
but it didn't change anything.
Upvotes: 4
Views: 1223
Reputation: 18759
I think I finally solved this issue after bumping into it again in a different situation.
Consider this code, similar to the one in the question, with the difference that the output is a text file:
#!/usr/bin/Rscript
require(wordcloud)
require(tm)
require(XML)
titles <- xpathSApply(htmlParse("http://www.lemonde.fr/rss/une.xml"),"//item/title",xmlValue)
titles <- gsub("[[:punct:]]"," ",titles)
rss <- Corpus(VectorSource(titles),readerControl=list(language="fr"))
rss <- tm_map(rss, stripWhitespace)
rss <- tm_map(rss, function(x)removeWords(x,stopwords("fr")))
tdm <- TermDocumentMatrix(rss)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
sink("test.txt")
for(i in d$word) cat(i,"\n")
sink()
I now work on Mac OS X 10.10, where the issue presented in the question also occurs when launching the script from the terminal, not just with launchd. In both cases the resulting test.txt file contains:
contre
crise
des
2015
2<U+00A0>milliards
<U+00A0>centre
<U+00AB><U+00A0>il
<U+00AB><U+00A0>jungle<U+00A0><U+00BB>
<U+00E9>limin<U+00E9>s
<U+00E9>lus
<U+2019>attaque
<U+2019>etat
<U+2019>europe
<U+2019>euros
<U+2019>opposition
<U+2019>union
acc<U+00E9>l<U+00E8>re
...
I believe the issue is not with the encoding on input but with the encoding on output: here, sink uses the default encoding of the session.
> getOption("encoding")
[1] "native.enc"
The so-called 'native.enc' is given by Sys.getlocale("LC_CTYPE"), as per this comment by Brian Ripley.
In the Rgui, my default encoding is:
> Sys.getlocale("LC_CTYPE")
[1] "en_US.UTF-8"
In the Rscript environment, however, the default is:
$ Rscript -e 'Sys.getlocale("LC_CTYPE")'
[1] "C"
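Because a launchd job has no terminal attached, it can help to log the locale from inside the scheduled script itself. The following is only a minimal sketch: log_path is a hypothetical location that is not part of the original script, and any file writable by the job would do.

```r
# Hypothetical log file (an assumption, not in the original script).
log_path <- file.path(tempdir(), "locale_check.txt")

# Locale category that determines what "native.enc" means in this session.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether R considers the session locale to be UTF-8.
is_utf8 <- l10n_info()[["UTF-8"]]

# Write both values so they can be inspected after the job has run.
writeLines(c(paste("LC_CTYPE:", ctype),
             paste("UTF-8 session:", is_utf8)),
           log_path)
```

If the log shows a non-UTF-8 locale such as "C", the session is in the situation described here.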
Hence the following (hacky) solution for the code in the question:
#!/usr/bin/Rscript
require(wordcloud)
require(tm)
require(XML)
Sys.setlocale("LC_CTYPE", "en_US.UTF-8") # <- Here
titles <- xpathSApply(htmlParse("http://www.lemonde.fr/rss/une.xml"),"//item/title",xmlValue)
titles <- gsub("[[:punct:]]"," ",titles)
rss <- Corpus(VectorSource(titles),readerControl=list(language="fr"))
rss <- tm_map(rss, stripWhitespace)
rss <- tm_map(rss, function(x)removeWords(x,stopwords("fr")))
tdm <- TermDocumentMatrix(rss)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
png("/path/to/wordcloud.png",w=5,h=5,units="in",res=100)
par(mar=c(0,0,0,0))
wordcloud(d$word,d$freq,scale=c(3,.1),min.freq=2)
dev.off()
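For the text-file variant, an alternative that should not depend on the session locale is to write the strings as UTF-8 bytes explicitly instead of going through sink. This is a sketch under assumptions: words is a hypothetical stand-in for d$word, and out_path is a hypothetical output file. It does not address the png output of the word cloud, for which the locale still has to be set.

```r
# Hypothetical sample standing in for d$word from the script above.
words <- c("\u00e9lus", "acc\u00e9l\u00e8re", "contre")

# Hypothetical output path; the script above writes to "test.txt".
out_path <- tempfile(fileext = ".txt")

# Write the strings byte-for-byte as UTF-8: enc2utf8() makes sure they are
# UTF-8 encoded, and useBytes = TRUE skips re-encoding to the native locale.
con <- file(out_path, open = "w")
writeLines(enc2utf8(words), con, useBytes = TRUE)
close(con)

# Read the file back, declaring its encoding explicitly.
read_back <- readLines(out_path, encoding = "UTF-8")
```

The design choice is to bypass the locale-dependent conversion at write time rather than change the locale for the whole session.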
Upvotes: 1