llcc

Function tm::tm_map encounters an error

I have a VCorpus, "oanc", and I want to convert all the words to lower case, so I use the following code:

oanc1 <- tm_map(oanc, content_transformer(tolower))

But I got a warning:

Warning message:
In mclapply(content(x), FUN, ...) :
  scheduled cores 2 encountered errors in user code, all values of the jobs will be affected

The VCorpus "oanc" is 586 MB, while "oanc1" is only 4 MB. In addition, the contents of every document except the first one are broken, and when I run

writeLines(as.character(oanc1[[2]]))

I got

Error in FUN(content(x), ...) : 
invalid input 'O<8c><be>BĭĪ<e2>=<f3><81>̡@>9<c2>Au<b7>l<99><c5>u <c4>%<a0>[,<9c><93><b8><90>w<b7><97><f7>58<e3><d7>><91><bf>"~WD<cf>2<c3><84>1GQ<dd><ed>ـ\<e2><fb><f3><d3>X]<fe>5t!<9f><89>ٍdH<e3><d6>Zu<bc><e8><b6>_RS<f0><f7><81><eb>E<f0><bd>Ԗ2o<b4>G<a7><b9><d2><fc><8a><f2><89>3<a8>ؗ<d6><c0>.w,<l<b7>}<f8>J<8f><f1><f1>����{p<94><a3>x<9e><89><da>e'<8c><ca>}y<d1><ca>V<f7>v<c3>>S^`<9e><86><f1><b1>E<b8>)<cd>ꅹ<e5><ab><<80><eb><8e>z<d0>}<a3>C<86>(%r<86><f4><e3>i*<da>i V{<94>'<f6>i<f6><a7>{dh<d0>jG۾wO<dd>?<<f7>i<c5>c<84>G<dc>3<bb>-E<e9>L<b1><b6>XG<f5>F<81><97><b1><e5><de>ln<b1><d6><f5><f6><90> DŽ<b2>/j<fc><d9>{£<83><f1><c5>;n7<bb>ɰEG<a9><b0><87>!<b5>5]9<b9><e6><fe>_Q<aa>U<a8><c0><cf>,<d9><dc>wܒ<ba>ɑ<f1>Q<c9>:r<e4><b4><ea>w<be>PCb' in 'utf8towcs'

Can anyone help me? My operating system is Ubuntu 14.04 LTS, and my R version is 3.2.0.

Answers (1)

Joshua Rosenberg

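First, make sure the text is encoded in UTF-8: the 'utf8towcs' error means that tolower() was given bytes that are not valid UTF-8. If you can open the file in a text editor, you should be able to change the encoding when you save it. If that doesn't fix the problem, try adding the argument mc.cores = 1 to the tm_map() call. With a single core, mclapply() processes the documents sequentially, so an error is reported directly instead of only producing the "encountered errors in user code" warning shown in the question.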
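A minimal base-R sketch of the encoding check and cleanup, assuming the documents are meant to be UTF-8 and that silently dropping the bad bytes is acceptable. The helper names find_invalid_utf8() and clean_utf8() are hypothetical; both rely only on base iconv(), which returns NA for an element it cannot convert and, when sub is given, substitutes the offending bytes instead.

```r
# Hypothetical helper: flag elements that are not valid UTF-8.
# With no `sub` argument, iconv() returns NA for any element that
# contains a byte sequence it cannot convert.
find_invalid_utf8 <- function(x) {
  is.na(iconv(x, from = "UTF-8", to = "UTF-8"))
}

# Hypothetical helper: remove invalid bytes so that tolower() no longer
# fails in 'utf8towcs'. sub = "" replaces each non-convertible byte
# with the empty string, i.e. drops it.
clean_utf8 <- function(x) {
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
}

# Small example: the second string ends in the byte 0xff,
# which can never appear in valid UTF-8.
x <- c("Hello World", "ABC\xff")
print(find_invalid_utf8(x))
print(tolower(clean_utf8(x)))
```

With tm 0.6.x, where tm_map() on a VCorpus forwards extra arguments to mclapply(), both suggestions could be combined as tm_map(oanc, content_transformer(function(x) tolower(clean_utf8(x))), mc.cores = 1). This is untested against the OANC data, and newer tm releases may reject mc.cores passed this way.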
