Reputation: 160
JavaRDD<List<String>> documents = StopWordsRemover.Execute(lemmatizedTwits).toJavaRDD().map(new Function<Row, List<String>>() {
@Override
public List<String> call(Row row) throws Exception {
List<String> document = new LinkedList<String>();
for(int i = 0; i<row.length(); i++){
document.add(row.get(i).toString());
}
return document;
}
});
I try make it with use this code, but I get WrappedArray
[[WrappedArray(happy, holiday, beth, hope, wonderful, christmas, wish, best)], [WrappedArray(light, shin, meeeeeeeee, like, diamond)]]
How make it correctly?
Upvotes: 1
Views: 3413
Reputation: 21
Here's an example with using an excel file :
JavaRDD<String> data = sc.textFile(yourPath);
String header = data.first();
JavaRDD<String> dataWithoutHeader = data.filter(line -> !line.equalsIgnoreCase(header) && !line.isEmpty());
JavaRDD<List<String>> dataAsList = dataWithoutHeader.map(line -> Arrays.asList(line.split(";")));
hope this peace of code help you
Upvotes: 2
Reputation: 330063
You can use getList
method:
Dataset<Row> lemmas = StopWordsRemover.Execute(lemmatizedTwits).select("lemmas");
JavaRDD<List<String>> documents = lemmas.toJavaRDD().map(row -> row.getList(0));
where lemmas
is the name of the column with lemmatized text. If there is only one column (it looks like this is the case) you can skip select
. If you know the index of the column you can skip select
as well and pass index to getList
but it is error prone.
Your current code iterates over the Row
not the field you're trying to extract.
Upvotes: 2