LetsPlayYahtzee

Reputation: 7581

Trying to use map on a Spark DataFrame

I recently started experimenting with both Spark and Java. I first worked through the well-known WordCount example using RDDs, and everything went as expected. Now I am trying to implement my own example, but using DataFrames instead of RDDs.

So I am reading a dataset from a file with

DataFrame df = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .option("delimiter", ";")
        .option("header", "true")
        .load(inputFilePath);

and then I try to select a specific column and apply a simple transformation to every row, like this:

df = df.select("start")
        .map(text -> text + "asd");

But compilation fails on the second line with an error I don't fully understand (the start column is inferred as type string):

Multiple non-overriding abstract methods found in interface scala.Function1

Why is my lambda function treated as a Scala function and what does the error message actually mean?
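The error comes from javac, not from Spark. In Spark 1.x, DataFrame.map is a Scala method whose parameter type is scala.Function1, so javac tries to convert the lambda to that interface, and a Java lambda can only be converted to a functional interface (one with exactly one abstract method). The plain-Java sketch below illustrates this under one assumption: that with the Scala 2.10/2.11 builds used by Spark 1.x, scala.Function1 is seen from Java as an interface whose compose and andThen methods are abstract alongside apply. ScalaStyleFunction1, SingleMethodFunction and appendSuffix are hypothetical names, not Scala or Spark types; Spark's Java API provides real single-method interfaces for this purpose, such as org.apache.spark.api.java.function.Function and MapFunction.

```java
public class LambdaTargetDemo {

    // Hypothetical stand-in for scala.Function1 as javac sees it under
    // Scala 2.10/2.11 (assumption): apply() plus compose() and andThen(), all abstract.
    interface ScalaStyleFunction1<T, R> {
        R apply(T value);

        <A> ScalaStyleFunction1<A, R> compose(ScalaStyleFunction1<A, T> before);

        <A> ScalaStyleFunction1<T, A> andThen(ScalaStyleFunction1<R, A> after);
    }

    // Hypothetical single-abstract-method interface, shaped like the
    // call(T) interfaces in Spark's Java API. Lambdas can target it.
    @FunctionalInterface
    interface SingleMethodFunction<T, R> {
        R call(T value);
    }

    // Uncommenting the next line fails to compile with a
    // "multiple non-overriding abstract methods found" error:
    // a lambda can implement only one abstract method, and this interface has three.
    // static final ScalaStyleFunction1<String, String> BAD = text -> text + "asd";

    // Compiles: SingleMethodFunction has exactly one abstract method.
    static final SingleMethodFunction<String, String> GOOD = text -> text + "asd";

    static String appendSuffix(String text) {
        return GOOD.call(text);
    }

    public static void main(String[] args) {
        System.out.println(appendSuffix("start")); // prints "startasd"
    }
}
```

This is why Java code against Spark passes functions through the Java-facing APIs (for example JavaRDD.map), whose parameter types are single-method interfaces.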

Upvotes: 23

Views: 83827

Answers (2)

Topde

Reputation: 581

I use concat to achieve this (PySpark syntax):

df.withColumn('start', concat(col('start'), lit('asd')))

As you're mapping the same text twice, I'm not sure whether you're also looking to replace the first part of the string. If you are, I would do:

df.withColumn('start', concat(
                      when(col('start') == 'text', lit('new'))
                      .otherwise(col('start')),
                      lit('asd')
                     ))

This solution scales well with big data because it concatenates two columns using built-in functions instead of iterating over the values one by one.
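To make explicit what the when/otherwise/concat expression computes for each value, here is a plain-Java model of its per-value semantics. It is only a reading aid, not how Spark executes the expression, and expressionFor is a hypothetical name. It assumes Spark SQL's null handling: comparing a null value with 'text' is not true, so otherwise passes the null through, and concat returns null when any of its inputs is null.

```java
public class ConcatWhenModel {

    // Models: concat(when(col('start') == 'text', lit('new'))
    //                  .otherwise(col('start')), lit('asd'))
    static String expressionFor(String start) {
        // when(...).otherwise(...): replace exactly "text" with "new";
        // a null start does not equal "text", so it falls through unchanged.
        String replaced = "text".equals(start) ? "new" : start;
        // concat(...) yields null if any argument is null.
        if (replaced == null) {
            return null;
        }
        return replaced + "asd";
    }

    public static void main(String[] args) {
        System.out.println(expressionFor("text"));  // newasd
        System.out.println(expressionFor("other")); // otherasd
        System.out.println(expressionFor(null));    // null
    }
}
```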

Upvotes: 5

jojo_Berlin

Reputation: 693

If you use the select function on a DataFrame, you get a DataFrame back. The function you then pass to map is applied to the Row datatype, not to the value inside the row, so you have to extract the value first:

df.select("start").javaRDD().map(el -> el.getString(0) + "asd")

Note that you will get an RDD back as the return value, not a DataFrame.
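To see why the value has to be extracted first, here is a plain-Java analogy with no Spark involved: each inner List stands in for a Row holding the single start column. Appending to the row itself uses the row's toString(), which for a List wraps the fields in brackets (Spark's Row.toString() uses a similar bracketed format, as far as I know), while get(0) plays the role of getString(0). RowVsValue, ROWS, mapOverRows and mapOverValues are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RowVsValue {

    // Each inner List is a stand-in for a Row holding the "start" column.
    static final List<List<String>> ROWS = List.of(List.of("foo"), List.of("bar"));

    // Like text -> text + "asd" applied to a Row: appends to the whole row's toString().
    static List<String> mapOverRows() {
        return ROWS.stream()
                .map(row -> row + "asd")
                .collect(Collectors.toList());
    }

    // Like row -> row.getString(0) + "asd": extracts the field value first.
    static List<String> mapOverValues() {
        return ROWS.stream()
                .map(row -> row.get(0) + "asd")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapOverRows());   // [[foo]asd, [bar]asd]
        System.out.println(mapOverValues()); // [fooasd, barasd]
    }
}
```

In the real Java API, the same getString(0) extraction goes inside a function passed to javaRDD().map(...) on Spark 1.x, or to Dataset.map together with a MapFunction and Encoders.STRING() on Spark 2.x.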

Upvotes: 21
