Reputation: 510
I currently have the following DataFrame, holding a single row:
Dataset<Row> ds = sparkSession.read().text(pathFile);
+-------+------+
| VALUE | TIME |
+-------+------+
| 5000  |      |
+-------+------+
where TIME doesn't have a value (or is null). How can I add a value to the TIME column? Later in my program I will also be adding more rows, so I will need to add/append values for both the VALUE and TIME columns. How can I do this?
Upvotes: 0
Views: 1192
Reputation: 74779
How can I add a value to the TIME column?
and
TIME doesn't have a value (or is null)
lead me to believe that you may want to explore the na operator.
na: DataFrameNaFunctions Returns a DataFrameNaFunctions for working with missing data.
That in turn gives you a way to fill missing values.
fill(value: String, cols: Array[String]): DataFrame Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.
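From Java, a minimal sketch of this could look like the following. It assumes both columns are strings (fill with a String value skips non-string columns), and the class name, the sample data and the "12:00" value are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FillNullTime {

    // Builds a one-row DataFrame shaped like the one in the question
    // (assumption: VALUE and TIME are both string columns, TIME is null).
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType()
                .add("VALUE", DataTypes.StringType)
                .add("TIME", DataTypes.StringType);
        List<Row> rows = Arrays.asList(RowFactory.create("5000", null));
        return spark.createDataFrame(rows, schema);
    }

    // Replaces nulls in the TIME column only; non-null values are left alone.
    static Dataset<Row> fillTime(Dataset<Row> ds, String time) {
        return ds.na().fill(time, new String[] {"TIME"});
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("FillNullTime")
                .getOrCreate();
        fillTime(sample(spark), "12:00").show(); // "12:00" is a placeholder value
        spark.stop();
    }
}
```

Note that this only touches rows where TIME is null; an empty string is not null and would not be replaced.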
If you just want to replace the column's values, you should use the withColumn operator.
withColumn(colName: String, col: Column): DataFrame Returns a new Dataset by adding a column or replacing the existing column that has the same name.
As the value for col you could use the lit function.
lit(literal: Any): Column Creates a Column of literal value.
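Putting withColumn and lit together in Java might look roughly like this. It is a sketch under the same assumptions as the question's table (string columns VALUE and TIME); the class name, sample row and "12:00" literal are hypothetical.

```java
import static org.apache.spark.sql.functions.lit;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ReplaceTimeColumn {

    // One-row DataFrame resembling the question's (assumed string columns).
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType()
                .add("VALUE", DataTypes.StringType)
                .add("TIME", DataTypes.StringType);
        List<Row> rows = Arrays.asList(RowFactory.create("5000", null));
        return spark.createDataFrame(rows, schema);
    }

    // Because a TIME column already exists, withColumn replaces it;
    // every row gets the same literal value.
    static Dataset<Row> setTime(Dataset<Row> ds, String time) {
        return ds.withColumn("TIME", lit(time));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("ReplaceTimeColumn")
                .getOrCreate();
        setTime(sample(spark), "12:00").show(); // "12:00" is a placeholder value
        spark.stop();
    }
}
```

Unlike na().fill, this overwrites TIME in all rows, including those that already had a value.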
The other requirement was...
be adding more rows as well
That's the union operator.
union(other: Dataset[T]): Dataset[T] Returns a new Dataset containing union of rows in this Dataset and another Dataset. This is equivalent to UNION ALL in SQL.
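Appending rows from Java could be sketched as below: build a second DataFrame with the same schema and union it with the first. The class name and the row values are invented for the example, and the string column types are an assumption. Keep in mind that union matches columns by position, not by name, so both sides should list VALUE and TIME in the same order.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AppendRows {

    // Shared schema so that both DataFrames line up column by column
    // (assumption: both columns are strings).
    static final StructType SCHEMA = new StructType()
            .add("VALUE", DataTypes.StringType)
            .add("TIME", DataTypes.StringType);

    // Wraps a single (VALUE, TIME) pair in a one-row DataFrame.
    static Dataset<Row> oneRow(SparkSession spark, String value, String time) {
        List<Row> rows = Arrays.asList(RowFactory.create(value, time));
        return spark.createDataFrame(rows, SCHEMA);
    }

    // Returns a new DataFrame with the extra row appended; the input is unchanged.
    static Dataset<Row> append(SparkSession spark, Dataset<Row> ds, String value, String time) {
        return ds.union(oneRow(spark, value, time));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("AppendRows")
                .getOrCreate();
        Dataset<Row> ds = oneRow(spark, "5000", "12:00"); // placeholder values
        append(spark, ds, "6000", "12:05").show();
        spark.stop();
    }
}
```

Since Datasets are immutable, each union produces a new Dataset; reassign the result if you want to keep appending.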
Upvotes: 2