Reputation: 90
I am trying to setup a really simple pipeline in Data Fusion which takes a table from BigQuery, then stores that data into Google Cloud Storage. With the pipeline setup below it's fairly easy. We first read the bigquery table and schema, then sink the data into a Google Cloud Storage bucket. This works, but the problem is that a new map and a new file gets created for each new transfer that I run. What I would like to do is to overwrite a single file in the same filepath with each new transfer that I do.
What I ran into that in this setup, a new map and a new file gets within Google Cloud Storage created using a timestamp prefix. Looking at the sink configuration below, indeed, by default you see a timestamp.
Alright, that would mean if I would remove the prefix a new map shouldn't be created. The hover-over confirms this: "If not specified, nothing will be appended to the path".
However, when I clear this value and then save it, the full time format automatically pops up again. I can't use a static value because this results in errors. For example I just tried creating a map with the number "12" in Google Cloud Storage and then setting the prefix to that, but as you would guess this doesn't work. Is anyone else running into this problem? How do I get rid of the path suffix so I don't get a new map for each timestamp within Google Cloud Storage?
Upvotes: 1
Views: 243
Reputation: 3500
This seems to be an issue with Data Fusion UI. Have filed a JIRA for this https://issues.cask.co/browse/CDAP-16129.
I understand this can be confusing when you open the configuration again. The reason this is happening is whenever you open the configuraion modal we pre-populate fields with default values from plugin widget json (if no value is present).
As a workaround can you try,
Export pipeline - Once you have configured all the properties in the plugins you can export the pipeline. This step should download a JSON for you where you can locate the property and remove it and import the pipeline and publish without opening the specific plugin.
Or, simply remove the property from the plugin configuration modal and close and publish the pipeline directly. UI will Re-populate the value every time you open the plugin configuration. Once you delete and close the modal it should retain that state until you open the configuration again.
Hope this helps.
Upvotes: 1