Reputation: 1
I am trying to transfer CSV data from an AWS S3 bucket into BigQuery for analysis and querying. The problem is that only about half of the data in the bucket is actually being transferred.

When setting up the data transfer in BQ I had to specify a destination table, and I wrote in its schema based on the columns the CSV files contain. One complication is that not all files in the S3 bucket share the same schema, but the schema I defined is a superset of all possible columns across any given CSV file. So I would expect that a file with a differing schema, as long as its columns are a subset of the overall schema, would still load, just with null values for the missing columns. In practice I had to raise my error tolerance count very high to get even half the data to transfer. I need help or ideas from anyone who has been in this position on how to get the other half into the same table.
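For context, here is roughly what my setup looks like using the Python client libraries (the project, dataset, table name, schema fields, bucket path, and credentials below are placeholders, not my real values):

```python
# Rough sketch of my current setup (names, schema, and credentials are
# placeholders). Uses google-cloud-bigquery and
# google-cloud-bigquery-datatransfer.
from google.cloud import bigquery, bigquery_datatransfer

project_id = "my-project"   # placeholder
dataset_id = "analytics"    # placeholder

# Destination table whose schema is the superset of every CSV's columns.
bq_client = bigquery.Client(project=project_id)
schema = [
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField("created_at", "TIMESTAMP"),
    bigquery.SchemaField("amount", "FLOAT"),  # only present in some files
]
table = bigquery.Table(f"{project_id}.{dataset_id}.s3_data", schema=schema)
bq_client.create_table(table, exists_ok=True)

# S3 -> BigQuery transfer config; max_bad_records is the error tolerance
# count I had to raise very high to get even half the data in.
transfer_client = bigquery_datatransfer.DataTransferServiceClient()
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id=dataset_id,
    display_name="s3-csv-transfer",
    data_source_id="amazon_s3",
    params={
        "data_path": "s3://my-bucket/*.csv",            # placeholder
        "destination_table_name_template": "s3_data",
        "file_format": "CSV",
        "max_bad_records": "10000",
        "access_key_id": "<redacted>",
        "secret_access_key": "<redacted>",
    },
)
transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path(project_id),
    transfer_config=transfer_config,
)
```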
Things I have tried: schema auto-detection, which just automates the table-schema step and leaves me with the same issue; and creating a table from a connection source (S3) instead of running a data transfer into a pre-created table, which runs into similar issues (a rough sketch of that approach is below).
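If it helps, the connection-based attempt looked roughly like this (I assume this goes through a BigQuery Omni connection to AWS; the connection name, dataset, region, and columns are placeholders):

```python
# Rough sketch of the connection-based attempt, assuming a BigQuery Omni
# connection to AWS already exists; connection, dataset, and column names
# are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

ddl = """
CREATE EXTERNAL TABLE `my-project.aws_dataset.s3_external` (
  id STRING,
  created_at TIMESTAMP,
  amount FLOAT64
)
WITH CONNECTION `aws-us-east-1.my_s3_connection`
OPTIONS (
  format = 'CSV',
  uris = ['s3://my-bucket/*.csv']
);
"""
# The dataset has to live in the same AWS region as the connection, so the
# job location is set to that region here (placeholder region).
client.query(ddl, location="aws-us-east-1").result()
```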
Upvotes: 0
Views: 43