Reputation: 1467
I accidentally added a wrong column to my BigQuery table schema.
Instead of reloading the complete table (million of rows), I would like to know if the following is possible:
Is this functionality (or similar) supported? Possibly the "save result to table" functionality can have a "compact schema" option.
Upvotes: 35
Views: 85404
Reputation: 92
Based on the documentation, the best approach is to use the create or replace and select * EXCEPT
CREATE OR REPLACE TABLE mydataset.mytable AS (
SELECT * EXCEPT (column_to_delete) FROM mydataset.mytable
);
as with altering and drop the column, this does not free the storage of the column that was deleted immediately
Check this link for more info, it is useful to understand what is happening under the hood of a query Managing-table-schemas-BigQuery
Upvotes: 0
Reputation: 673
The smallest time-saving way to remove a column from Big Query according to the documentation.
ALTER TABLE [table_name] DROP COLUMN IF EXISTS [column_name]
Upvotes: 65
Reputation: 141
Below is the code to do it. Lets say c is the column that you wants to delete.
CREATE OR REPLACE TABLE transactions.test_table AS
SELECT * EXCEPT (c) FROM transactions.test_table;
Or second method and my favorite is by following below steps.
Upvotes: 13
Reputation: 173190
If your table does not consist of record/repeated type fields - your simple option is:
Select valid columns while filtering out bad records into new temp table
SELECT < list of original columns >
FROM YourTable
WHERE < filter to remove bad entries here >
Write above to temp table - YourTable_Temp
Make a backup copy of "broken" table - YourTable_Backup
YourTable
YourTable_Temp
to YourTable
Please note: the cost of above #1 is exactly the same as action in first bullet in your question. The rest of actions (copy) are free
In case if you have repeated/record fields - you still can execute above plan, but in #1 you will need to use some BigQuery User-Defined Functions to have proper schema in output
You can see below for examples - of course this will require some extra dev - but if you are in critical situation - this should work for you
Create a table with Record type column
create a table with a column type RECORD
I hope, at some point Google BigQuery Team will add better support for cases like yours when you need to manipulate and output repeated/record data, but for now this is a best workaround I found - at least for myself
Upvotes: 13
Reputation: 208042
Save results to table is your way to go. Try on the big table with the selected columns you are interested, and you can apply a limit to make it small.
Upvotes: 0