Alex O

Reputation: 1886

Exclude a file from merging to the main branch

The environment:

We use Bitbucket Cloud for repository management (and merges) and Jenkins for CI/CD. Branches with open pull requests are built and deployed to a temporary environment (which includes taking a snapshot of the DB schema and data as currently deployed) where integration tests are performed. The main branch is treated similarly, except that it is also deployed to a "permanent" environment for QA, etc. This is done automatically by Jenkins scripts.

Merges to the main branch are performed by the developers in the Bitbucket UI after the PR has the necessary approvals and the build/deploy process above passes.

The problem:

Some PRs require changes both in the code and the DB. The integration tests will fail if the DB schema differs from what the code expects or if crucial data is missing.

The approach:

We would like to use change-specific SQL scripts that apply the required DB changes to the DB snapshot in the temporary build environments before the integration tests run.

A convenient approach would be to have a stub db_changes.sql file in the repository that developers can modify in their feature/bugfix branches. The build/deployment scripts would pick it up together with the rest of the branch, but it would be excluded from merging to the main branch. That would ensure that branching off the main branch results in an empty db_changes.sql file, so no DB changes are attempted when the code does not require them.
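
Roughly, the build/deployment script would do something like this (a sketch; psql and the TEMP_DB variable stand in for whatever client and connection details the Jenkins job actually uses):

# Jenkins build step (sketch): apply the branch's DB changes to the snapshot
if [ -s db_changes.sql ]; then                         # -s: exists and is non-empty
    psql "$TEMP_DB" -v ON_ERROR_STOP=1 -f db_changes.sql
fi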

The issue:

The obvious way of using git attributes to specify a custom merge driver for that file does not work on Bitbucket.
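
For reference, that approach works in local merges but is ignored by Bitbucket's server-side merge (the ours driver keeps the current branch's version of the file):

# .gitattributes, committed to the repository
db_changes.sql merge=ours

# one-time setup in each clone; a server-side merge never runs this
git config merge.ours.driver true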

Brainstorming alternatives:

One idea is to change the default merge strategy (if that is even possible on Bitbucket) to recursive, which would ostensibly allow the use of git attributes for per-file merge drivers. However, it appears that recursive can be noticeably slower than ort.

Another option is to use a separate DB changes script for each branch, with a naming convention that associates the file name with the branch name, but this seems fragile and prone to typos.
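
For illustration (BRANCH_NAME is the variable Jenkins exposes in multibranch pipelines; the db_changes/ directory is hypothetical):

# pick up a per-branch script such as db_changes/my-feature.sql
SCRIPT="db_changes/${BRANCH_NAME}.sql"
if [ -f "$SCRIPT" ]; then
    psql "$TEMP_DB" -v ON_ERROR_STOP=1 -f "$SCRIPT"
fi

A typo in the file name would silently skip the script, which is exactly the fragility in question.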

Edit: Constraints and limitations

Unfortunately, we are just a small team in a large company and don't have the luxury of choosing "better" tools, so we are trying to make the best of what we are given: Bitbucket Cloud (a migration to GitHub is planned) and Jenkins.

Please advise.

Upvotes: 1

Views: 95

Answers (1)

Schwern

Reputation: 165586

A convenient approach would be to have a stub db_changes.sql file in the repository that developers can modify in their feature/bugfix branches. The build/deployment scripts would pick it up together with the rest of the branch, but it would be excluded from merging to the main branch. That would ensure that branching off the main branch results in an empty db_changes.sql file, so no DB changes are attempted when the code does not require them.

You could probably accomplish this with hooks. You'd merge PRs into a special branch; let's call it special_merge. Your hook would see the merge on special_merge, add a commit to empty the db_changes.sql file, and merge that into main. The PostRepositoryHook would seem to do it.

A - B - C [main][special_merge]
 \
  X - Y - Z [branch]

# merge the PR to special_merge

A - B --- C [main]
 \         \
  \         M [special_merge]
   \        /
   X - Y - Z [branch]

# hook detects a merge to special_merge.
# checks out the branch head.
# empties db_changes.sql
# commits (that's commit 0)

A - B --- C [main]
 \         \
  \         M [special_merge]
   \        /
   X - Y - Z - 0 [branch]

# hook merges branch into main
# hook resets special_merge to main

A - B --- C ------ N [main][special_merge]
 \                /
  \              /
   \            /
   X - Y - Z - 0 [branch]
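
Expressed as plain git commands, the work the hook automates would look roughly like this (a sketch of the steps in the diagrams, not the hook API itself):

# what the hook does, as manual git commands
git checkout branch
: > db_changes.sql                     # empty the file
git commit -am "Empty db_changes.sql"  # commit 0
git checkout main
git merge branch                       # commit N
git branch -f special_merge main       # reset special_merge to main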

There are a bunch of problems with this approach.

  1. It complicates the merge process.
  2. How are changes to the database schema handled? Where are they stored? How are they tested?
  3. What if a change needs to modify production data? Where is it stored? How is it tested?
  4. What you're testing is not what you're pushing to production.

These boil down to two underlying problems.

  1. Your database changes are not under version control and (presumably) are deployed using a different process than your code.
  2. You're relying on a fixed database to test diverse code.

If we fix both, the original problem goes away, deployments and testing become more robust, and the whole deployment pipeline can be automated.

What you're describing is on the way to schema migrations. Rather than having a fixed schema that you temporarily modify, every change to the database is done this way. The schema is generated by running each change, in order, on an empty database. You can then dump a snapshot of your schema.

For example, your first schema migration would create the database as it currently exists. In development and testing, this is used to generate a schema and create the database (and normally would be used to create the initial production database). This database has no data, except a table for tracking which migrations have already been applied.
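
Sketched in SQL (the table and file names are illustrative, the tracking table normally belongs to whichever migration tool you use, and the syntax assumes PostgreSQL):

-- 0001_initial_schema.sql: recreate the schema as it exists today
CREATE TABLE users (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

-- maintained by the migration tool, not edited by hand
CREATE TABLE schema_migrations (
    version    TEXT PRIMARY KEY,
    applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);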

Tests always start with no data. Developers writing tests use tools such as factory_bot to generate test data and populate the database. The database is emptied between tests, at as fine a grain as possible, usually by rolling back a transaction. This prevents a failing test from messing up the database; each test starts from a known clean state.
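
Stripped to bare SQL, the mechanism is just this (test frameworks do the wrapping for you):

BEGIN;                                           -- each test opens a transaction
INSERT INTO users (name) VALUES ('test user');   -- factory-generated data
-- ... the test's queries and assertions run here ...
ROLLBACK;                                        -- clean state for the next test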

If a developer wants to make a change to the schema, they write a new schema migration. For example, a branch might include a schema migration with an alter table to add a new column. As before, all migrations are run in order to create the database for dev and testing.
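
For example (the file name and column are illustrative):

-- 0002_add_email_to_users.sql: the only schema change this branch makes
ALTER TABLE users ADD COLUMN email TEXT;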

Deploying this change to production means first running the unapplied schema migrations, in order, on a staging database which does have sample data similar to production (this could be a production snapshot, but be careful of exposing PII to your devs and testers). Once it's known to work on staging, the new migrations are applied to production.
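
A minimal runner illustrating "unapplied, in order" (a sketch only; real tools add locking and error handling, and psql plus the STAGING_DB variable are assumptions):

# apply, in file-name order, every migration not yet recorded
for f in migrations/*.sql; do
    v=$(basename "$f" .sql)
    applied=$(psql "$STAGING_DB" -tAc "SELECT 1 FROM schema_migrations WHERE version = '$v'")
    if [ -z "$applied" ]; then
        psql "$STAGING_DB" -v ON_ERROR_STOP=1 -f "$f" &&
            psql "$STAGING_DB" -c "INSERT INTO schema_migrations (version) VALUES ('$v')"
    fi
done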

In this way dev, test, and production schemas can be tested and kept in sync automatically. Tests won't accidentally leave data to confuse other tests. Tests can insert and delete any data they want. Branches can make any changes to the database as necessary. Sensitive production data doesn't accidentally leak into dev and testing.

  • dev - developers create fresh, empty databases to develop and work on, using test factories to populate them with data
  • testing - same as dev
  • staging - staging has an existing database with production-like data and only applies new schema migrations; this tests the production deployment
  • production - same as staging

Testing confirms the code and schema will work before merging. Staging confirms the schema changes will work with a live database before deploying.

Many ORMs and frameworks support this workflow out of the box, and there are stand-alone tools such as Sqitch.
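
With Sqitch, for instance, the workflow maps onto a handful of commands (the change name and target URLs are placeholders):

# add a change; Sqitch creates deploy/revert/verify scripts for it
sqitch add add_email_to_users -n 'Add email column to users'

# apply pending changes to staging, then to production
sqitch deploy db:pg://user@staging.example.com/app
sqitch deploy db:pg://user@prod.example.com/app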

Upvotes: 1
