SphagnumShuffle

Reputation: 41

Cloud Composer, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0"

I'm running a Cloud Composer environment with Composer version 1.18.7 and Airflow version 1.10.15. According to Google Cloud Platform's documentation, the error message "(_mysql_exceptions.OperationalError) (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")" indicates that the Airflow database is under heavy load (https://cloud.google.com/composer/docs/how-to/using/troubleshooting-dags#symptoms_of_airflow_database_being_under_heavy_load).

I tried the solutions suggested at the above link: I ran the database maintenance DAG and upgraded the Cloud SQL instance from the default machine type to db-n1-standard-4 (4 vCPUs, 15 GB memory). Unfortunately, neither had any effect, and I still get a lot of these errors every day, seemingly at random. I have no idea what to try next, as I can't find any other solutions anywhere. The Airflow database is nowhere near full, as shown by running the following from Ad Hoc Query in the Airflow UI, with airflow_db selected in the dropdown:

SELECT table_name AS "Table",
ROUND(((data_length + index_length) / 1024 / 1024), 2) AS "Size (MB)"
FROM information_schema.TABLES
WHERE table_schema = "composer-1-18-7-airflow-1-10-15-xxxxx"
ORDER BY (data_length + index_length) DESC;

This query (with some information redacted) gives me this: (image: results of query). As you can see, there is no indication that any table is full. As a side note, I recently upgraded the Composer image because support for the previous version was ending. I mostly use Python operators, but some BashOperator tasks also fail with the same error message. Here is the full error output:

[2022-05-19 10:11:27,820] {taskinstance.py:1152} ERROR - (_mysql_exceptions.OperationalError) (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 364, in connect
    return _ConnectionFairy._checkout(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
    return self._create_connection()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 493, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/opt/python3.8/lib/python3.8/site-packages/MySQLdb/__init__.py", line 85, in Connect
    return Connection(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/MySQLdb/connections.py", line 208, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 968, in _run_raw_task
    RTIF.write(RTIF(ti=self, render_templates=False))
  File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/models/renderedtifields.py", line 90, in write
    session.merge(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2162, in merge
    return self._merge(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2240, in _merge
    merged = self.query(mapper.class_).get(key[1])
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 1018, in get
    return self._get_impl(ident, loading.load_on_pk_identity)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 1135, in _get_impl
    return db_load_fn(self, primary_key_identity)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 286, in load_on_pk_identity
    return q.one()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3490, in one
    ret = self.one_or_none()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3459, in one_or_none
    ret = list(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3556, in _execute_and_instances
    conn = self._get_bind_args(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3571, in _get_bind_args
    return fn(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3550, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1138, in connection
    return self._connection_for_bind(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1146, in _connection_for_bind
    return self.transaction._connection_for_bind(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 433, in _connection_for_bind
    conn = bind._contextual_connect()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2302, in _contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2339, in _wrap_pool_connect
    Connection._handle_dbapi_exception_noconnection(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1583, in _handle_dbapi_exception_noconnection
    util.raise_(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 364, in connect
    return _ConnectionFairy._checkout(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
    return self._create_connection()
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/python3.8/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 493, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/opt/python3.8/lib/python3.8/site-packages/MySQLdb/__init__.py", line 85, in Connect
    return Connection(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/MySQLdb/connections.py", line 208, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
(Background on this error at: http://sqlalche.me/e/13/e3q8)
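To check whether the errors really are random or cluster around particular times (for example, periods of high scheduler activity), the task logs can be bucketed by hour. The following is only a minimal sketch, assuming log lines that start with the timestamp format shown in the output above; `count_errors_by_hour` and the sample lines are hypothetical and not part of Airflow or Composer.

```python
import re
from collections import Counter

# Matches the timestamp prefix of an Airflow task-log line,
# e.g. "[2022-05-19 10:11:27,820] {taskinstance.py:1152} ERROR - ..."
LOG_PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2},\d{3}\]")
ERROR_MARKER = "Lost connection to MySQL server"


def count_errors_by_hour(lines):
    """Count lines containing the MySQL connection error, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        match = LOG_PREFIX.match(line)
        if match:  # traceback lines carry no timestamp and are skipped
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts


# Hypothetical, shortened sample input; in practice, read exported task logs.
sample_lines = [
    '[2022-05-19 10:11:27,820] {taskinstance.py:1152} ERROR - (2006, "Lost connection to MySQL server")',
    '[2022-05-19 10:45:02,113] {taskinstance.py:1152} ERROR - (2006, "Lost connection to MySQL server")',
    '_mysql_exceptions.OperationalError: (2006, "Lost connection to MySQL server")',
    '[2022-05-19 14:03:51,007] {taskinstance.py:1152} ERROR - (2006, "Lost connection to MySQL server")',
    '[2022-05-19 14:04:00,000] {taskinstance.py:900} INFO - Task started',
]

for hour, n in sorted(count_errors_by_hour(sample_lines).items()):
    print(hour, n)
```

If the counts pile up in a few hours rather than spreading evenly, that points toward load spikes rather than a permanently undersized database.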

I would really appreciate any ideas or solutions on how to fix this pesky error!

Upvotes: 1

Views: 1056

Answers (1)

SphagnumShuffle

Reputation: 41

I got this sorted out. I found that the Airflow scheduler was, for some unknown reason, putting a lot of load on the Cloud SQL instance. I restarted the scheduler, and it has been a smooth ride since then, with no MySQL errors.

I have no idea what caused the scheduler to put unnecessary load on the Cloud SQL instance, but the problem appeared at the same time as I updated the Composer image.

So if you have similar issues with Google Cloud Composer, you can start by restarting the scheduler. I spent way too much time debugging an issue with such an easy solution, and I hope nobody else wastes as much time on it.
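One possible way to restart the scheduler in a Composer 1 environment is to trigger a rolling restart of its Kubernetes deployment in the environment's GKE cluster. The sketch below is an illustration under assumptions, not necessarily the exact steps used here: it assumes `kubectl` is installed and already pointed at the environment's cluster, that the scheduler runs as a deployment named `airflow-scheduler`, and that the namespace string is a placeholder you replace with your own. The helper names `build_restart_command` and `restart_scheduler` are hypothetical.

```python
import subprocess


def build_restart_command(namespace, deployment="airflow-scheduler"):
    """Build the kubectl command for a rolling restart of a deployment.

    The default deployment name is an assumption about Composer 1 environments.
    """
    return [
        "kubectl", "rollout", "restart",
        f"deployment/{deployment}",
        "--namespace", namespace,
    ]


def restart_scheduler(namespace, dry_run=True):
    """Print the restart command, and run it only when dry_run is False."""
    cmd = build_restart_command(namespace)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    # Raises CalledProcessError if kubectl exits with a non-zero status.
    subprocess.run(cmd, check=True)
    return cmd


# Placeholder namespace; look up the real one with `kubectl get namespaces`.
restart_scheduler("composer-1-18-7-airflow-1-10-15-xxxxx")
```

The dry-run default only prints the command, so it can be reviewed before anything is restarted; pass `dry_run=False` to actually run it.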

Upvotes: 3
