Why running \Drupal::entityDefinitionUpdateManager()->applyUpdates() in update hooks is bad

There's one problematic thing that I've already seen a couple of times since Drupal 8 exists, which can cause great problems and unpredictable side effects which the developer didn't expected at all:

A module is changing/extending their entity schema and is needing to write an update hook. The automatism for updating the entity schema was removed in 2015 due to its unpredictable behaviour for module updates (you couldn't write an update hook to migrate content, if the schema update ran before, for example).

That's a thing, we developers still have to get used to it. I can remember that I've also needed a while that I have to run entity updates on my custom modules after I've changed the entity schema, e.g. changing/adding a base field. Afaik there were also some contrib modules in a pre-mature state missing that.

The next thing is that when you realize that you have to write update hooks for this, you'll see that this can be quite cumbersome to accomplish, with a Drupal unusual quite high amount of lines of code... except... except... except you run this simple one-liner called:

\Drupal::entityDefinitionUpdateManager()->applyUpdates();

Calling this seems to solve all your problems. It runs all pending entity definitions! That's great! But wait... it really runs ALL pending entity definition updates, even those from other modules. And that's bad, very bad! Because this can have really unwanted side-effects on other modules. In some cases it may help because the other module has carelessly forgot to run the updates, and you're doing it for them. But nevertheless you should never do this on behalf of another module. But it could also lead to data loss because the other module has prepared a proper update hook to migrate data from a column that will be removed to another new column. But your update hook may run first and destroy their data.

In some cases you can even destroy your own update process. Last night, the great Webform module published its beta10, which ships with a huge number of new features and therefore running quite a lot of updates. One of them is adding a new database table. But unfortunately it will break if you update from beta9 (or older) and not from a dev version you've installed at the right moment. Because at the moment of running the update hook 8032 that tries to create the table, it already exists. Why? Because another Webform update 8026 that ran seconds before called the applyUpdates() function of the entity definition update manager. And this call already created the table! You can read more about this here: https://www.drupal.org/node/2866957

Core API is the problem here

The problem here is not that the Webform developer(s) or any other that has done this is a bad developer, not caring about how to write clean code - I really appreciate Webform as a quality module. The problem here is that Drupal does not provide an easy to use API for us developers in this case, which is really very unusal for Drupal. You'll need an unusual amount of lines of code to write such updates. You'll feel uncomfortable while writing this. And as you do and look for tutorials and examples or search the API docs, you'll often find (it yourself or by recommendations) to call \Drupal::entityDefinitionUpdateManager()->applyUpdates(). And if you're focused on the update you need at this moment, you'll probably not realize that impacts that this can have in the wild.

Even if you're aware that this will run not only your changes but also affect other modules or even old or upcoming updates of your own module, you'll be wondering. You expect that in the same class (EntityDefinitionUpdateManager) there's another function that can call a specific single update - but nope, you won't find one :( You won't even find a public accessible function to get a list of pending changes for a given entity type. Furthermore, many of the called functions are protected. I've once wrote a custom update helper service that should help me in this project for running pending field storage updates. At the end, this class had nearly 300 lines of code (including comments), where most of this was just copied and pasted from the entity definition update manager, only some small needed changes to function signatures.

Also, in other areas of writing update scripts, some improved API would help a lot. Eg Drupal Commerce has developed their own ConfigUpdater service to be able to update its delivered configuration entities.

I hope, that we'll get a better API support in future for this in core. I'll try to reach out some Core contributors on IRC after writing this post, as my concerns are summarized here quite well ;) In the meantime, the best documentation on how to correctly write entity schema updates is this change record here.

EDIT: Important update

I'm currently writing with catch on IRC. He mentioned the existence of EntityDefinitionUpdateManager::updateEntityType, which can be used to perform all changes of the given entity type only. I may have missed that because I once needed/wanted that on field level, by only calling updateFieldStorage($entity_type_id, $field_name). This updateEntityType() however looks great and is for sure enough in 99% of all use cases.

I still would find a public getChangeList() useful (now protected) for machine-based processing because the public getChangeSummary() is a human-readable version of this. And we agreed that people are clearly missing the warning that is documented in applyUpdates() stating "Use this with care, as it will apply updates for any module, which will lead to unpredictable results."

Hopefully some of you will read this blog post and pay attention to this potential problem, when writing your update hooks :)

PS: here's a good example from catch for how to and how not to update a single field in an update hook. I'll reference a commit log where Core bug has been fixed because comment update was not correctly updating the field storage definition.

EDIT: Update 2017-04-06

I should have added this note on yesterday's update: after looking at the example from the last paragraph, thinking about that and chatting with catch, I've realized that there's good reason, why there's no simple function for simply "update this field of this entity type" because if you do this, you're always updating the entity type and its field to its schema structure at the time of running the update, not necesarily the one at time of writing the update hook, which can make a difference in some edge cases! I've found a very good explanation and comparison from amateescu, which describes the potential risk perfectly: "An analogy to D7 would be something like: don't call hook_schema() in an update function to get the schema for creating a table, but hardcode the new table schema instead"

Neuen Kommentar schreiben