The Silent Killer of Drupal Migrations

You've just completed a massive Drupal migration. Tens of thousands of new entities—products, articles, or trailers—are now beautifully displayed on your site. You run a quick check and everything looks perfect. But then you notice it: the site is slow. Your search pages are lagging, the search index is working overtime, and you can practically feel the server overheating. What went wrong?

The truth is that, while Drupal's Migrate API and the Search API are each incredibly powerful on their own, there is a hidden conflict between them. Used together, they can create a perfect storm of inefficiency that brings your entire website to a crawl.

The culprit is the unnecessary re-indexing of content.

The Problem: A Misunderstanding Between Migrate and Search API

When you run a migration, the Migrate API processes each row from your source and pushes the data into a destination entity. A crucial step in this process is calling the ->save() method. A key detail here is that Drupal core is smart: when you save an entity, the changed timestamp is only updated if the entity's data has actually changed. On its own, this is an efficient process.

However, the ->save() call also fires one of the central events in Drupal's ecosystem: hook_entity_update(). By default, the Search API module listens for this hook and assumes that any entity triggering it needs to be queued for re-indexing.

This is where the problem lies. The Search API doesn't check if the changed timestamp has been updated; it simply reacts to the hook_entity_update() trigger. For every row that Migrate processes, a save operation is performed, the hook is triggered, and the Search API inefficiently queues the entity for re-indexing, even if nothing has changed. On a large-scale migration with thousands of entities, this leads to a massive amount of wasted processing time and a severely overloaded search index queue.
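
To make the mismatch concrete, here is a deliberately simplified, hypothetical hook_entity_update() implementation that illustrates the decision point. It is not the actual Search API code (that lives in search_api.module and its tracking services), but it shows why every save ends up in the indexing queue:

<?php

use Drupal\Core\Entity\EntityInterface;

/**
 * Implements hook_entity_update() (illustrative sketch only).
 */
function example_entity_update(EntityInterface $entity) {
  // This hook fires for every ->save(), no matter whether the changed
  // timestamp or any field value actually moved; there is no comparison of
  // old and new values at this point. In Search API, this is where the
  // entity gets queued for re-indexing on every index that tracks it.
  \Drupal::logger('example')->notice('@type @id queued for re-indexing.', [
    '@type' => $entity->getEntityTypeId(),
    '@id' => $entity->id(),
  ]);
}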

The Modern Bottleneck: AI and Vector Databases 🤖

In many scenarios, this load might not be immediately noticeable. But in modern web development, particularly with the use of AI modules, this inefficiency becomes a critical bottleneck.

For example, if you're using the AI Search module with a Milvus vector database for RAG (Retrieval-Augmented Generation) queries, every content entity must be chunked and converted into vectors. This is a highly resource-intensive process: indexing even a handful of entities can take a significant amount of time, causing a single cron run to stretch on. Under that load, your system risks never working through the indexing backlog at all, especially if new imports run multiple times a day.
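
To put rough, purely illustrative numbers on it: if each entity produces around 10 chunks and embedding a single chunk takes on the order of 200 ms, re-saving 50,000 unchanged entities queues roughly 500,000 embedding calls, or about 28 hours of pure embedding work, all spent on content that has not changed at all.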

In these advanced use cases, every single indexing operation is valuable, and wasting one on an unchanged entity is simply not an option.

The Solution: A Surgical Strike on the Search Index

The fix isn't to disable your search index or manually rebuild it after the migration. Those are blunt instruments that can lead to downtime or a frustrating rebuild period. The elegant solution is to tell Search API to look away when a save operation happens, but only when the entity hasn't actually changed.

We can achieve this by creating a custom Migrate Destination Plugin with a Deriver. This is the Drupal way of solving a complex problem—by creating a flexible, reusable component that can be applied to any content entity migration.

Instead of using the standard entity plugin, our custom entity_conditional_search_api plugin will do the following:

  1. It will process the incoming data and map it to a destination entity, just like a normal migration.
  2. It will check if the entity has any actual data changes using the hasTranslationChanges() method. This is the key.
  3. If the entity's data has not changed, it will set a special property on the entity: $entity->search_api_skip_tracking = TRUE. This is an official "escape hatch" built into the Search API. It's the signal to the Search API's event listeners to simply ignore this save operation and not queue the entity for re-indexing.
  4. It will then call ->save().

By doing this, you prevent unnecessary re-indexing for unchanged content while ensuring that any genuinely updated content is properly indexed. This approach allows you to run large migrations on a live site without causing performance issues.

It's a small change to your migration configuration, but it makes a world of difference.

Implementation

Thanks to the powerful architecture of Drupal, the actual implementation is a breeze. We need two classes; both can subclass existing Migrate classes, so only a few lines of code are required:

The migrate destination plugin

<?php

namespace Drupal\MY_MODULE\Plugin\migrate\destination;

use Drupal\Core\Entity\ContentEntityInterface;
use Drupal\migrate\Attribute\MigrateDestination;
use Drupal\migrate\Plugin\migrate\destination\EntityContentBase;
use Drupal\MY_MODULE\Plugin\Derivative\MigrateEntityConditionalSearchApi;

/**
 * Migrate destination that skips Search API tracking for unchanged entities.
 */
#[MigrateDestination(
  id: 'entity_conditional_search_api',
  deriver: MigrateEntityConditionalSearchApi::class
)]
class EntityConditionalSearchApi extends EntityContentBase {

  /**
   * {@inheritdoc}
   */
  protected function save(ContentEntityInterface $entity, array $old_destination_id_values = []) {
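    // Mark this save as a synchronization, so hooks can tell it comes from an
    // import and core does not force-update the changed timestamp.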
    $entity->setSyncing(TRUE);

    if (!$entity->isNew() && !$entity->hasTranslationChanges()) {
      // Set the flag to tell Search API to skip tracking this entity.
      $entity->search_api_skip_tracking = TRUE;
    }

    $entity->save();
    return [$entity->id()];
  }

}

The deriver

<?php

namespace Drupal\MY_MODULE\Plugin\Derivative;

use Drupal\migrate\Plugin\Derivative\MigrateEntity;
use Drupal\MY_MODULE\Plugin\migrate\destination\EntityConditionalSearchApi;

/**
 * Deriver for entity_conditional_search_api:ENTITY_TYPE entity migrations.
 */
class MigrateEntityConditionalSearchApi extends MigrateEntity {

  /**
   * {@inheritdoc}
   */
  public function getDerivativeDefinitions($base_plugin_definition) {
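    // Mirror the core 'entity' deriver: create one derivative per entity
    // type, but point each at our Search API-aware destination class.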
    foreach ($this->entityDefinitions as $entity_type => $entity_info) {
      $this->derivatives[$entity_type] = [
        'id' => "entity_conditional_search_api:$entity_type",
        'class' => EntityConditionalSearchApi::class,
        'requirements_met' => 1,
        'provider' => $entity_info->getProvider(),
      ];
    }
    return $this->derivatives;
  }

}

Usage

Now you can use it in your migrations. Instead of:

destination:
  plugin: 'entity:commerce_product'

you now write:

destination:
  plugin: 'entity_conditional_search_api:commerce_product'

Replace "commerce_product" with the entity type you are importing.

Conclusion

Ultimately, this small change in your migration configuration is a powerful optimization, allowing your site to scale efficiently and your search functionality to perform flawlessly, even with the most demanding AI-driven data pipelines!
