Drupal Migration Taxonomy Audit, Part 2: How to Analyze Your Data and Build a Migration-Ready Taxonomy Plan
In Part 1 of this series, we established why planning your taxonomy cleanup before a Drupal 7 to Drupal 10 or 11 migration significantly reduces QA and rework. This second part of the series explains how to perform the taxonomy audit, interpret the findings, and express your decisions in the migration layer of the new Drupal system. Although these examples use Drupal 7 for illustration, the analysis process applies to any CMS that uses taxonomy or categorical metadata.
1. Exporting Your Taxonomy Data
The first step is to export a complete set of taxonomy data from Drupal 7. This includes vocabulary names, term IDs, term names, and usage counts. These exports feed the analysis workbook described later in this article.
Export all taxonomy terms with usage count
SELECT
v.name AS vocabulary,
t.tid,
t.name AS term,
COUNT(r.nid) AS usage_count
FROM taxonomy_term_data t
LEFT JOIN taxonomy_index r ON t.tid = r.tid
JOIN taxonomy_vocabulary v ON v.vid = t.vid
GROUP BY t.tid
ORDER BY v.name, t.name;
DDEV and Drush version:
ddev drush sqlq "
SELECT
v.name AS vocabulary,
t.tid,
t.name AS term,
COUNT(r.nid) AS usage_count
FROM taxonomy_term_data t
LEFT JOIN taxonomy_index r ON t.tid = r.tid
JOIN taxonomy_vocabulary v ON v.vid = t.vid
GROUP BY t.tid
ORDER BY v.name, t.name;
" > taxonomy-export.csv
Load this CSV into your taxonomy audit workbook. The next section explains that workbook and how it supports the full analysis workflow.
2. The Taxonomy Audit Workbook
The workbook provides a structured way to evaluate inconsistencies and plan your canonical taxonomy. It typically includes the following worksheets:
Vocabulary Summary
- Vocabulary name
- Number of terms
- Terms with zero usage
- Whether the vocabulary is flat or hierarchical
Normalization Worksheet
- The original term
- The problem identified, such as duplication or inconsistent naming
- The recommended canonical term
- The planned action, such as merge or retire
Mapping Worksheet
This sheet defines how old terms and vocabularies map to new canonical values. It later becomes a lookup table used in the Drupal 10 or 11 migration process.
Usage Heatmap
This worksheet highlights high-value and low-value terms. High usage indicates a term that should not change significantly. Low or zero usage indicates that a term may no longer be relevant.
3. SQL Toolkit for Taxonomy Analysis
The queries below form a complete taxonomy audit toolkit. Each query includes a description and guidance on how to interpret the results.
3.1 Vocabulary Overview
SELECT
v.vid,
v.name AS vocabulary,
COUNT(t.tid) AS term_count
FROM taxonomy_vocabulary v
LEFT JOIN taxonomy_term_data t ON t.vid = v.vid
GROUP BY v.vid
ORDER BY v.name;
3.2 Terms by Usage Count
SELECT
t.tid,
t.name AS term,
v.name AS vocabulary,
COUNT(r.nid) AS usage_count
FROM taxonomy_term_data t
LEFT JOIN taxonomy_index r ON r.tid = t.tid
JOIN taxonomy_vocabulary v ON v.vid = t.vid
GROUP BY t.tid
ORDER BY usage_count DESC, term;
3.3 Unused Terms
SELECT
t.tid,
t.name AS term,
v.name AS vocabulary
FROM taxonomy_term_data t
JOIN taxonomy_vocabulary v ON v.vid = t.vid
LEFT JOIN taxonomy_index r ON t.tid = r.tid
WHERE r.tid IS NULL
ORDER BY v.name, term;
3.4 Similar Terms
SELECT
t1.tid AS tid_1,
t1.name AS term_1,
t2.tid AS tid_2,
t2.name AS term_2,
v.name AS vocabulary
FROM taxonomy_term_data t1
JOIN taxonomy_term_data t2
ON t1.vid = t2.vid
AND t1.tid < t2.tid
JOIN taxonomy_vocabulary v ON v.vid = t1.vid
WHERE LEFT(t1.name, LENGTH(t1.name) - LOCATE(' ', REVERSE(t1.name))) =
LEFT(t2.name, LENGTH(t2.name) - LOCATE(' ', REVERSE(t2.name)))
ORDER BY vocabulary, term_1, term_2;
3.5 Parent and Child Hierarchies
SELECT
parent.tid AS parent_tid,
parent.name AS parent_term,
child.tid AS child_tid,
child.name AS child_term,
v.name AS vocabulary
FROM taxonomy_term_hierarchy h
JOIN taxonomy_term_data child ON child.tid = h.tid
LEFT JOIN taxonomy_term_data parent ON parent.tid = h.parent
JOIN taxonomy_vocabulary v ON v.vid = child.vid
ORDER BY vocabulary, parent_term, child_term;
3.6 Orphan Terms
SELECT
h.tid AS child_tid,
child.name AS child_term,
h.parent AS missing_parent_tid,
v.name AS vocabulary
FROM taxonomy_term_hierarchy h
JOIN taxonomy_term_data child ON child.tid = h.tid
JOIN taxonomy_vocabulary v ON v.vid = child.vid
LEFT JOIN taxonomy_term_data parent ON parent.tid = h.parent
WHERE h.parent IS NOT NULL
AND parent.tid IS NULL
ORDER BY vocabulary, child_term;
3.7 Terms Used Only on Unpublished Content
SELECT
t.tid,
t.name AS term,
v.name AS vocabulary,
COUNT(n.nid) AS unpublished_usage
FROM taxonomy_term_data t
JOIN field_data_field_tags f
ON f.field_tags_tid = t.tid
JOIN node n
ON n.nid = f.entity_id
JOIN taxonomy_vocabulary v
ON v.vid = t.vid
WHERE n.status = 0
GROUP BY t.tid, t.name, v.name
HAVING COUNT(n.nid) > 0
ORDER BY unpublished_usage DESC, term;
4. How to Analyze the Results
With your data exported and the SQL queries complete, the next step is to interpret the results and record decisions in your audit workbook.
4.1 Identify Duplicate Concepts
Use the similar terms query to find duplicate or inconsistent concepts. Assign one canonical term and record the mapping in the normalization worksheet.
4.2 Decide Which Terms to Retain or Retire
Use the usage count and unused terms worksheets. Terms with little or no usage may not be needed in the new site. Rarely used terms should be validated with content owners.
4.3 Normalize Naming Conventions
Apply naming rules such as singular nouns, consistent capitalization, and clear terminology. The normalization worksheet provides a controlled way to document this.
4.4 Evaluate Vocabulary Structure
Use the vocabulary overview and hierarchy queries to determine whether vocabularies should remain separate or be consolidated. Document your planned structure in the mapping worksheet.
5. When Cleanup Occurs and Why It Happens in the Migration Layer
This is a key point that often causes confusion. The taxonomy audit identifies what should change, but the changes themselves are not applied in Drupal 7. They are executed in the Drupal 10 or 11 migration layer.
- It would require new Drupal 7 code that is immediately discarded after migration.
- Directly modifying the Drupal 7 database is unsafe and risks corrupting the legacy system.
Instead:
- All cleanup decisions are recorded in the audit workbook.
- The Drupal 10 or 11 migration process applies those decisions while importing terms and nodes.
- Migration mapping enforces canonical names, merges duplicate terms, retires unused terms, and consolidates vocabularies.
6. Creating the Canonical Manifest for Migration
Once your decisions are complete, represent them in a canonical manifest file. This file is used by the migration process to interpret old terms and apply the new taxonomy structure.
vocabularies:
topic:
canonical: "Topic"
merge:
- "Topics"
- "Subject"
retire:
- "Deprecated Topic"
terms:
"policy update":
canonical: "Policy Update"
"policy updates":
canonical: "Policy Update"
"test term 1":
canonical: "Test Term"
"test term 2":
canonical: "Test Term"
7. Final Summary
The taxonomy audit consists of analysis, decision making, and documentation. Cleanup is possible on the Drupal 7 side, but it is not recommended because it requires new code in an obsolete environment and introduces risk to the legacy database. The correct and modern approach is to document all cleanup decisions, and then execute those changes in the migration process itself.
With the workbook complete and the manifest in place, your migration will produce a consistent and maintainable taxonomy structure while sharply reducing the amount of QA needed on the destination system.


