Preserving the Web: How Drupal’s Wayback Filter Uses the Internet Archive to Mend Broken Links

A Danish-Built Drupal Module Keeps Broken Links Alive by Connecting Them to the Internet Archive’s Wayback Machine.

The web forgets faster than anyone expects. Every day, links once woven into the fabric of online reporting disappear, pages expire, domains change hands, and archives thin out. The quiet process known as link rot erases citations, breaks trust, and leaves once-solid stories riddled with dead ends. For journalists and developers alike, this decay represents not just technical failure but the loss of collective memory.

Within Drupal's open-source community, a small Danish firm, Vertikal.dk, has spent more than a decade quietly resisting that erosion. Its answer is Wayback Filter, a Drupal module that connects every external link in an article to its preserved version on the Internet Archive's Wayback Machine. When a page vanishes, the archived snapshot remains, keeping the record intact.

Developed and maintained by Steven Snedker, with contributions from Martin Joergensen and Martin Larsen, all members of Vertikal.dk, Wayback Filter is both an act of preservation and a piece of Drupal craftsmanship.

In conversation with The Drop Times, Steven traced its origins, technical decisions, and the persistence required to keep it alive across a decade of change.


Building for Permanence

The idea for Wayback Filter surfaced around 2012, when Vertikal.dk was designing discussion and news platforms for several media houses using Drupal. Those sites were fast, well-optimised, and easy to manage, until time began to break them.

Many of the outbound links embedded in older stories started leading to error pages or, worse, to spam sites that had taken over dormant domains. For developers who cared about the long tail of journalism, the experience was alarming.

"Workflows were smooth, SEO was great," Steven recalled. "But links in old articles were often broken. People were being led to 404 Not Founds, or - worse - to spammers hawking stuff on URLs that used to belong to legit companies or news outlets."

 

The accepted industry practice then was to deploy automated bots that located and removed broken links. To Steven, that approach sacrificed information. He saw value in restoring what once was, not deleting it. 

"Conventional wisdom had it that you employed a link checking bot and automatically removed broken links," he said. "So if you had a broken link to Huh?Corp you should remove it in order not to disappoint your users."

"But I thought it much better to link to the archived version of Huh?Corp at Archive.org. Not only would our readers get what we wanted them to get. They would get exactly the version we saw, when we wrote the article/page!"

"Destination sites that are missing, broken, changed or taken over by spammers are all mended by the Wayback Filter."

That insight guided the first release of Wayback Filter in 2014: a simple Drupal text filter that would quietly add an extra link to each external URL, pointing readers toward the Wayback Machine's archived copy.

How the Module Works

Technically, Wayback Filter operates only when a page is rendered. It scans the article's text, detects outbound URLs, and attaches an archive.org link beside each one. The database content remains untouched, ensuring security and data integrity.

"The contents of the node or article is completely untouched by the Wayback Filter," Steven explained. "At render time, the Archive.org links are appended. So neither the size nor the content of the body field is changed in any way. It's the secure and correct way to do such things."
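The render-time idea can be sketched in a few lines. The module itself is a Drupal text filter; the Python below is only an illustration of the principle, and the function name, the regular expression, and the "[archived]" label are assumptions made for this sketch, not the module's actual code or markup.

```python
import re

# Simplified pattern for <a href="http(s)://...">...</a>.
# A real filter would parse HTML properly; this is illustration only.
LINK_RE = re.compile(
    r'(<a\b[^>]*\bhref="(https?://[^"]+)"[^>]*>.*?</a>)',
    re.IGNORECASE | re.DOTALL,
)


def render_with_archive_links(stored_body: str, timestamp: str) -> str:
    """Return display HTML with an archive link appended after each link.

    The stored body is only read, never modified or saved back.
    """
    def add_link(match: re.Match) -> str:
        original_anchor, url = match.group(1), match.group(2)
        archive = f"https://web.archive.org/web/{timestamp}/{url}"
        # Hypothetical label; the module's real link text may differ.
        return f'{original_anchor} <a href="{archive}">[archived]</a>'

    return LINK_RE.sub(add_link, stored_body)


stored = '<p>See <a href="https://example.com/story">the story</a>.</p>'
rendered = render_with_archive_links(stored, "20140101000000")
```

The value in `stored` is the same before and after the call; only the rendered output carries the extra link.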

Because the filter runs at display time, it places almost no burden on performance. Drupal's caching handles the rest. 

"That is the absolute beauty of a Drupal text filter," he said. "It's pretty efficient - no loading or saving - and the internal page cache will probably take care of any performance issues."


Beyond Missing Links

Wayback Filter's purpose extends beyond fixing 404s. In Steven's experience, the larger problem lies in links that still load but point to something entirely different. He has seen reputable interviews replaced by storefronts.

"Wayback Filter does not spend any time checking if a link leads to a 404 - Page not found," he said. "More often than not changed content is a much bigger problem than missing content. That great Tim Berners-Lee interview you linked to seven years ago? The media site died three years ago and was bought by someone selling suitcases online."

The module allows site administrators to configure when archive.org links begin to appear. In earlier years, adding them to links older than five years was reasonable, though Steven now finds a three-year threshold more realistic. He also points to external resources that discuss the issue in depth. 

“Wikipedia has an interesting page on Link rot, and Perplexity has a pretty good answer as to when you should start adding archive.org links to your site,” he said.
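An age threshold of this kind amounts to a single date comparison. The sketch below is an assumption about how such a rule could be expressed, with a hypothetical function name and a default of three years taken from Steven's remark; the article does not describe how the module stores or applies its setting.

```python
from datetime import datetime, timezone


def old_enough_for_archive_links(
    created: datetime, now: datetime, min_age_years: int = 3
) -> bool:
    """True when the article is at least min_age_years old."""
    try:
        cutoff = created.replace(year=created.year + min_age_years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        cutoff = created.replace(year=created.year + min_age_years, day=28)
    return now >= cutoff


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
older_article = datetime(2020, 5, 1, tzinfo=timezone.utc)
recent_article = datetime(2022, 5, 1, tzinfo=timezone.utc)

show_for_older = old_enough_for_archive_links(older_article, now)
show_for_recent = old_enough_for_archive_links(recent_article, now)
```

With the five-year threshold Steven mentions for earlier years, the 2020 article in this example would not yet qualify.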


Anchoring to the Past

Wayback Filter determines which snapshot to display by using the node's creation moment (when the article first went live) as the anchor point.

"Few articles get more links as they age," Steven said. "So it seemed straightforward to have the Archive.org links match the creation date of the article."
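Wayback Machine addresses embed a 14-digit timestamp (YYYYMMDDhhmmss) ahead of the original URL, and Drupal stores a node's creation time as a Unix timestamp. A minimal sketch of turning one into the other follows; the function name is hypothetical, and the choice of UTC is an assumption of this example rather than something the article confirms about the module.

```python
from datetime import datetime, timezone


def wayback_url(url: str, created_unix: int) -> str:
    """Build a Wayback Machine link anchored to a creation timestamp."""
    created = datetime.fromtimestamp(created_unix, tz=timezone.utc)
    stamp = created.strftime("%Y%m%d%H%M%S")  # 14-digit Wayback timestamp
    return f"https://web.archive.org/web/{stamp}/{url}"


# 1388534400 is 2014-01-01 00:00:00 UTC.
link = wayback_url("https://example.com/story", 1388534400)
```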

Using "last updated" dates proved unreliable; any site-wide edit could distort the result. Earlier research tried to improve on this logic but faltered. A Ph.D. project named Fable attempted a dynamic, AI-assisted solution and achieved only a 23 percent success rate before funding ran out.

Wayback Filter, by contrast, achieves near-total coverage. 

"Unless a truckload of money arrives," Steven noted dryly, "I expect everybody to be content with approximately 95 percent of all broken links being unbroken by well-made links to Archive.org."


The Subtleties of Order

One discovery came by accident: the position of the filter in Drupal's processing order matters. If Wayback Filter runs before a URL-shortening module, it may archive the short link rather than the destination.

"Having the Wayback Filter show up last made for the right URL being used," Steven explained. "Some Drupal text filters shorten URLs. When employing text filters, the sequence is sometimes important as https://tinyurl.com/2ab is even less archived than whatever was at the other end of that link."

Placed last, the filter preserves complete, authentic URLs.


Expanding the Scope, Cautiously

While the module's main function focuses on inline links, it also includes experimental support for link fields. Only one user has ever requested it. Steven built a quick prototype to oblige him.

"On a global scale one user asked for Wayback Filter to also support fields," he said. "I made a quick implementation. He was happy."

If the feature ever becomes popular, he might refine the interface, but his development philosophy remains minimalist:

"When it comes to Wayback Filter, I'm not in the 'build it and they'll come' camp. I'll happily address actual needs, though."

That same simplicity drives his design choices elsewhere. Default settings cover 99 percent of cases: the module never touches internal links, never adds archives to Archive.org pages, and keeps configuration light.

"We don't want to scare the newbies with too many settings," he said.
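The two default exclusions described above, internal links and links that already point at Archive.org, can be expressed as a small predicate. This is a sketch under assumptions: the function name and the host comparison are illustrative and not taken from the module's source.

```python
from urllib.parse import urlparse


def should_get_archive_link(href: str, site_host: str) -> bool:
    """Decide whether a link qualifies for an added archive link."""
    parsed = urlparse(href)
    if parsed.scheme not in ("http", "https"):
        return False  # relative paths, mailto:, and similar are skipped
    host = (parsed.hostname or "").lower()
    if host == site_host.lower():
        return False  # internal link on the same site
    if host == "archive.org" or host.endswith(".archive.org"):
        return False  # already an Archive.org page
    return True


site = "mysite.example"
external_ok = should_get_archive_link("https://example.com/story", site)
internal_skipped = should_get_archive_link("https://mysite.example/about", site)
```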


SEO Logic and Performance

Search-engine integrity, another concern, turns out to be unaffected. Steven calls Archive.org a "high-status source."

"Linking to it should not affect the site negatively. Not even linking excessively to it," he said. "So I've decided against bunging a 'nofollow' on the Wayback links."

He differentiates between internal and external breakage: 

"There's a penalty for having lots of broken internal links. But, as far as I can tell, no penalty for broken external links. Seems fair. If you link to a 404 page at nyt.com, you're still linking to nyt.com. They just made an error."

From a performance standpoint, render-time processing and Drupal caching make Wayback Filter effectively invisible in load tests.

"All my performance considerations were resolved within a minute," he said.


Archive.org and Its Limits

Wayback Filter doesn't ping the Archive.org API in real time. Instead, the Internet Archive's internal logic serves the nearest available snapshot if the requested timestamp doesn't exist.

"That's the beauty of Archive.org," Steven said. "If the Wayback Filter links to a specific date and time (and it does), Archive.org will serve up the closest match."
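The "closest match" behaviour can be pictured as a nearest-timestamp lookup. The sketch below only illustrates the idea; it is not Archive.org's implementation, and the snapshot list is invented for the example.

```python
from datetime import datetime
from typing import Sequence

STAMP_FORMAT = "%Y%m%d%H%M%S"


def closest_snapshot(requested: str, available: Sequence[str]) -> str:
    """Pick the available snapshot timestamp nearest to the requested one."""
    target = datetime.strptime(requested, STAMP_FORMAT)
    return min(
        available,
        key=lambda stamp: abs(datetime.strptime(stamp, STAMP_FORMAT) - target),
    )


# Made-up snapshot timestamps for a single URL.
snapshots = ["20130615120000", "20140310080000", "20190101000000"]
served = closest_snapshot("20140101000000", snapshots)
```

Here the March 2014 capture is nearer to the requested 1 January 2014 than the June 2013 one, so it is the one selected.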

An earlier companion module, Wayback Submit to Archive.org, once tried to automate the archiving process itself, but no one used it. 

"We killed off that module, as it had absolutely no users," he admitted. 

Today, Archive-It by the Internet Archive handles that function more reliably.

Steven has considered reviving a lighter version that would prompt the Archive to capture every outbound link upon publication, but he values privacy and simplicity too much to complicate the design.


Reliability, Law, and Realism

In daily use, Archive.org handles most requests gracefully. 

"On average, a fine, archived page renders within 10 to 20 seconds," Steven said. "Not blindingly fast. But it does work very well 95 percent of the time."

As for local caching, he avoids it entirely. Hosting copies would invite copyright disputes. 

"Owners of Drupal sites do not have the kind of money for lawyers needed to create and defend local copies," he said. "Let Archive.org fight the lawyers and takedown notices. They have 25 years of experience and a bit more money and talent assigned to the task."


A Decade Across Drupal Versions

Maintaining the module from Drupal 7 through 11 has tested both patience and persistence. The rebuild for Drupal 8 and 9 arrived in 2022 with new JavaScript preview tools.

"I found Wayback Filter so useful, I wanted it to survive the many version upgrades and at some point find an appreciative audience," Steven said.

Learning the new frameworks was difficult. 

"As a lazy expert on Drupal 4-7 I hated having to learn new tricks to master Drupal 8+," he said. "I disliked having to write $user = \Drupal::entityTypeManager()->getStorage('user')->load(42); when $user = user_load(42); was so much easier to remember and write. But with a better IDE and AI, I welcome most of the changes."

That mix of irritation and admiration colors his view of the Drupal community. 

"We're developing faster and maybe better these days," he said. "But there are days where we hit snags and curse Drupal - and our fellow contributors - to hell and back. Solving a problem with AI will be a courteous conversation with a very smart partner, usually leading to a good solution. Solving a problem with fellow Drupal contributors: hit and miss."

Yet he remains loyal to open source, even as he worries that some developers might retreat into AI-driven isolation. 

"I do fear that more and more Drupal developers will retreat from the messy open-source conversations and just develop their own quick solutions to common problems using AI," he said.


Vertikal.dk: Small by Design

Vertikal.dk has always remained compact, just three people, and intends to stay that way.

"Staying small and competent has made for workdays comparatively less stressful and enjoyable," Steven said. "It's so great to know that all your partners are very competent and friendly."

But the business landscape around them has changed. Expensive Drupal upgrades (7 to 8 to 9 to 10 to 11) have startled clients. And the kind of people commissioning websites now often think differently.

"Ten or twenty years ago, the person in charge of a big media or company website was a curious, nerdy person interested in SEO, metatags, workflows, structure, integrations," Steven said. "These days, it's often a less curious person with some experience in page-building landing pages on WordPress and attracting users to it by buying ads on Facebook."

When such clients doubt Drupal's value, he cites his colleague Dominic's line: 'Try doing this with a CMS that doesn't have Drupal's granular control.' The response, he said, is usually indifference. Still, he believes Drupal's complexity is its strength: 

"Suddenly, Drupal wasn't old - it was essential."

Meanwhile, AI-driven search summaries have siphoned traffic from clients' sites. 

"AI summaries on search engines are starving our clients of users, income and ambitions," he said. "Nevertheless, we soldier on … talking about the greatness of structured data, content types, views, taxonomies, and integrated AI. And links that are useful forever."


Community Response and Indifference

For all its utility, Wayback Filter never achieved mainstream use. 

"When Wayback Filter debuted on Drupal.org in 2014, I thought it'd get used on thousands of sites within a year," Steven said. "Yet absolutely no one cared to install it. Not even the Internet Archive."

Despite sending thousands of inbound links their way, Archive.org never acknowledged the module. 

"The Internet Archive chose, repeatedly, not to tell the world that a module that mended broken links existed," he said.

A WordPress plugin based on the same concept, Add Smart Wayback Machine links to URLs, fared little better.

"For Archive.org, none of it is worth mentioning," he said.


A Technical Challenge, a Philosophical One Too

In his closing message to The Drop Times, Steven turned the issue back toward publishers. 

"Dear The Drop Times," he wrote. "A growing percentage of the links on your oldest articles are broken. Would you like to mend them? It's absolutely free and comes with no performance degradation nor risk?"

When The Drop Times asked how the module could handle third-party pages that were never archived, Steven's response was unequivocal. If a page disappeared before Archive.org captured it, he explained, it cannot be recovered; those links are simply lost to time.

Wayback Filter, he clarified, "does not archive anything, at any point. It only links to archived versions of pages. These archived pages exist and work well 90 per cent of the time. Way better than 0 per cent."

He added that anyone can ensure permanence by manually saving URLs through the Save Page Now form on Archive.org. 

"Voila! It'll live forever," he said. "And you can be pretty sure that Drupal Wayback Filter will be able to link to it."


Holding the Record Together

Steven has no plans to port Wayback Filter beyond Drupal. Supporting a platform he considers "well designed" remains his priority. But he stays open to integration with other preservation services if the demand arises.

For him and his colleagues at Vertikal.dk, the module represents something larger than a technical fix: it's an ethic of stewardship. In a digital culture built on speed, Wayback Filter stands for patience, accuracy, and the belief that what was once written should remain readable.

Even as much of the web continues to vanish behind broken links, this small, persistent module keeps a fraction of it intact, quietly helping Drupal, and the web itself, remember.


Disclosure: This content is produced with the assistance of AI.
