5 misapprehensions about importing old forum data into Discourse

While helping communities import legacy forums, I’ve heard a few worries about the process. It’s understandable! Making changes in a community can be disruptive, and translating data from one format to another isn’t always smooth. Still, some of those concerns rest on misunderstandings.

1. Old content hurts SEO

This concern comes from two true statements:

  1. Google limits the time it spends crawling sites and
  2. Google prefers fresh content.

Put those factors together and the question arises: “Why use your limited crawl budget on stale content?”

What this analysis misses is that Google wants to crawl your content, and Discourse has sophisticated SEO systems to help it do so. In particular, Discourse automatically generates sitemaps that include the lastmod tag, which tells Google and other search engines whether a page has changed since the last time it was indexed. It also generates a special sitemap for recently updated topics[1] that helps search engines find the freshest content.
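To make the lastmod idea concrete, here is a minimal sketch of how a crawler could use it to skip pages it has already seen. The sitemap fragment, URLs, and function names are invented for illustration; this is not Discourse's or Google's actual code, only a reading of the public sitemap format.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap fragment; the URLs and dates are invented for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://forum.example.com/t/old-topic/101</loc>
    <lastmod>2012-03-04T10:00:00Z</lastmod>
  </url>
  <url>
    <loc>https://forum.example.com/t/revived-topic/102</loc>
    <lastmod>2024-05-01T08:30:00Z</lastmod>
  </url>
</urlset>"""


def parse_lastmod(value):
    """Parse a lastmod value such as 2024-05-01T08:30:00Z into an aware datetime."""
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Date-only values carry no timezone; assume UTC so comparisons still work.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_changed_since(sitemap_xml, last_crawl):
    """Return the <loc> of every entry whose <lastmod> is newer than last_crawl."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and lastmod and parse_lastmod(lastmod) > last_crawl:
            changed.append(loc.strip())
    return changed


if __name__ == "__main__":
    last_crawl = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(urls_changed_since(SAMPLE_SITEMAP, last_crawl))
```

In this sketch, the topic last modified in 2012 is skipped and only the revived topic is returned, which is the sense in which old, unchanged topics need not eat into a crawl budget.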

But all of this is a temporary problem. Once Google sees a forum has more content, it can increase the crawl budget.[2] After a search engine indexes a page, it can start showing it in search results. That means old content helps with SEO by opening up more keywords for search engines to index.

2. Too many posts slow down the site

It makes sense that if there’s more content on a site, it takes longer for the software to find and deliver it to visitors. Since Google also takes site performance into account when ranking search results, a slower site harms SEO. That’s minor compared to the damage a slow site does to the user experience. So we don’t want to add old content if it’s going to significantly harm the site’s responsiveness.

Thankfully, Discourse doesn’t necessarily get slower when more content is added.[3] That’s because Discourse stores data in PostgreSQL, where it’s efficiently indexed for retrieval. Out of the box, Discourse also uses Redis to cache the most common queries so that it can build pages quickly.
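Here is a small sketch of why an index keeps lookups fast as a table grows. It uses SQLite from Python's standard library as a stand-in for PostgreSQL so that it runs anywhere; the table, column, and index names are hypothetical and are not Discourse's real schema.

```python
import sqlite3

# Hypothetical lookup: find every post in one topic.
LOOKUP = "SELECT id, raw FROM posts WHERE topic_id = ?"


def build_demo_db(post_count=10_000):
    """Create an in-memory table of made-up posts spread over 500 topics."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, raw TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (topic_id, raw) VALUES (?, ?)",
        ((i % 500, f"post body {i}") for i in range(post_count)),
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


def compare_plans():
    """Return the query plan before and after adding an index on topic_id."""
    conn = build_demo_db()
    without_index = query_plan(conn, LOOKUP, (42,))
    conn.execute("CREATE INDEX idx_posts_topic_id ON posts (topic_id)")
    with_index = query_plan(conn, LOOKUP, (42,))
    conn.close()
    return without_index, with_index


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

Without the index the database has to scan every row; with it, the plan switches to a search that jumps straight to the matching rows. The same principle applies in PostgreSQL, though its planner and output format differ.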

It isn’t more posts that slow down Discourse, but rather more concurrent users. In a roundabout way, importing old forum data might slow a site down if it brings back inactive users. That’s a problem most communities would be happy to have.

3. Paying to store topics nobody reads is wasteful

When importing old topics, it’s inevitable there will be some that nobody ever reads again. The Pareto principle suggests the vast majority of views will come from a tiny portion of the content. Even if that content doesn’t slow down the site, it might not be worth the cost of using extra storage. Why pay to keep posts that will never produce value?

It might make sense to cull some content that’s received few interactions and views, but not because of the storage space. Forums are mostly text, which doesn’t consume much disk space. A better reason to remove content with very few views is that those posts are likely to be the lowest quality. But that’s a good policy for newer content too.
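A quick back-of-the-envelope calculation shows how little space text takes. Every number below is an assumption chosen for illustration (one million posts, about 1,000 bytes each, and a doubling for indexes and database overhead), not a measurement from a real forum, and it ignores images and attachments.

```python
def estimate_text_storage_gb(post_count, avg_bytes_per_post, overhead_factor=2.0):
    """Rough size of forum text in decimal gigabytes.

    overhead_factor is an assumed multiplier for indexes and database
    bookkeeping; real overhead varies by setup.
    """
    total_bytes = post_count * avg_bytes_per_post * overhead_factor
    return total_bytes / 1_000_000_000


if __name__ == "__main__":
    # Hypothetical forum: one million posts of roughly 1,000 bytes each.
    print(estimate_text_storage_gb(1_000_000, 1_000))  # 2.0
```

Under those assumptions, a million posts fit in about 2 GB. Plug in your own forum's post count and average post size to get a figure for your situation.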

The reality, of course, is that you are probably paying to save that data anyway. Maybe it’s not costing money on an expensive server, but somebody is keeping track of a backup even if it’s just sitting on a developer’s laptop. Importing that data into Discourse means you don’t need to worry about it anymore.

4. New users don’t care about ancient history

Have you ever joined a new group and noticed existing members laughing about some nonsensical thing someone said? You can be sure it’s an inside joke that came from some incident long ago. Now you are in the awkward situation of either laughing along insincerely or asking for the backstory. This uncomfortable occurrence informs the instinct that group history can be a barrier to entry.

But hiding that history is exactly the wrong solution to the problem. If you don’t care about joining a group, inside jokes make no difference. It’s only because we want to be a part of the joke that they can be off-putting. While there is such a thing as toxic communities, the complaint usually means new users can’t see a path for themselves to join in. When people report the community isn’t welcoming, it’s often a sign of a community worth joining.

Online forums offer a unique solution to the inside joke/insular culture problem. Instead of being left out, you can dig into the source material and relive the history you missed. Oftentimes communities create FAQs to help new users understand the culture. Going from clueless newb to an in-the-know member becomes much easier if ancient history is easily discoverable. Holding back older posts makes the problem much worse.

5. It costs too much to migrate from other forum systems

Of course, cost is relative. A few hundred dollars might be a rounding error or the entire budget, depending on the nature of the community. If you hire someone to do it for you, migration can cost between $500 and $3600 depending on the size of your forum. In the long run, I believe these prices are a wonderful investment.

If you want to investigate more, contact me to learn how I can help with the full process of importing an old forum, including community and technical considerations. It’s never too late to preserve the history of a community.


  1. Example: https://talk.collegeconfidential.com/sitemap_recent.xml ↩︎

  2. Honestly, crawl budget is mostly a concern for sites that try gaming the system by generating tons of URLs with duplicate content. ↩︎

  3. In the rare cases when the site does slow down, it’s easy to solve the problem by increasing the computational resources running the site. ↩︎