Build Civitas architecture

jericson · January 19, 2024, 8:56pm

Here’s a diagram of the current Build Civitas (and Meta Jon) architecture:

graph LR
    beta.buildcivitas.com-->DNS;
    meta.jlericson.com-->DNS;
    DNS-->nginx;
    nginx-->LE[https://letsencrypt.org];
    subgraph vm [virtual machine]
      style vm fill:#32CD32
      nginx-->errorpages;
      subgraph web_only
        D((Discourse));
      end
      subgraph data
        C[(civitas)];
        J[(jlericson)];
      end
    mail-receiver-->D;
    end
    nginx-->D;
    D-->C ;
    D-->J;   
    C-->S3[S3 backup];
    J-->S3;

This is based on Discourse’s multisite facility.^[1]

Test environment

Thanks to putting my backups on S3 I can use virtually the same setup for a test/staging environment. The difference between staging and production is simply these changes to web_only.yml:

45c45
<   DISCOURSE_HOSTNAME: beta.buildcivitas.com
---
>   DISCOURSE_HOSTNAME: test.buildcivitas.com
83a84,90
>   ## Staging server specific settings
>   DISCOURSE_AUTOMATIC_BACKUPS_ENABLED: false
>   DISCOURSE_LOGIN_REQUIRED: true
>   DISCOURSE_DISABLE_EMAILS: 'non-staff'
>   DISCOURSE_S3_DISABLE_CLEANUP: true
>   DISCOURSE_ALLOW_RESTORE: true
> 
135c143
<              - meta.jlericson.com
---
>              - test.jlericson.com

It can also help to update the version parameter to match production rather than taking the latest tests-passed version of Discourse.

Upgrading Discourse

Having a separate data container for PostgreSQL and Redis means we can save time on upgrades that don’t require updates to the database. I did some timing on a Droplet with 2 vCPUs, 4GB of RAM, and a 50GB disk, which currently costs $24 a month:

Process	Time
rebuild data	58s
rebuild web_only	14m 26s
Total	15m 24s

As you can see, rebuilding the web_only container takes the bulk of the time, so skipping the data container only shaves a minute off the 15-minute process. The real gain comes from bootstrapping a new web_only while the site is still running. The launcher rebuild command does three things:

bootstrap a new container without starting it,
destroy the old container, and
start the new container.
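
The three steps above can also be run one at a time, so that the slow bootstrap happens while the old container is still serving the site. Here is a minimal sketch, assuming the standard `/var/discourse` install location (the `DISCOURSE_DIR` variable and the `upgrade_web_only` function name are just illustrative) and that it runs as a user allowed to run the launcher:

```shell
#!/bin/sh
# Assumption: Discourse's launcher lives in /var/discourse (the usual
# location). Override DISCOURSE_DIR if your install differs.
DISCOURSE_DIR="${DISCOURSE_DIR:-/var/discourse}"

# Runs in a subshell so the caller's working directory is untouched.
upgrade_web_only() (
  cd "$DISCOURSE_DIR" || exit 1

  # Step 1: build the new image. The old container keeps running,
  # so the site stays up during the slowest part of the upgrade.
  ./launcher bootstrap web_only || exit 1

  # Steps 2 and 3: the downtime window. Only reached if the
  # bootstrap succeeded, so a failed build leaves the site running.
  ./launcher destroy web_only || exit 1
  ./launcher start web_only
)
```

Calling `upgrade_web_only` then performs the upgrade. Stopping on a failed bootstrap is a deliberate choice: the old container is only destroyed once a replacement image is ready.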

During the time it takes to destroy and start the container, the site is down. Fortunately this can take less than half a minute. The bootstrap takes the bulk of the time:

Process	Time
bootstrap web_only	13m 35s
destroy web_only	14s
start web_only	10s
Total	13m 59s

This only works if bootstrapping can happen while the site is running. The key limitation is memory. The cheapest Droplet that can run Discourse has 2GB of RAM. That’s not enough to run the site and bootstrap a new container at the same time. 4GB works fine. Fortunately, DigitalOcean allows us to resize the Droplet quickly and easily. The total downtime is about two minutes:

Process	Time
power off Droplet	21s
resize Droplet	44s
power on Droplet	10s
destroy web_only	14s
start web_only	10s
Total	1m 39s
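
The power off, resize, and power on rows could be scripted with DigitalOcean's `doctl` command-line tool instead of the control panel. This is only a sketch under assumptions: it presumes `doctl` is installed and authenticated, the `resize_droplet` function name is made up for illustration, and the Droplet ID and size slugs in the usage note are placeholders to check against your own account.

```shell
#!/bin/sh
# Assumption: doctl is on the PATH. DOCTL can be overridden for testing.
DOCTL="${DOCTL:-doctl}"

# Usage: resize_droplet DROPLET_ID SIZE_SLUG
# Resizes CPU and RAM only. The disk is left alone (no --resize-disk),
# which keeps it possible to shrink back to the smaller size later.
resize_droplet() {
  droplet_id="$1"
  size="$2"

  # A Droplet has to be powered off before it can be resized.
  "$DOCTL" compute droplet-action power-off "$droplet_id" --wait || return 1
  "$DOCTL" compute droplet-action resize "$droplet_id" --size "$size" --wait || return 1
  "$DOCTL" compute droplet-action power-on "$droplet_id" --wait
}
```

For example, `resize_droplet 123456789 s-2vcpu-4gb` before the upgrade and `resize_droplet 123456789 s-1vcpu-2gb` afterwards, where the ID and both slugs are hypothetical values.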

At some point we’ll run on a virtual machine with 4GB RAM, but since it doubles the cost, it’s not worth it until the site has more people depending on it. As it is, nobody but me will even notice the downtime.

[1] Which is based, in turn, on the Ruby on Rails multisite feature.

jericson · February 1, 2024, 10:23pm

I just did an update to Discourse v3.3.0.beta10-dev and UptimeRobot didn’t notice.

I haven’t moved back to the 2GB VM yet, but that should only take a minute or so.