Maintenance Tasks
Describing an environment
While performing maintenance on an environment, it’s sometimes helpful to know exactly which servers are in that environment and what their load balancer status is, if any. To list all servers in a given environment and print some basic metadata about them, use the describe Fabric command:
fab describe:myproject,<environment>
Adding new sysadmin users
If you don’t have access to the servers yet, add your SSH public key to the deployment/users/ directory. To avoid having to pass a -u argument to Fabric on every deploy, name the file exactly after your local username. Then ask someone who already has access to run this command:
fab staging update_sysadmin_users
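Because Fabric falls back to your local username when no -u argument is given, the key file name has to match that username exactly. The following is a minimal sketch for checking this before asking for access; it assumes the file has no extension and contains a single OpenSSH-format public key line. The helper check_sysadmin_key is hypothetical and not part of the project.

```python
import getpass
from pathlib import Path

# Prefixes that OpenSSH public key lines commonly start with (not exhaustive).
KEY_PREFIXES = ("ssh-rsa", "ssh-ed25519", "ssh-dss", "ecdsa-sha2-")


def check_sysadmin_key(users_dir="deployment/users", username=None):
    """Return the path of your key file, raising if it is missing or malformed.

    Hypothetical helper: assumes the file is named exactly after your local
    username (no extension) and holds one OpenSSH public key line.
    """
    username = username or getpass.getuser()
    path = Path(users_dir) / username
    if not path.is_file():
        raise FileNotFoundError(f"expected your public key at {path}")
    content = path.read_text().strip()
    if not content.startswith(KEY_PREFIXES):
        raise ValueError(f"{path} does not look like an OpenSSH public key")
    return path
```

Calling check_sysadmin_key() from the repository root either returns the path or tells you what is wrong.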
Updating New Relic keys
To update the New Relic API and License keys, first obtain the new keys from the new account. The License Key is shown on the main account page, and the API key can be found by following these instructions: https://docs.newrelic.com/docs/apis/api-key
Next, make sure your local fabsecrets_<environment>.py file is up to date (shown here for production; repeat for each environment you are updating):
fab production update_local_fabsecrets
Next, update the newrelic_license_key and newrelic_api_key values inside the fabsecrets_<environment>.py file with the new values. Then, update the keys on the servers:
fab staging update_server_passwords
fab production update_server_passwords
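Editing the two values in fabsecrets_<environment>.py by hand works fine; if you would rather script that step, here is a minimal sketch. It assumes each key is a simple one-line, top-level assignment such as newrelic_api_key = '...'. The set_secret helper is hypothetical and not part of the project's fabfile.

```python
import re
from pathlib import Path


def set_secret(path, name, value):
    """Replace (or append) a one-line assignment name = 'value' in a fabsecrets file.

    Hypothetical helper; assumes secrets are simple top-level string assignments.
    """
    p = Path(path)
    text = p.read_text()
    new_line = f"{name} = {value!r}"
    # Match the whole "name = ..." line, but not longer names like "name_old = ...".
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", re.MULTILINE)
    # A function replacement keeps backslashes in the value from acting as escapes.
    text, count = pattern.subn(lambda _match: new_line, text)
    if count == 0:
        # The key was not present yet, so append it on its own line.
        if text and not text.endswith("\n"):
            text += "\n"
        text += new_line + "\n"
    p.write_text(text)
```

For example, set_secret("fabsecrets_staging.py", "newrelic_license_key", "<new license key>"), and the same for newrelic_api_key, before running update_server_passwords for that environment.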
Finally, update the configuration files containing the New Relic keys and restart the Celery and Gunicorn processes:
fab update_newrelic_keys:myproject,staging
fab update_newrelic_keys:myproject,production
Note that this shorter method of updating the configuration files involves a brief period of downtime (10-20 seconds). If no downtime is acceptable, you can achieve the same result by running the following commands for each environment, as needed, although this takes much longer (30-60 minutes):
fab production upload_newrelic_sysmon_conf
fab production upload_newrelic_conf
fab deploy_serial:myproject,production
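Because the no-downtime sequence has to be repeated for each environment, a small wrapper can keep the order straight. This is a minimal sketch, assuming fab is on your PATH and that myproject is replaced with your project name; the function names are hypothetical. By default it only prints the commands; pass dry_run=False to execute them.

```python
import subprocess


def newrelic_zero_downtime_commands(project, environment):
    """The three commands above, for one environment, in the order given."""
    return [
        ["fab", environment, "upload_newrelic_sysmon_conf"],
        ["fab", environment, "upload_newrelic_conf"],
        ["fab", f"deploy_serial:{project},{environment}"],
    ]


def run_zero_downtime_update(project, environments, dry_run=True):
    for environment in environments:
        for cmd in newrelic_zero_downtime_commands(project, environment):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops the whole run at the first failing command.
                subprocess.run(cmd, check=True)
```

For example: run_zero_downtime_update("myproject", ["staging", "production"], dry_run=False).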
Copying the database from production to staging or testing
To copy the production database to the staging server, run the following command:
fab staging reload_production_db
This will drop the current staging DB, create a new database, load it with a copy of the current production data, and then run any migrations not yet run on that database. The same command will work on the testing environment by replacing “staging” with “testing”. Internally, autoscaling is suspended and an upgrade message is displayed on the servers while this command is in progress.
Fixing an issue with broken site icons
If the button icons on the site appear as text rather than as images, there is probably an issue with the CORS configuration for the underlying S3 bucket that serves the font used to show these icons. To correct this, follow these steps:
First, navigate to the S3 bucket in the AWS Console and click the Properties tab.
Next, expand the Permissions section and then click Add CORS Configuration. The text in the popup should look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>Authorization</AllowedHeader>
    </CORSRule>
</CORSConfiguration>
Finally, click the Save button to add the configuration. This step is important: even if the configuration already appears correct, S3 does not apply it until it has been saved.
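If you prefer scripting this over clicking through the console, the same rule can be applied with boto3's put_bucket_cors call. This is a sketch, assuming boto3 is installed and AWS credentials with access to the bucket are configured; the bucket name you pass in is a placeholder, and the helper names are hypothetical. Note that put_bucket_cors replaces the bucket's entire CORS configuration rather than adding a rule to it.

```python
def icon_font_cors_configuration():
    """The CORS rule from the XML above, in the dict form boto3 expects."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": ["*"],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["Authorization"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def apply_icon_font_cors(bucket_name):
    """Apply the rule to bucket_name and return the rules S3 reports back."""
    import boto3  # deferred so the configuration can be built without boto3 installed

    s3 = boto3.client("s3")
    # put_bucket_cors replaces the bucket's whole CORS configuration.
    s3.put_bucket_cors(Bucket=bucket_name, CORSConfiguration=icon_font_cors_configuration())
    return s3.get_bucket_cors(Bucket=bucket_name)["CORSRules"]
```

Reading the rules back with get_bucket_cors confirms that S3 actually stored them, which is the same thing the console Save step is meant to guarantee.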
Stopping EC2 machines while not in use
Some types of instances, including the db-master, db-slave, and worker servers, can be stopped via the AWS console, restarted later, and then reconfigured by running the following commands, in this order:
fab <environment> mount_encrypted:roles=db-master
fab <environment> mount_encrypted:roles=db-slave
fab <environment> mount_encrypted:roles=worker
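Since these three commands must run in the order shown, a short wrapper can run them in sequence and stop at the first failure. This is a minimal sketch, assuming fab is on your PATH; remount_after_restart is a hypothetical helper, and its run parameter exists only so the sequence can be tried without touching any servers.

```python
import subprocess

# Roles in the order the commands above must be run.
RESTART_ROLE_ORDER = ["db-master", "db-slave", "worker"]


def remount_commands(environment):
    return [["fab", environment, f"mount_encrypted:roles={role}"] for role in RESTART_ROLE_ORDER]


def remount_after_restart(environment, run=subprocess.run):
    for cmd in remount_commands(environment):
        # check=True raises on failure, so later roles are not touched.
        run(cmd, check=True)
```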
The cache server, however, must be completely terminated and recreated because of the way RabbitMQ stores its data and configuration files: RabbitMQ does not support a change of the host’s IP address. For more information, see: http://serverfault.com/questions/337982/how-do-i-restart-rabbitmq-after-switching-machines
Web servers are managed via Amazon Auto Scaling. To terminate all web servers, navigate to the web servers’ Auto Scaling Group in the AWS console and set its Minimum, Desired, and Maximum number of instances to zero. If you skip this step, the Auto Scaling Group may keep trying to bring up new web servers, which will fail because no database servers exist.
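The same change can be made with boto3's Auto Scaling client. This is a sketch, assuming boto3 is installed and credentials are configured; the group name is a placeholder you would look up in the AWS console, and the helper names are hypothetical. Recording the current sizes first makes it easy to restore them when the database servers are back.

```python
def scale_to_zero_params(group_name):
    """Parameters that set Minimum, Desired, and Maximum capacity to zero."""
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": 0,
        "MaxSize": 0,
        "DesiredCapacity": 0,
    }


def current_sizes(group_name):
    """Return the group's current sizes so they can be restored later."""
    import boto3  # deferred so the helpers above work without boto3 installed

    client = boto3.client("autoscaling")
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    group = response["AutoScalingGroups"][0]
    return {key: group[key] for key in ("MinSize", "MaxSize", "DesiredCapacity")}


def scale_web_servers_to_zero(group_name):
    import boto3

    client = boto3.client("autoscaling")
    client.update_auto_scaling_group(**scale_to_zero_params(group_name))
```

To bring the web servers back, call update_auto_scaling_group with AutoScalingGroupName and the sizes saved from current_sizes.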
Resizing servers or recreating an environment
An entire environment can be recreated, optionally with different server sizes, with a single command. This command takes a long time to run (30-60 minutes, or even several hours, depending on the size of the database). Because copying the database from server to server accounts for a significant portion of this time, it is worth cleaning out the database (see above) before downsizing the servers. That said, the environment is not down or inaccessible for this entire time; the script performs its steps in an order that minimizes downtime. For a typical set of smaller servers and an empty database, the downtime is usually less than 2 minutes.
If you’d like to resize an environment, first edit the instance_types dictionary in fabulaws-config.yml to set the sizes you’d like for the servers.
Here are the minimum sizes for each server type:
- cache: m1.small
- db-master: m1.small
- db-slave: m1.small
- web: m1.small
- worker: m1.medium
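Before recreating the servers, it can help to check the edited sizes against these minimums. This is a minimal sketch, assuming you have already loaded the instance_types mapping for the environment into a dict of role name to instance type (for example with PyYAML); the exact nesting in fabulaws-config.yml is not shown here, so the loading step is left out. It only ranks the m1 family, and undersized_roles is a hypothetical helper.

```python
# m1 instance sizes from smallest to largest.
M1_SIZES = ["m1.small", "m1.medium", "m1.large", "m1.xlarge"]

# Minimum sizes from the list above.
MINIMUM_SIZES = {
    "cache": "m1.small",
    "db-master": "m1.small",
    "db-slave": "m1.small",
    "web": "m1.small",
    "worker": "m1.medium",
}


def undersized_roles(instance_types):
    """Return {role: chosen_type} for roles set below their minimum m1 size.

    Types outside the m1 family cannot be ranked here and are skipped,
    as are roles missing from instance_types.
    """
    problems = {}
    for role, minimum in MINIMUM_SIZES.items():
        chosen = instance_types.get(role)
        if chosen in M1_SIZES and M1_SIZES.index(chosen) < M1_SIZES.index(minimum):
            problems[role] = chosen
    return problems
```

An empty result means no m1 size is below its minimum; any other instance family still needs a manual check.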
Once the sizes have (optionally) been adjusted, you can recreate the environment like so:
fab recreate_servers:myproject,production
Updating Dependencies
To circumvent the inevitable issues with PyPI during deployment, sdists for all dependencies needed in the staging and production environments must be added to the requirements/sdists/ directory. This means that whenever you change requirements/apps.txt, you should make a corresponding change to the requirements/sdists/ directory.
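Keeping the two in sync is easy to get wrong by hand, so a quick comparison can help. This is a minimal sketch, assuming apps.txt pins packages with name==version lines and the sdists use the usual name-version.tar.gz style of file name; compare_sdists and the other helper names are hypothetical. Entries under "stale" have no matching line in apps.txt. They may be outdated versions to remove, but they may also be dependencies pulled in indirectly, so review them before deleting anything.

```python
import re
from pathlib import Path

SDIST_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def normalize(name):
    """PEP 503-style normalization, so Django_Foo and django-foo compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_requirements(requirements_file):
    """Return {(name, version)} for each name==version line; other lines are ignored."""
    pins = set()
    for raw in Path(requirements_file).read_text().splitlines():
        # Drop comments and environment markers.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0]  # drop extras such as package[extra]
        pins.add((normalize(name.strip()), version.strip()))
    return pins


def sdists_present(sdists_dir):
    """Return {(name, version)} parsed from file names like name-1.0.tar.gz.

    Assumes the version itself contains no hyphen.
    """
    found = set()
    for path in Path(sdists_dir).iterdir():
        for suffix in SDIST_SUFFIXES:
            if path.name.endswith(suffix):
                name, _, version = path.name[: -len(suffix)].rpartition("-")
                if name:
                    found.add((normalize(name), version))
    return found


def compare_sdists(requirements_file, sdists_dir):
    pins = pinned_requirements(requirements_file)
    present = sdists_present(sdists_dir)
    return {"missing": sorted(pins - present), "stale": sorted(present - pins)}
```

For example, compare_sdists("requirements/apps.txt", "requirements/sdists/") run from the repository root.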
Adding or updating a single package
To download a single sdist for a new or updated package, run the following command, where package-name==0.0.0 is a copy of the line that you added to requirements/apps.txt:
pip install package-name==0.0.0 -d requirements/sdists/
After downloading the new package, remove the outdated version from version control, and add the new one along with the change to apps.txt.
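Newer pip releases have removed the pip install -d form shown above; pip download is its replacement. Here is a minimal sketch using pip download, assuming it runs from the repository root with the same interpreter your project uses. The --no-binary :all: option asks pip for source distributions rather than wheels, which matches the sdist-only directory, but it may need adjusting for some packages. Like the original command, it also downloads the package's dependencies. The helper names are hypothetical.

```python
import subprocess
import sys


def single_sdist_command(requirement, dest="requirements/sdists/"):
    """Build a pip command that downloads the sdist for one pinned requirement."""
    if "==" not in requirement:
        raise ValueError("pin the exact version, e.g. package-name==0.0.0")
    return [sys.executable, "-m", "pip", "download", "--no-binary", ":all:", "-d", dest, requirement]


def download_single_sdist(requirement, dest="requirements/sdists/"):
    # check=True raises if pip fails, so a partial download is not mistaken for success.
    subprocess.run(single_sdist_command(requirement, dest), check=True)
```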
Repopulating the entire sdists/ directory
You can also repopulate the entire sdists directory as follows:
cd requirements/
mkdir sdists_new/
pip install -r apps.txt -d sdists_new/
rm -rf sdists/
mv sdists_new/ sdists/
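If these commands are pasted or scripted without error checking, rm -rf sdists/ runs even when the pip step fails partway through. The following is a minimal sketch of the same steps that only replaces sdists/ after the download succeeds. It calls pip download (the replacement for pip install -d in newer pip releases), and its download parameter exists only so the function can be tried without network access. The helper names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def pip_download(requirements_file, dest):
    """Download sdists for every requirement in requirements_file into dest."""
    subprocess.run(
        [sys.executable, "-m", "pip", "download", "--no-binary", ":all:",
         "-r", str(requirements_file), "-d", str(dest)],
        check=True,
    )


def repopulate_sdists(req_dir="requirements", download=pip_download):
    req_dir = Path(req_dir)
    new_dir = req_dir / "sdists_new"
    sdists = req_dir / "sdists"
    if new_dir.exists():
        shutil.rmtree(new_dir)  # leftover from an earlier failed run
    new_dir.mkdir()
    # Raises on failure, leaving the existing sdists/ untouched.
    download(req_dir / "apps.txt", new_dir)
    if sdists.exists():
        shutil.rmtree(sdists)
    new_dir.rename(sdists)
    return sdists
```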
Upgrading system packages
Because the web servers are launched by Amazon Auto Scaling, making sure they have the latest versions of Ubuntu packages first requires updating the web server image. This can be done by running a new deployment, like so:
fab deploy_serial:myproject,<environment>
Upgrading Ubuntu packages on the persistent (non-web) servers can be done with the upgrade_packages Fabric command. Before upgrading, it’s best to take the site offline and put it in upgrade mode to avoid any unexpected error pages while services are restarted:
fab <environment> begin_upgrade
Once the site is in upgrade mode, you can update packages on the servers as follows:
fab <environment> upgrade_packages
This command will connect to the servers one by one, run apt-get update, install any new packages needed by the web servers, and then run apt-get upgrade. You will be prompted to accept any upgrades that need to take place, so you will have the opportunity to cancel the upgrade if needed for any reason.
After verifying that the packages have installed successfully, you can bring the site back online like so:
fab <environment> end_upgrade
Note that upgrading may take some time, depending on the number of servers and size of the upgrades, so it’s best to schedule this during an off-hours maintenance window.
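The three steps can be wrapped so that the site leaves upgrade mode only after you have confirmed the packages installed correctly. This is a minimal sketch, assuming fab is on your PATH; run_package_upgrade is a hypothetical helper, and its run and confirm parameters exist only so the flow can be tried without touching servers.

```python
import subprocess


def upgrade_commands(environment):
    """The three Fabric commands above, for one environment."""
    return {
        "begin": ["fab", environment, "begin_upgrade"],
        "upgrade": ["fab", environment, "upgrade_packages"],
        "end": ["fab", environment, "end_upgrade"],
    }


def run_package_upgrade(environment, run=subprocess.run, confirm=input):
    """Enter upgrade mode, upgrade packages, and leave upgrade mode only once you confirm.

    If a command fails, check=True raises and the site is left in upgrade mode
    so you can investigate before running end_upgrade yourself.
    """
    commands = upgrade_commands(environment)
    run(commands["begin"], check=True)
    # Output is not captured, so the apt-get prompts appear in your terminal as usual.
    run(commands["upgrade"], check=True)
    answer = confirm("Packages verified? Take the site out of upgrade mode? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    run(commands["end"], check=True)
    return True
```

Answering anything other than "y" leaves the site in upgrade mode, matching the advice above to verify the packages before running end_upgrade.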