Dev Solutions: Highly Scalable Image Hosting

While working with a client, we ran into a situation where we needed to host thousands of high-quality images. The images ranged from thumbnails to high-definition originals, requiring around 200GB of storage in total. The storage also had to be expandable, since the number of images would grow over time.

Due to the nature of the project, the images had to be quickly accessible from anywhere in the world, in a variety of dimensions and formats, and ideally as cheaply as possible.

Background

  • Storage: thousands of high-quality images totaling roughly 200GB and growing
  • Image sizes: everything from thumbnails to full size
  • Image formats: jpg, png, or webp, depending on the image and browser support
  • Speed: images are rotated across various platforms, including the website, so they need to load quickly; this rules out cheaper, slower-to-access storage options
  • Served globally: the team is distributed, and resources are added and accessed worldwide. The images are used for more than just the website: exhibitions, planning, etc.
  • Existing process: the team already had a streamlined process for adding and editing the image database via Airtable, and it was important that the solution support that process.

Solution

After evaluating a number of image hosting services and storage solutions, we landed on a combination of AWS S3 and Imgix. S3 provides reliable storage, and Imgix sits on top of that storage as a CDN with additional functionality: it lets us specify attributes of the displayed image (such as size and format) just by passing parameters in the image URL.

For example, we can start with an original image and modify its height, width, and format through the URL alone, without having to generate an image with that format or those dimensions beforehand.
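As a rough illustration, the helper below builds such a URL in Python. It assumes Imgix's standard rendering parameters `w` (width), `h` (height), and `fm` (output format); the source domain `example.imgix.net` and the file path are hypothetical placeholders, not the client's real setup.

```python
from urllib.parse import urlencode, quote


def imgix_url(domain: str, path: str, **params) -> str:
    """Build an Imgix rendering URL from a source domain, an image path,
    and rendering parameters such as w (width), h (height), fm (format)."""
    # Percent-encode the path while keeping "/" separators intact.
    encoded_path = quote(path if path.startswith("/") else "/" + path)
    query = urlencode(params)
    return f"https://{domain}{encoded_path}" + (f"?{query}" if query else "")


# Hypothetical source domain and file name, for illustration only.
print(imgix_url("example.imgix.net", "gallery/painting-01.jpg", w=400, h=300, fm="webp"))
# https://example.imgix.net/gallery/painting-01.jpg?w=400&h=300&fm=webp
```

Each distinct combination of parameters is a different URL, so a thumbnail and a full-size webp of the same master image are simply two URLs pointing at one stored file.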

Implementation

To get the solution in place:

  • An S3 bucket was created
  • Imgix was connected to the S3 bucket as its image source
  • Users uploaded images to the S3 bucket using an upload tool, Mountain Duck (https://mountainduck.io/)
  • Image URLs in the Airtable database were updated to point to Imgix instead of Dropbox
  • The website was updated to use the Imgix SDK to generate signed URLs wherever images are displayed
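The client site generated signed URLs with the Imgix PHP library. The Python sketch below only illustrates the general signing scheme as we understand it from Imgix's documentation: an MD5 hash of the source's secure token, the URL path, and the query string (including the leading `?`), appended as an `s` parameter. The domain, path, and token are hypothetical placeholders; in production, an official Imgix library is the safer choice over hand-rolled signing.

```python
import hashlib
from urllib.parse import urlencode, quote


def signed_imgix_url(domain: str, path: str, token: str, **params) -> str:
    """Sketch of Imgix-style URL signing (assumed scheme):
    md5(token + path + query) appended as the `s` parameter."""
    encoded_path = quote(path if path.startswith("/") else "/" + path)
    query = ("?" + urlencode(params)) if params else ""
    # The signature covers the secure token, the encoded path, and the query.
    signature = hashlib.md5((token + encoded_path + query).encode("utf-8")).hexdigest()
    separator = "&" if query else "?"
    return f"https://{domain}{encoded_path}{query}{separator}s={signature}"


# Hypothetical token; the real one lives in the Imgix source settings.
print(signed_imgix_url("example.imgix.net", "/gallery/painting-01.jpg", "not-a-real-token", w=400))
```

Because the signature covers the parameters, a visitor cannot alter `w`, `h`, or `fm` in a signed URL without invalidating it, which is the main reason to sign URLs at all.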

Cost

  • S3: the bulk of the cost here is ongoing storage. Fortunately, S3 is a relatively cheap storage option for this amount of data, but data access has to be taken into account as well. With the numbers below, we’re still at around $5/month for S3.
    S3 Standard storage and requests for ~$5/month: 200GB storage, 10,000 PUT/COPY/POST/LIST requests, 100,000 GET/SELECT requests, 200GB data returned, 200GB data scanned
  • Imgix: if we estimate around 4,000 images and average our access metrics, an aggressive (high-usage) estimate comes to around $30/month. This covers the number of master images accessed and the CDN bandwidth.
  • Total cost: approximately $35/month, which should scale well, grow slowly, and stay predictable as the project and its image database grow.
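The arithmetic behind the S3 figure can be sketched as follows. The unit prices are assumptions based on published S3 Standard pricing for a common US region at the time of writing; they vary by region and change over time, so check the AWS pricing page or calculator. The sketch covers only storage and PUT/GET requests, not data transfer or S3 Select charges.

```python
# ASSUMED S3 Standard unit prices in USD; verify against current AWS pricing.
S3_STORAGE_PER_GB_MONTH = 0.023
S3_PUT_PER_1000 = 0.005   # PUT/COPY/POST/LIST requests
S3_GET_PER_1000 = 0.0004  # GET/SELECT requests


def s3_monthly_cost(storage_gb: float, put_requests: int, get_requests: int) -> float:
    """Storage plus request charges only (no data transfer, no S3 Select)."""
    return (storage_gb * S3_STORAGE_PER_GB_MONTH
            + put_requests / 1000 * S3_PUT_PER_1000
            + get_requests / 1000 * S3_GET_PER_1000)


s3 = s3_monthly_cost(200, 10_000, 100_000)
imgix = 30.0  # the Imgix estimate from the list above
print(f"S3 ~${s3:.2f}/month, total ~${s3 + imgix:.2f}/month")
# S3 ~$4.69/month, total ~$34.69/month
```

Storage dominates: at these assumed rates, 200GB accounts for $4.60 of the roughly $4.69, so the S3 bill grows almost linearly with the size of the image library.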

Caveats

  • Implementation: This took some technical work to design the solution and implement it on the client’s website. The website was custom-built in PHP without a CMS, but installing the Imgix library for signed URL generation was quick. If you’re not using signed URLs, you probably don’t need the library and can reference the Imgix URLs directly.

If you’re on WordPress or another CMS, it’s worth researching what options are available. Check out the Imgix article on WordPress implementation.

  • Preloading images: After updating the website to use Imgix URLs, we noticed a delay (up to 5 seconds) the first time each image loaded, though subsequent loads were fast. Our solution was a script that requests every Imgix URL once, effectively pre-generating and caching the images. A simpler (but slower) alternative would have been to click through every image on the website to trigger Imgix’s generation and caching.
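A cache-warming script along those lines might look like the sketch below. It is not the script we ran; it simply requests each URL once using the standard library, with a small thread pool. Where the URL list comes from (for example, an Airtable export) is left open, and presumably every size/format variant the site actually uses needs its own request, since each distinct URL is rendered separately. The `fetch` parameter exists so the network call can be swapped out.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 30.0) -> int:
    # Read the whole response so Imgix fully renders (and caches) the image.
    with urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def warm_cache(urls, fetch=fetch_status, workers: int = 8) -> dict:
    """Request each URL once; return {url: HTTP status or error message}."""
    def attempt(url):
        try:
            return url, fetch(url)
        except Exception as exc:  # keep going if a single image fails
            return url, f"error: {exc}"

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, outcome in pool.map(attempt, urls):
            results[url] = outcome
    return results
```

Usage would be something like `warm_cache(urls)` with `urls` built for every image and variant, then checking the returned dictionary for anything that is not a 200 status.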

What Didn't Work

Initially, the team hosted all of the images in a paid/premium Dropbox account. This caused many mysterious issues where images were present in Dropbox but their links didn’t work, which was the original reason for seeking out a new solution.

TL;DR

Compared with Dropbox, the S3 + Imgix setup is more purpose-built, though more complex for users to interact with and understand, and it’s also cheaper.