Communikitchen

Big ideas about the web.

Building a Flickr replacement – Part 1

Screenshot of the album's front page

About three years ago I had the idea to build a replacement for Flickr. The main reason was that I wanted a place to store videos and easily share them, as I was (and still am) in the process of digitizing old VHS home videos, a project that of course will never end. And while I was at it, I might as well use that place to store my photos too, and consolidate all my sharing needs.

Over a year later, after a couple of failed attempts at using different prepackaged platforms, I started building it as my first real Drupal 8 project. Of course.

But despite all the awesomeness in Drupal 8, and above all the promise of core media, the road to getting all my requirements into place would be very long.

Over the course of the next few posts I’ll cover all my requirements and how I solved them. Eventually, I’d like to clean up my code enough to be able to make it available on drupal.org.

I wanted the system to:

  1. Allow multiple uploads
  2. Collect and save EXIF data
  3. Not make me worry about disk space
  4. Allow private albums and media
  5. Allow select guests to view any private content
  6. Allow easy importing from Flickr

Multiple uploads

Because I wanted to keep albums and media fully independent of each other, I couldn’t just add a Plupload widget to a node’s multi-value image field and be done with it. There were a few reasons for this:

  1. I wanted to take advantage of the media module (which wasn’t yet in core when I started the project, but the transition from Media Entity was seamless). This would be enough of a reason, but I had a few more.
  2. I wanted to be able to add different types of media to an album.
  3. I wanted to be free not to associate media with any album at all.

The idea of uploading media one at a time was unthinkable, especially after years of working with Flickr’s very smooth upload experience. The only thing that made sense was an upload process that creates a single media entity for each item uploaded and allows media of different types to be uploaded at the same time.

As luck would have it, just a few weeks before I started the project someone had tried to tackle exactly the same issue with the Media Upload module. I tried using that for a while, and I contributed some ideas and a few code fixes, but (long story short) it wasn’t really doing it for me, and rather than trying to hijack someone else’s plans I decided to build my own upload module from scratch. I know, not very community-oriented, but my requirement was for more than just an upload form.

The module provides a two-step form: one step for the upload itself, and one for the details that will populate the media entity.

The upload step uses a Plupload widget by default, because as much as I dislike depending on JavaScript it would be tough to make an HTML-only form work across various browsers. (Not that I’m going to use anything other than the latest Safari, but I’d like to be mindful of others if/when I publish the module.) This way I can also take advantage of a few more of Plupload’s perks, particularly the ability to break up large files.

On submission, the form saves the uploads as temporary file entities and associates them with media entities. The reason I don’t just work with temporary files and entity stubs is that I wanted to display thumbnails in the next step, and as much as I tried I couldn’t find a way to make it happen unless I saved the media entities first. I will get back to that at some point and see if I can get Drupal to produce the thumbnail before saving the entity.

To try to prevent duplicate files, I’m using the File Hash module, which does a pretty good job of figuring out which files already exist in the system, and therefore which media entities should not be recreated.
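The general idea behind hash-based duplicate detection can be sketched in a few lines. This is a language-agnostic illustration in Python, not the File Hash module's actual code: the function names, the choice of SHA-256, and the `known_digests` mapping (a stand-in for a lookup against hashes already stored in the system) are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_new_and_duplicate(paths, known_digests):
    """Separate uploads with new content from those already stored.

    known_digests maps digest -> identifier of the existing item
    (hypothetical; a real system would query its file hash storage).
    """
    new, duplicates = [], []
    seen = dict(known_digests)  # copy so the caller's mapping is untouched
    for path in paths:
        digest = file_digest(Path(path))
        if digest in seen:
            # Same bytes as something we already have: do not recreate it.
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = str(path)
            new.append(path)
    return new, duplicates
```

Because the comparison is on file contents rather than file names, a renamed copy of an existing photo is still caught, and two files uploaded in the same batch are checked against each other as well as against what is already stored.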

The details step is a megaform that displays a few fields for each media entity created:

  • Thumbnail (just a markup element)
  • Title
  • Description
  • A choice of tags from my “Subsets” vocabulary.
  • The audience selection, which is a custom field to decide whether a media entity should be public or private. This is less granular than Flickr’s privacy settings, but I think it’ll do.
  • A “Keep” checkbox that can be unchecked if the user doesn’t want to actually save the media entity, in which case the entity is simply deleted. This is where it would be better if the media entities hadn’t already been saved, but presumably very few entities will not be kept after being uploaded.

There are a few fancy things happening here:

  • The title and description fields are pre-filled using the EXIF title and caption data. If there is no title data, the file name will be used (which is kinda meh, but it works).
  • The subsets and audience settings are also pre-filled using machine tags.
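The title fallback described above could look roughly like this. It is only a sketch: the `title` and `caption` keys are hypothetical names for whatever the metadata reader returns, and dropping the file extension from the fallback title is an assumption of the example.

```python
from pathlib import Path


def prefill_details(metadata: dict, filename: str) -> dict:
    """Build default form values from embedded metadata.

    The "title" and "caption" keys are hypothetical; a real metadata
    reader may expose these values under different names.
    """
    title = (metadata.get("title") or "").strip()
    if not title:
        # No embedded title: fall back to the file name without its
        # extension (an assumption of this sketch).
        title = Path(filename).stem
    description = (metadata.get("caption") or "").strip()
    return {"title": title, "description": description}
```

For a file called `IMG_0001.jpg` with no embedded title, this yields the title `IMG_0001` and an empty description, which the user can then correct in the details form.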

Submitting the form triggers a batch operation that takes care of saving the media entities with the new information and deleting those whose “Keep” setting was unchecked.
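As a rough illustration of that save-or-delete pass, here is a minimal Python sketch. It is not Drupal's batch system: the `store` dictionary stands in for entity storage, and the row keys (`id`, `keep`, `title`, `description`) and the batch size are hypothetical names chosen for the example.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def apply_details(rows, store, batch_size=10):
    """Save kept media with their new details and delete the rest.

    rows: submitted form values, one dict per media item (hypothetical shape).
    store: dict of media id -> field values, standing in for entity storage.
    """
    saved, deleted = [], []
    for batch in chunked(rows, batch_size):
        # Each batch would be one unit of work in a real batch process.
        for row in batch:
            media_id = row["id"]
            if row.get("keep", True):
                store[media_id].update(
                    title=row["title"], description=row["description"]
                )
                saved.append(media_id)
            else:
                store.pop(media_id, None)
                deleted.append(media_id)
    return saved, deleted
```

Working in small batches keeps each unit of work short, which matters when a single submission covers dozens of large files.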

Room for improvement

Each piece of the details form is hard-coded to match the structure of the media form, and doesn’t use a custom form view mode yet. When I started building the form it was early enough in my Drupal 8 game that I hadn’t yet figured out how to pull up a custom entity form. I have since been able to do that in a different part of the system, so eventually I’ll go back and rebuild the details form in a more future-friendly way.

Collecting and saving EXIF data

Note: I updated this section in May 2020 to reflect a more recent solution.

EXIF and IPTC information is fairly unpredictable, but ultimately the few datapoints I needed (creation date, geographic information, shot metadata) could each be stored in its own appropriately typed field, attached directly to the media entity. Each original file still retains all information, so I should always be able to retrieve additional fields if I ever need to.
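To make "appropriately typed" a bit more concrete, here is a small Python sketch of turning two raw EXIF values into typed ones. The tag names (`DateTimeOriginal`, `GPSLatitude`, and so on) and the `YYYY:MM:DD HH:MM:SS` date format come from the EXIF standard; the `MediaMetadata` class, the function names, and the shape of the input dictionary (GPS values already reduced to plain numbers) are assumptions of the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MediaMetadata:
    """Typed values kept alongside the media item (hypothetical shape)."""
    created: Optional[datetime]
    latitude: Optional[float]
    longitude: Optional[float]


def parse_exif_datetime(value: str) -> datetime:
    """Parse the EXIF date format, e.g. '2019:07:04 18:30:00'."""
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere to decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western coordinates are negative in decimal notation.
    return -decimal if ref in ("S", "W") else decimal


def build_metadata(exif: dict) -> MediaMetadata:
    """Pick out the few values worth storing; everything else stays in the file."""
    created = None
    if exif.get("DateTimeOriginal"):
        created = parse_exif_datetime(exif["DateTimeOriginal"])
    latitude = longitude = None
    if exif.get("GPSLatitude") and exif.get("GPSLongitude"):
        latitude = dms_to_decimal(
            *exif["GPSLatitude"], exif.get("GPSLatitudeRef", "N")
        )
        longitude = dms_to_decimal(
            *exif["GPSLongitude"], exif.get("GPSLongitudeRef", "E")
        )
    return MediaMetadata(created, latitude, longitude)
```

Missing values simply come back as `None`, which fits the unpredictability of the source data: a scanned VHS frame will have no GPS block, and that should not be an error.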

To retrieve EXIF values from images I had a couple of options: one was the EXIF module, which is very opinionated and outputs data in a very nicely organized array; the other was File MDM, which is more complex and would have required me to massage the data it retrieves a lot more. I chose EXIF mostly because I was impatient and just wanted to get it done, but a different requirement later on forced me to also install File MDM, so eventually I’ll go back and rearchitect this part of the system to avoid having two modules that do pretty much the same thing.

When I retrieve keywords, anything that matches my patterns for machine tags (audience and subsets) gets processed accordingly, while everything else gets saved as a keyword.
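The sorting step might look something like the following Python sketch. The actual tag patterns are not spelled out here, so the example assumes a Flickr-style `namespace:predicate=value` syntax with hypothetical `audience` and `subset` namespaces; the function name and the shape of the result are likewise made up for illustration.

```python
import re

# Assumed Flickr-style syntax: namespace:predicate=value
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[a-z]+):(?P<predicate>[a-z_]+)=(?P<value>.+)$"
)


def sort_keywords(keywords):
    """Split raw keywords into audience, subsets, and plain keywords."""
    result = {"audience": None, "subsets": [], "keywords": []}
    for raw in keywords:
        keyword = raw.strip()
        if not keyword:
            continue
        match = MACHINE_TAG.match(keyword)
        namespace = match.group("namespace") if match else None
        if namespace == "audience":
            result["audience"] = match.group("value")
        elif namespace == "subset":
            result["subsets"].append(match.group("value"))
        else:
            # Unrecognized machine tags and ordinary words are both
            # kept as regular keywords.
            result["keywords"].append(keyword)
    return result
```

With this approach, a keyword list such as `beach`, `audience:level=private`, `subset:name=family` would produce a private item in the "family" subset with a single plain keyword, "beach".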

Room for improvement

The way machine tags work is largely hard-coded to work with my specific setup. It would be fun to make it more generic and build a proper API out of it.

Up next

How I’m dealing with disk-space hunger and what I’m doing to provide reasonable access control.