Building the DataPublic DataSet Feature

We recently launched a Drupal distro called DataPublic. I mentioned in the announcement blog post that it was probably the largest codebase we had worked on at Raised Eyebrow, and that's accurate. The number of custom modules, contrib modules and themes used was definitely up there with some of the largest sites we've built for clients.


Prior to this distribution our only foray into distributions involved downloading and installing them, as well as building features. At Raised Eyebrow we use the Features module to export elements of a site to code so that we can reuse them elsewhere later. This allows us to get sites up and running as quickly as possible, thereby freeing up our time to focus on more important matters, such as developing an overall site strategy, tweaking information architecture and the user experience, and adding other important features.

We've created some features that we roll out as standard on most of the sites we build:
  • Blog
  • Events
  • News/Press Releases
  • Videos
  • Photo Galleries
  • Documents
  • Homepage Slideshow
When the Open Data team at Microsoft approached us with the idea of building a Distribution that would encapsulate these features and some Open Data specific elements, we knew we had a solid foundation as well as some of the building blocks to achieve such an ambitious project. As with all things Drupal, I took a good look at the requirements and said "I THINK I can do most of this with Drupal, ya, I GUESS most of that is doable". It's not a strange response. While I've been using Drupal for over 5 years now, I don't pretend that I know everything about Drupal and its inner workings. Thanks to a large Vancouver community I've managed to touch almost every part of the framework, but there are still dark corners where I haven't trodden and may indeed never tread. However, with Drupal, you also know that where there's a requirement, there's a solution, so I am mostly confident in my response (if I were fully confident, it would be a resounding "Of COURSE I can do that with Drupal, sheesh!"), and this time was no different.

Other Requirements

Along with the above features, another requirement was the creation of a Dataset content type that would allow for the cataloging of Open Data sets. The goal was something similar to Vancouver's open data portal mixed with Nanaimo's data portal and the DataHub portal. Given that this Drupal distribution was required to run on a Microsoft Azure server, creating a content type in Drupal was preferable to installing any of the open source data cataloging tools out there, such as the OKFN's CKAN project. Given that I'd worked closely with CKAN for the project, I knew exactly what I needed to build a similar tool in Drupal.

Anatomy of a Dataset

Here's the final product. This is what we're aiming to achieve.
The requirements for the Dataset content type were relatively straightforward. Along with the usual Title, Body, Date and Taxonomy fields, there were three other special fields:

File Upload field (FileField field)

The File Upload field must allow editors to upload various open data files to the Drupal website (CSV, SHP, DWG, XLS etc.)

External link field (Link field)

The External link field must allow editors to enter a fully qualified URL to a remotely hosted dataset file.

OGDI instance field (OGDI field)

The OGDI instance field must allow editors to display the content of an OGDI instance dataset along with a map, if applicable.

The wireframes required the Datasets landing page to look like the screenshot below:

Field Collection module

Creating the File Upload and External link fields was relatively straightforward. However, each file item needed a description field and a format chooser field. To achieve this I used the Field Collection module. This module allows the creation of a field entity which can contain other field entities. Therefore I could create a file field, a description field and a format field and wrap them in another field called "Uploads". Then rinse and repeat for the External link field.
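
As a rough illustration of what this nesting looks like from code, here is a plain-PHP sketch with no Drupal dependency. It models a dataset whose "Uploads" field holds only item IDs, while the file, description and format live on separate field collection items. The field names `field_dataset_upload` and `field_dataset_upload_file_format` come from the code later in this post; `field_dataset_upload_file`, `field_dataset_upload_description`, the helper `example_collect_subfield()`, the IDs and the sample values are all made up for the example, and `'und'` stands in for Drupal's `LANGUAGE_NONE`.

```php
<?php
// Hypothetical stand-in for field collection items, keyed by item ID.
// In Drupal these would come from entity_load('field_collection_item', ...).
$field_collection_items = array(
  11 => array(
    'field_dataset_upload_file' => array('und' => array(array('filename' => 'parks.csv'))),
    'field_dataset_upload_description' => array('und' => array(array('value' => 'City parks as CSV'))),
    'field_dataset_upload_file_format' => array('und' => array(array('value' => 'CSV'))),
  ),
  12 => array(
    'field_dataset_upload_file' => array('und' => array(array('filename' => 'parks.shp'))),
    'field_dataset_upload_description' => array('und' => array(array('value' => 'City parks as a shapefile'))),
    'field_dataset_upload_file_format' => array('und' => array(array('value' => 'SHP'))),
  ),
);

// The host dataset only stores references (item IDs) in the collection field.
$dataset = array(
  'title' => 'City parks',
  'field_dataset_upload' => array('und' => array(array('value' => 11), array('value' => 12))),
);

/**
 * Collects one sub-field value from every item referenced by a collection field.
 */
function example_collect_subfield(array $host, array $items, $collection_field, $subfield, $langcode = 'und') {
  $values = array();
  if (empty($host[$collection_field][$langcode])) {
    return $values;
  }
  foreach ($host[$collection_field][$langcode] as $reference) {
    $item_id = $reference['value'];
    if (isset($items[$item_id][$subfield][$langcode][0]['value'])) {
      $values[] = $items[$item_id][$subfield][$langcode][0]['value'];
    }
  }
  return $values;
}

$upload_formats = example_collect_subfield($dataset, $field_collection_items, 'field_dataset_upload', 'field_dataset_upload_file_format');
print implode(', ', $upload_formats) . "\n";
```

The point of the sketch is the indirection: anything that wants a sub-field value has to go from the dataset to the item ID and then to the item itself.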

Field Group module

To group these together into a usable interface I used the Field Group module. This allowed me to display the fields as horizontal tabs. Each field collection field has its own tab, which contains that collection's individual fields. This works well and avoids a lot of scrolling when you have multiple fields for each collection.

Computed Field module

Having solved the data entry, the next step was to figure out the display of the file formats in the table on the Datasets landing page. See the screenshot of the wireframes above. The formats for each field needed to be displayed as columns in a view. Given that each type of field had its own format field, and that there could be an unlimited number of each, it would be impossible to do this in Views alone. The solution? The Computed Field module.
The fact that there is a finite number of formats that can be chosen when uploading a file or adding a link made this relatively easy. I created 8 "indicator" computed fields that would store a value if a format was set in any one of the Upload or External link format fields. For example, if someone creates a Dataset, uploads a CSV file and links to external CSV and DWG files, the CSV indicator field would be populated with "CSV" and the DWG indicator field with "DWG". These indicator fields can then be used as columns in a view and can also be used to sort the Datasets.
How do the indicator fields get their values? As they are computed fields, their values have to be generated by code. You can simply enter PHP into each field, but as the aim of a distribution is to export everything to code, I chose to write some custom code to populate the fields:
/**
 * These functions determine the values for the CCK computed fields.
 * @see
 */
function computed_field_field_dataset_csv_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'CSV', $langcode);
} // computed_field_field_dataset_csv_compute()

function computed_field_field_dataset_dwg_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'DWG', $langcode);
} // computed_field_field_dataset_dwg_compute()

function computed_field_field_dataset_kml_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'KML', $langcode);
} // computed_field_field_dataset_kml_compute()

function computed_field_field_dataset_kmz_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'KMZ', $langcode);
} // computed_field_field_dataset_kmz_compute()

function computed_field_field_dataset_shp_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'SHP', $langcode);
} // computed_field_field_dataset_shp_compute()

function computed_field_field_dataset_xls_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'XLS', $langcode);
} // computed_field_field_dataset_xls_compute()

function computed_field_field_dataset_xml_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'XML', $langcode);
} // computed_field_field_dataset_xml_compute()

function computed_field_field_dataset_api_compute(&$entity_field, $entity_type, $entity, $field, $instance, $langcode, $items) {
  $entity_field[0]['value'] = datapublic_datasets_format_checker($entity, array('field_dataset_upload', 'field_dataset_external'), array('field_dataset_upload_file_format', 'field_dataset_external_format'), 'API', $langcode);
} // computed_field_field_dataset_api_compute()

/**
 * Check the value of the Format select list for each field in the dataset.
 * @param object $entity The entity object
 * @param array $field_collection_ids The array of field collection ids to check for other fields in
 * @param array $field_ids_to_check The array of cck format fields to check for a set value
 * @param string $format The format we want to match the values in the above cck fields to
 * @param string $langcode The language code string to determine positioning in field arrays
 * @return string The themed format value to populate the computed field, if the actual CCK format field is set.
 */
function datapublic_datasets_format_checker(&$entity, $field_collection_ids, $field_ids_to_check, $format, $langcode) {
  // Start with an empty list so theme() always receives an array.
  $formats = array();
  // Loop through the field_collection fields...
  foreach ($field_collection_ids as $field_collection_id) {
    // Skip field collections that have no items on this entity.
    if (empty($entity->{$field_collection_id}[$langcode])) {
      continue;
    }
    // For each field_collection entity, get the value (ID) of the field_collection field and load the actual field_collection entity...
    foreach ($entity->{$field_collection_id}[$langcode] as $entity_id) {
      $new_entity = entity_load('field_collection_item', array($entity_id['value']));
      // Now check the new entity to see if it contains any of the fields (field_ids_to_check) we're looking for...
      foreach ($field_ids_to_check as $field_id_to_check) {
        if (isset($new_entity[$entity_id['value']]->{$field_id_to_check}[$langcode][0]['value'])) {
          $format_field_value = $new_entity[$entity_id['value']]->{$field_id_to_check}[$langcode][0]['value'];
          // If the format value in the field matches the format we're checking for...
          if ($format == $format_field_value) {
            $formats[] = $format;
            break 3; // We've gotten what we need, now break out of the 3 foreach loops, theme and return...
          // If the format is the API we must return whatever the user has entered...
          } elseif ($format == 'API' && !in_array($format_field_value, array('CSV', 'DWG', 'KML', 'KMZ', 'SHP', 'XLS', 'XML'))) {
            $formats[] = $format_field_value;
          }
        }
      }
    }
  }
  return theme('dataset_indicators', $formats);
} // datapublic_datasets_format_checker()
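
To see the matching rule on its own, here is a small plain-PHP sketch that runs without Drupal. It takes a flat list of the format values chosen across a dataset's Upload and External link items and returns what one indicator would hold before theming. The function name `example_indicator_value()` and the sample values are invented for illustration; the real code above reads the values from field collection entities and passes the result through `theme()`.

```php
<?php
/**
 * Returns the raw indicator values for one format column.
 *
 * @param array $chosen_formats Format values selected on the dataset's items.
 * @param string $format The indicator being computed, e.g. 'CSV' or 'API'.
 * @return array Matching values; empty when the indicator stays blank.
 */
function example_indicator_value(array $chosen_formats, $format) {
  $known = array('CSV', 'DWG', 'KML', 'KMZ', 'SHP', 'XLS', 'XML');
  $matches = array();
  foreach ($chosen_formats as $chosen) {
    if ($chosen === $format) {
      // One match is enough to flag the column, so stop looking.
      $matches[] = $format;
      break;
    }
    elseif ($format === 'API' && !in_array($chosen, $known, TRUE)) {
      // The API column shows whatever non-standard value was entered.
      $matches[] = $chosen;
    }
  }
  return $matches;
}

// One uploaded CSV, plus external CSV and DWG links (the example from earlier).
$chosen = array('CSV', 'CSV', 'DWG');
foreach (array('CSV', 'DWG', 'KML', 'API') as $format) {
  $value = implode(', ', example_indicator_value($chosen, $format));
  print $format . ': ' . ($value === '' ? '(empty)' : $value) . "\n";
}
```

With that sample dataset only the CSV and DWG indicators are populated, which is what lets a view show them as columns and sort on them.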

OGDI Field

The OGDI field is another large module and requires its own separate walkthrough. I'll get to this in the very near future. For now you can download and review the OGDI field module for Drupal 7 here.

Get it at GitHub

You can review all of this code yourself at GitHub. You can fork it, add an issue to the queue or send me an email if you have any questions or problems. I'm only too glad to help in whatever way I can. If you're up to it you can also poke around and look at all of the other custom modules and features that make up the DataPublic distro.