About this Guide

This guide is intended for people who want to implement a connection with the Repository Junction Broker service.

If you have further questions about the service that are not answered by this guide, please use the UK RepositoryNet+ Helpdesk contact form or send email directly to: support@repositorynet.ac.uk

About the RJ Broker

For an overview of the broker please refer to the RJB: User Manual.

RJ Broker Architecture

The broker is itself a repository that uses SWORD v1.3 to receive and transmit records.

Deposit Records

Suppliers deposit records into the broker using an agreed package format (see here), which the broker then unpacks.

Parse Record Metadata

The broker parses the supplied metadata to identify organisations, and therefore repositories, that this record should probably be sent to.

The broker is reliant on the Organisation and Repository Identification Service to make this practical. As part of the repository identification process, the broker also marks those repositories which have subscribed to the service.

Transfer Record

Periodically (initially daily), the broker finds all records that have subscribing repositories but have not yet been transferred to those repositories. The broker transfers each record, and notes when the successful transfer took place and the URI the target repository has given for the deposit.

Check Record Alive

Most repositories have a review and/or curation process before making records live. The broker has a separate process, run daily, that checks all recently transferred records to see whether they are visible via the given URI, and notes when they go live.

Supplier Engagement

For suppliers, engaging with the broker is straightforward:

Repository Engagement

For an Individual Repository (IR) engaging with the broker is very easy:

Supplier Requirements

Since each supplier has their own set of metadata fields, the broker uses bespoke importers for each supplier: this allows it to receive data in the format that is best suited to the supplier/broker relationship and allows it to tag imports with the provenance of the depositing user. As the importers are unique to each supplier, the method for identifying the target Individual Repositories can also be tailored.

Organisation/Repository Identification

There are several options for identifying target repositories:

  1. The broker can scan the metadata for postal addresses and/or email addresses, and use the Organisation and Repository Identification (ORI) service to create a list of identifiable organisations, and therefore potential repositories.
  2. The supplier can define a list of repositories, which the broker then just uses.
  3. The supplier can provide a list of MUST repositories, and allow the broker to augment the list with POTENTIAL repositories.

Whilst it is not possible to prescribe a set of fields that must exist, we can show fields that definitely work. From the NLM-DTD, each contributor (author) has an associated affiliation record:

<contrib contrib-type='author' corresp='yes'>
 <name>
   <surname>Picus viridis</surname>
   <given-names>Yaffle</given-names>
 </name>
 <xref ref-type="aff" rid="r1">1</xref>
 <xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
. . .
<aff id="r1">
 <addr-line>The University of Edinburgh, 160 Causewayside, Edinburgh. EH9 1PR</addr-line>
 <country>United Kingdom</country>
 <institution>UK RepositoryNet+</institution>
</aff>

This can be parsed, and we now find “University of Edinburgh” and “UK RepositoryNet+” as organisations.
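
As a rough illustration of the parsing described above, the following Python sketch resolves each author's xref to its aff record and collects the text a lookup service could be queried with. It is not the broker's importer: the article-meta wrapper element and the function name affiliations_by_author are assumptions made only so the fragment is a single well-formed document.

```python
import xml.etree.ElementTree as ET

# Sample NLM fragment. The <article-meta> wrapper is an assumption, added only
# so that the fragment parses as one well-formed document.
NLM_SAMPLE = """
<article-meta>
  <contrib contrib-type="author" corresp="yes">
    <name>
      <surname>Picus viridis</surname>
      <given-names>Yaffle</given-names>
    </name>
    <xref ref-type="aff" rid="r1">1</xref>
    <xref ref-type="corresp" rid="cor1">*</xref>
  </contrib>
  <aff id="r1">
    <addr-line>The University of Edinburgh, 160 Causewayside, Edinburgh. EH9 1PR</addr-line>
    <country>United Kingdom</country>
    <institution>UK RepositoryNet+</institution>
  </aff>
</article-meta>
"""


def affiliations_by_author(xml_text):
    """Return {author name: [affiliation text fragments]} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Index every <aff> by its id so that xref/@rid values can be resolved.
    affs = {aff.get("id"): aff for aff in root.iter("aff")}
    result = {}
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue
        surname = contrib.findtext("name/surname", default="")
        given = contrib.findtext("name/given-names", default="")
        fragments = []
        for xref in contrib.findall("xref"):
            if xref.get("ref-type") != "aff":
                continue  # skip correspondence and other cross-references
            aff = affs.get(xref.get("rid"))
            if aff is None:
                continue
            for tag in ("institution", "addr-line", "country"):
                text = aff.findtext(tag)
                if text:
                    fragments.append(text.strip())
        result[f"{given} {surname}".strip()] = fragments
    return result


author_affiliations = affiliations_by_author(NLM_SAMPLE)
```

The collected fragments are only candidate strings; matching them to organisations is the job of the ORI service and is not attempted here.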

Likewise, Europe PubMed Central just has a single institution identified. The broker can parse:

<affiliation>Arthritis Research UK Epidemiology Unit, Manchester Academic Health Science Centre, University of 
Manchester, Manchester, UK.</affiliation>

This would identify Arthritis Research UK and University of Manchester as possible organisations, which can be looked up in ORI in order to retrieve associated repositories.

Bibliographic Metadata

In terms of bibliographic metadata, the list of “required” fields is small:

Almost everything else can take a default value (the deposit is a letter/manuscript; the publication date is today; the item is not refereed; there are no documents, DOIs, or URI references to documents; there is no abstract; etc.), and the publisher details can be deduced from the journal using the Sherpa/RoMEO service.

Records without identifiable organisations/repositories

Whilst such records are not useable by the broker directly, their inclusion makes them available to third-party services based on the data in the broker.

Repository Requirements

The broker deposits a standard package to all repositories, with the intent that this format can be easily adopted by others, making it a de facto standard interchange format.

Terminology

Record - a Deposited Object
Object - the whole thing, a complete record.
Metadata - the descriptive information about the object.
Document - something end users want to read. May be a combination of multiple files (eg: a web page).
Binary Object - a thing end users want to read/view, be it a document/jpeg/spreadsheet. Also called a file.
File - a file.

Basic Overview

The basic unit is a .zip file. This file will contain at least one file, mets.xml containing the metadata, and may contain any number of additional files, with each document in its own directory.
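
The layout described above can be sketched with Python's standard zipfile module. This is only an illustration of the directory convention: the helper names build_package and list_documents, and the placeholder file contents, are assumptions and not part of the broker.

```python
import io
import zipfile


def build_package(mets_xml, documents):
    """Build an in-memory package (hypothetical helper).

    mets.xml sits at the top level; `documents` maps a directory name to a
    {filename: bytes} dict, so each document gets its own directory.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("mets.xml", mets_xml)
        for directory, files in documents.items():
            for filename, content in files.items():
                archive.writestr(f"{directory}/{filename}", content)
    return buffer.getvalue()


def list_documents(package_bytes):
    """Return {directory: [filenames]} after checking the top-level mets.xml exists."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        names = archive.namelist()
    if "mets.xml" not in names:
        raise ValueError("package has no top-level mets.xml")
    documents = {}
    for name in names:
        # Entries with a directory component belong to a document.
        if "/" in name and not name.endswith("/"):
            directory, filename = name.split("/", 1)
            documents.setdefault(directory, []).append(filename)
    return documents


# Placeholder contents; a second mets.xml inside directory 789 does not clash
# with the top-level one.
package = build_package(
    "<mets/>",
    {
        "123": {"Spectator_safety.gif": b"GIF89a"},
        "789": {"mets.xml": b"<mets/>", "pdf1.pdf": b"%PDF"},
    },
)
package_documents = list_documents(package)
```

Because every document lives in its own directory, a deposited file that happens to be called mets.xml can coexist with the package's own metadata file.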

This format was chosen to allow the deposit of an Object which itself describes a broker deposit record and which, perforce, contains a Binary Object called mets.xml; a flat structure would not allow this.

Depending on embargoes and subscriptions, the broker may attach the original deposited file, which allows Individual Repositories to mine that Object for additional data that is not given in the metadata.

Where documents exist, the last document will always be the original deposit item.

The Metadata Description

The basic metadata file, mets.xml, is a METS file, with the record metadata encoded in Eprints-DC-XML (epdcx). See SWAP and epdcx for further details.

A basic METS package has 4 significant sections, in the following order:

dmdSec - The Descriptive Metadata Section.
amdSec - Where the administrative (i.e. embargo) information is defined, using the same DCMI Abstract Model that epdcx uses.
fileSec - Lists all the files containing content which comprise the electronic versions of the digital object.
structMap - Where the structure of the files is described: which files are grouped together, and the embargo details on those files.

dmdSec

This is the main metadata section, and uses the epdcx model developed by JISC. This is heavily based on the SWAP model:

ScholarlyWork - type; title; abstract; identifier (the publisher's ID); creator; affiliated institution (possibly from authors); funder; GrantCode; isExpressedAs
Expression - type; identifier (doi and/or url-at-broker); date (published: yyyy-mm-dd); status (peer reviewed, etc); copyright_holder; citation; references; isManifestedAs
Manifestation - publication; publisher; issn; isbn; volume; issue; pagerange (first-last); AccessRights (open/restricted/closed); License; isAvailableAs (doi/official_url/related_urls)
Agent - type; name or family-name & given-name; mailbox; (additional oarj namespace) org_name; ori_id

Section Details

For the ScholarlyWork, the broker currently exports:

type: ScholarlyWork
<epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/type" 
 epdcx:valueURI="http://purl.org/eprint/entityType/ScholarlyWork"/>
identifier: the identifier number as given by the supplier
<epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/identifier">
  <epdcx:valueString>2011-12-06508</epdcx:valueString>
</epdcx:statement>
title: title of the deposit record
<epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/title">
<epdcx:valueString>Teddy Bear Programming: Fact or Fiction?</epdcx:valueString>
</epdcx:statement>
creators: The names of the authors
This record is slightly complex, as there is a reference to a fuller (Agent) record:
<epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/creator" 
 epdcx:valueRef="IanStuart">
<epdcx:valueString>Stuart, Ian</epdcx:valueString>
</epdcx:statement>
abstract: the abstract
<epdcx:statement
       epdcx:propertyURI="http://purl.org/dc/terms/abstract">
   <epdcx:valueString>Some text here....</epdcx:valueString>
</epdcx:statement>
affiliated institution: any affiliated institutions (as identified via creator's affiliation).
<epdcx:statement epdcx:propertyURI="http://purl.org/eprint/terms/affiliatedInstitution">
<epdcx:valueString>EDINA</epdcx:valueString>
</epdcx:statement>
Grant and Funder information: These two fields repeat as needed.
<epdcx:statement epdcx:propertyURI="http://www.loc.gov/loc.terms/relators/FND" 
 epdcx:valueRef="funder Arthritis Research UK"/>

<epdcx:statement
   epdcx:propertyURI="http://purl.org/eprint/terms/grantNumber"
   epdcx:valueRef="grant P30-AR-473639"/>
As with creators, the valueRef attributes refer to descriptions later in the document; however, if you remove the leading “grant ” or “funder ” (note the trailing space character) from the string, what remains is the proper value for that item. The epdcx structure does not relate the funders to the grants; those relationships are defined in the later descriptions.
isExpressedAs: Link to the Expression description.
<epdcx:statement 
 epdcx:propertyURI="http://purl.org/eprint/terms/isExpressedAs" 
epdcx:valueURI="sword-mets-expr-1"/>
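
The statements above can be read with a few lines of Python. The sketch below is an illustration rather than a reference parser: it assumes the epdcx namespace URI shown in this guide's amdSec sample, the resourceId sword-mets-work-1 is invented for the example, and the helper names are hypothetical. It also shows the “funder ” and “grant ” prefix stripping described above.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in this guide's amdSec sample.
EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"

# Trimmed ScholarlyWork description; the resourceId value is hypothetical.
SAMPLE = f"""
<epdcx:description xmlns:epdcx="{EPDCX}" epdcx:resourceId="sword-mets-work-1">
  <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/title">
    <epdcx:valueString>Teddy Bear Programming: Fact or Fiction?</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement epdcx:propertyURI="http://purl.org/dc/elements/1.1/creator"
                   epdcx:valueRef="IanStuart">
    <epdcx:valueString>Stuart, Ian</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement epdcx:propertyURI="http://www.loc.gov/loc.terms/relators/FND"
                   epdcx:valueRef="funder Arthritis Research UK"/>
  <epdcx:statement epdcx:propertyURI="http://purl.org/eprint/terms/grantNumber"
                   epdcx:valueRef="grant P30-AR-473639"/>
</epdcx:description>
"""


def strip_ref_prefix(value_ref):
    """Drop a leading "funder " or "grant " (with its trailing space) from a valueRef."""
    for prefix in ("funder ", "grant "):
        if value_ref.startswith(prefix):
            return value_ref[len(prefix):]
    return value_ref


def read_statements(description):
    """Return (propertyURI, valueString, valueRef) tuples for one description."""
    rows = []
    for statement in description.findall(f"{{{EPDCX}}}statement"):
        rows.append((
            statement.get(f"{{{EPDCX}}}propertyURI"),
            statement.findtext(f"{{{EPDCX}}}valueString"),  # None when absent
            statement.get(f"{{{EPDCX}}}valueRef"),          # None when absent
        ))
    return rows


statements = read_statements(ET.fromstring(SAMPLE))
funders = [strip_ref_prefix(ref) for uri, _, ref in statements if uri.endswith("/FND")]
grants = [strip_ref_prefix(ref) for uri, _, ref in statements if uri.endswith("/grantNumber")]
```

Note that the epdcx attributes are namespace-qualified, so they have to be looked up with the full namespace URI rather than as plain propertyURI or valueRef.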

Within the Manifestation section, the broker currently exports:

type:
<epdcx:statement 
 epdcx:propertyURI="http://purl.org/dc/elements/1.1/type"
 epdcx:vesURI="http://purl.org/eprint/terms/Type"
 epdcx:valueURI="http://purl.org/eprint/entityType/Manifest"/>
publication: The title of the journal or site the record was published in.
<epdcx:statement 
	epdcx:propertyURI="http://opendepot.org/broker/elements/1.0/publication">
<epdcx:valueString>Acme News</epdcx:valueString>
</epdcx:statement>
issn
isbn
volume
issue
pagerange
These all generally follow the same structure:
<epdcx:statement
     epdcx:propertyURI="http://opendepot.org/broker/elements/1.0/issn">
  <epdcx:valueString>12775</epdcx:valueString>
</epdcx:statement>
accessrights: OpenAccess, RestrictedAccess or ClosedAccess
ClosedAccess will have no embargoed files; RestrictedAccess documents will have embargo details in the structMap and amdSec sections
<epdcx:statement
       epdcx:propertyURI="http://purl.org/dc/elements/1.1/type"
epdcx:valueURI="http://purl.org/eprint/accessRights/RestrictedAccess"/>
isAvailableAs: This is where other copies of the record are available. There are two categories: available-official_url and available-related_url. These elements refer to fuller description elements later in the document.
<epdcx:statement
    epdcx:propertyURI="http://purl.org/eprint/terms/isAvailableAs"
    epdcx:valueRef="available-related_url-2"/>
<epdcx:statement
    epdcx:propertyURI="http://purl.org/eprint/terms/isAvailableAs"
    epdcx:valueRef="available-related_url-1"/>
<epdcx:statement
    epdcx:propertyURI="http://purl.org/eprint/terms/isAvailableAs"
    epdcx:valueRef="available-official_url-1"/>

After the main metadata descriptions, the broker lists the various explanation descriptions:

External copies and other versions

Records, most notably those from Gold Access suppliers, can have references to copies hosted elsewhere. This is where those links are described.

There are two types of link: Europe PubMed Central provides a link to the official record within its data-set, and then there are related URLs, which are anything else.

Where possible, each set will contain the following:

<epdcx:description
       epdcx:resourceId="available-official_url-1"
       epdcx:resourceUrl="http://europepmc.org/articles/PMC3402849">
  <epdcx:statement
       epdcx:propertyURI="http://purl.org/dc/elements/1.1/type"
       epdcx:valueURI="http://purl.org/eprint/entityType/Copy"/>
</epdcx:description>

<epdcx:description
       epdcx:resourceId="available-related_url-1"
       epdcx:resourceUrl="http://www.pubmedcentral.org/articles/PMC3402849/pdf/?tool=EBI">
  <epdcx:statement
        epdcx:propertyURI="http://purl.org/dc/elements/1.1/type"
        epdcx:valueURI="http://purl.org/eprint/entityType/Copy"/>
  <epdcx:statement
        epdcx:propertyURI="http://purl.org/dc/terms/accessRights"
        epdcx:valueURI="http://purl.org/eprint/accessRights/openAcess">
    <epdcx:valueString>Free</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://opendepot.org/reference/rjb/site">
    <epdcx:valueString>PubMedCentral</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://opendepot.org/reference/rjb/format">
    <epdcx:valueString>pdf</epdcx:valueString>
  </epdcx:statement>
</epdcx:description>

<epdcx:description
       epdcx:resourceId="available-related_url-2"
       epdcx:resourceUrl="http://europepmc.org/articles/PMC3402849?pdf=render">
  <epdcx:statement
         epdcx:propertyURI="http://purl.org/dc/elements/1.1/type"
         epdcx:valueURI="http://purl.org/eprint/entityType/Copy"/>
  <epdcx:statement
         epdcx:propertyURI="http://purl.org/dc/terms/accessRights"
        epdcx:valueURI="http://purl.org/eprint/accessRights/openAcess">
    <epdcx:valueString>Free</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://opendepot.org/reference/rjb/site">
    <epdcx:valueString>Europe_PMC</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://opendepot.org/reference/rjb/format">
    <epdcx:valueString>pdf</epdcx:valueString>
  </epdcx:statement>

</epdcx:description>
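
To show how the isAvailableAs references tie up with these descriptions, here is a minimal Python sketch. It assumes the epdcx namespace URI from this guide's amdSec sample and a descriptionSet wrapper around the descriptions; the resourceId sword-mets-manif-1 and the function name external_copies are made up for the example.

```python
import xml.etree.ElementTree as ET

EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"  # as declared in the amdSec sample
Q = f"{{{EPDCX}}}"  # prefix for namespace-qualified element and attribute names
IS_AVAILABLE_AS = "http://purl.org/eprint/terms/isAvailableAs"

# Trimmed sample; the Manifestation resourceId is hypothetical.
SAMPLE = f"""
<epdcx:descriptionSet xmlns:epdcx="{EPDCX}">
  <epdcx:description epdcx:resourceId="sword-mets-manif-1">
    <epdcx:statement epdcx:propertyURI="{IS_AVAILABLE_AS}"
                     epdcx:valueRef="available-related_url-1"/>
    <epdcx:statement epdcx:propertyURI="{IS_AVAILABLE_AS}"
                     epdcx:valueRef="available-official_url-1"/>
  </epdcx:description>
  <epdcx:description epdcx:resourceId="available-official_url-1"
                     epdcx:resourceUrl="http://europepmc.org/articles/PMC3402849"/>
  <epdcx:description epdcx:resourceId="available-related_url-1"
                     epdcx:resourceUrl="http://www.pubmedcentral.org/articles/PMC3402849/pdf/?tool=EBI">
    <epdcx:statement epdcx:propertyURI="http://opendepot.org/reference/rjb/format">
      <epdcx:valueString>pdf</epdcx:valueString>
    </epdcx:statement>
  </epdcx:description>
</epdcx:descriptionSet>
"""


def external_copies(description_set):
    """Return (kind, URL) pairs for every isAvailableAs reference that resolves."""
    by_id = {d.get(Q + "resourceId"): d for d in description_set.iter(Q + "description")}
    copies = []
    for statement in description_set.iter(Q + "statement"):
        if statement.get(Q + "propertyURI") != IS_AVAILABLE_AS:
            continue
        ref = statement.get(Q + "valueRef")
        target = by_id.get(ref)
        if target is None:
            continue  # dangling reference: nothing to report
        kind = "official" if ref.startswith("available-official_url") else "related"
        copies.append((kind, target.get(Q + "resourceUrl")))
    return copies


copies = external_copies(ET.fromstring(SAMPLE))
```

The result keeps the order in which the isAvailableAs statements appear, so a repository that prefers the official URL should select it by kind rather than by position.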

Funders and grant codes

The RIOXX schema defines a Funder element and a Grant-Code element, with no actual link between them. The relationship between the funder and the grants is defined in these descriptions. For all descriptions, the resourceId is the reference value defined earlier in the metadata.

Funder. Note how multiple grants are listed within a single funder, where that is appropriate:
<epdcx:description epdcx:resourceId="funder NIAMS NIH HHS">
  <epdcx:statement
         epdcx:propertyURI="http://www.loc.gov/loc.terms/relators/FND">
    <epdcx:valueString>NIAMS NIH HHS</epdcx:valueString>
   </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://purl.org/eprint/terms/grantNumber">
    <epdcx:valueString>K23-AR-50177</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://purl.org/eprint/terms/grantNumber">
    <epdcx:valueString>N01-AR-42272</epdcx:valueString>
  </epdcx:statement>
</epdcx:description>
Grant. Although we've not come across it yet, this description does allow for multiple funders supporting a single grant.
<epdcx:description epdcx:resourceId="grant N01-AR-42272">
  <epdcx:statement 
         epdcx:propertyURI="http://purl.org/eprint/terms/grantNumber">
    <epdcx:valueString>N01-AR-42272</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://www.loc.gov/loc.terms/relators/FND">
    <epdcx:valueString>NIAMS NIH HHS</epdcx:valueString>
  </epdcx:statement>
</epdcx:description>


<epdcx:description epdcx:resourceId="grant K23-AR-50177">
  <epdcx:statement 
         epdcx:propertyURI="http://purl.org/eprint/terms/grantNumber">
    <epdcx:valueString>K23-AR-50177</epdcx:valueString>
  </epdcx:statement>
  <epdcx:statement
         epdcx:propertyURI="http://www.loc.gov/loc.terms/relators/FND">
    <epdcx:valueString>NIAMS NIH HHS</epdcx:valueString>
  </epdcx:statement>
</epdcx:description>
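
A repository that wants the funder-to-grant relationship as a simple lookup could build it from these descriptions along the following lines. This Python sketch assumes the epdcx namespace URI from this guide's amdSec sample and a descriptionSet wrapper; grants_by_funder is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"  # as declared in the amdSec sample
Q = f"{{{EPDCX}}}"
FUNDER = "http://www.loc.gov/loc.terms/relators/FND"
GRANT = "http://purl.org/eprint/terms/grantNumber"

SAMPLE = f"""
<epdcx:descriptionSet xmlns:epdcx="{EPDCX}">
  <epdcx:description epdcx:resourceId="funder NIAMS NIH HHS">
    <epdcx:statement epdcx:propertyURI="{FUNDER}">
      <epdcx:valueString>NIAMS NIH HHS</epdcx:valueString>
    </epdcx:statement>
    <epdcx:statement epdcx:propertyURI="{GRANT}">
      <epdcx:valueString>K23-AR-50177</epdcx:valueString>
    </epdcx:statement>
    <epdcx:statement epdcx:propertyURI="{GRANT}">
      <epdcx:valueString>N01-AR-42272</epdcx:valueString>
    </epdcx:statement>
  </epdcx:description>
  <epdcx:description epdcx:resourceId="grant N01-AR-42272">
    <epdcx:statement epdcx:propertyURI="{GRANT}">
      <epdcx:valueString>N01-AR-42272</epdcx:valueString>
    </epdcx:statement>
    <epdcx:statement epdcx:propertyURI="{FUNDER}">
      <epdcx:valueString>NIAMS NIH HHS</epdcx:valueString>
    </epdcx:statement>
  </epdcx:description>
</epdcx:descriptionSet>
"""


def grants_by_funder(description_set):
    """Return {funder name: sorted list of grant numbers} (hypothetical helper)."""
    mapping = defaultdict(set)
    for description in description_set.iter(Q + "description"):
        funders, grants = [], []
        for statement in description.findall(Q + "statement"):
            value = statement.findtext(Q + "valueString")
            if value is None:
                continue
            uri = statement.get(Q + "propertyURI")
            if uri == FUNDER:
                funders.append(value.strip())
            elif uri == GRANT:
                grants.append(value.strip())
        # Every funder named in a description is linked to every grant named in it.
        for funder in funders:
            mapping[funder].update(grants)
    return {funder: sorted(grants) for funder, grants in mapping.items()}


funder_grants = grants_by_funder(ET.fromstring(SAMPLE))
```

Using a set per funder means the same pairing can appear in both a funder description and a grant description without being counted twice.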

Agents

Each Agent is defined in its own description section.

Note that there is a reference to the creator element in the ScholarlyWork description:
<epdcx:description epdcx:resourceId="IanStuart">
  <epdcx:statement
         epdcx:propertyURI="http://purl.org/dc/elements/1.1/Type"
         epdcx:vesURI="http://purl.org/dc/elements/1.1/Person"/>
Given name & Family name
<epdcx:statement
	epdcx:propertyURI="http://purl.org/dc/elements/1.1/givenname">
 <epdcx:valueString>Ian</epdcx:valueString>
</epdcx:statement>
<epdcx:statement
	epdcx:propertyURI="http://purl.org/dc/elements/1.1/familyname">
 <epdcx:valueString>Stuart</epdcx:valueString>
</epdcx:statement>
Email address
<epdcx:statement epdcx:propertyURI="http://xmlns.com/foaf/0.1/mbox">
<epdcx:valueString>Ian.Stuart@ed.ac.uk</epdcx:valueString>
</epdcx:statement>
Address
<epdcx:statement
epdcx:propertyURI="http://purl.org/eprint/terms/affiliatedInstitution">
<epdcx:valueString>
   EDINA, 160 Causewayside, Edinburgh. EH9 1PR. United Kingdom
</epdcx:valueString>
</epdcx:statement>
Organisation and orgid code from the ORI service
<epdcx:statement epdcx:propertyURI="http://xmlns.com/foaf/0.1/name">
<epdcx:valueString>EDINA</epdcx:valueString>
</epdcx:statement>
<epdcx:statement
epdcx:propertyURI="http://opendepot.org/reference/linked/1.0/identifier">
<epdcx:valueString>3199</epdcx:valueString>
</epdcx:statement>
</epdcx:description>

amdSec

This is the administrative section of the METS document. Currently it only contains the extended embargo information, using the DCMI Abstract Model.

Sample record:

<amdSec ID="sword-mets-adm-1" LABEL="administrative" TYPE="LOGICAL">
  <rightsMD ID="sword-mets-amdRights-1">
    <mdWrap MDTYPE="OTHER" OTHERMDTYPE="RJ-BROKER">
      <xmlData>
        <epdcx:descriptionSet
              xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/" 
              xsi:schemaLocation="http://purl.org/eprint/epdcx/2006-11-16/
              http://purl.org/eprint/epdcx/xsd/2006-11-16/epdcx.xsd ">
          <epdcx:description epdcx:resourceId="sword-mets-div-3" 
                 epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
            <epdcx:statement
                   epdcx:propertyURI="http://purl.org/dc/terms/available"
                   epdcx:valueRef="http://purl.org/eprint/accessRights/RestrictedAccess">
              <epdcx:valueString
                     epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">
                2013-05-29
              </epdcx:valueString>
            </epdcx:statement>
          </epdcx:description>
          <epdcx:description epdcx:resourceId="sword-mets-div-4"  
                 epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
            <epdcx:statement
                   epdcx:propertyURI="http://purl.org/dc/terms/available"
                   epdcx:valueRef="http://purl.org/eprint/accessRights/RestrictedAccess">
              <epdcx:valueString
                     epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">
                 2013-05-29
              </epdcx:valueString>
            </epdcx:statement>
          </epdcx:description>
        </epdcx:descriptionSet>
      </xmlData>
    </mdWrap>
  </rightsMD>
</amdSec>

In essence, each description states the date from which the document is available, with Restricted Access given as the reason.

Also notice that each description has a resourceId which links it to the appropriate div in the structMap section:

<epdcx:description epdcx:resourceId="sword-mets-div-3" 
       epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
  <epdcx:statement
         epdcx:propertyURI="http://purl.org/dc/terms/available"
         epdcx:valueRef="http://purl.org/eprint/accessRights/RestrictedAccess">
    <epdcx:valueString
           epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">
           2013-05-29
    </epdcx:valueString>
   </epdcx:statement>
 </epdcx:description>
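
As an illustration, the embargo dates can be pulled out of the amdSec descriptions and keyed by resourceId, ready to be matched against structMap div IDs. This Python sketch assumes every date is a full yyyy-mm-dd value, as in the sample (W3CDTF also permits coarser forms, which this code does not handle); embargo_dates is a hypothetical helper name.

```python
import datetime
import xml.etree.ElementTree as ET

EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"
Q = f"{{{EPDCX}}}"
AVAILABLE = "http://purl.org/dc/terms/available"

# Trimmed copy of the descriptionSet from the sample amdSec.
SAMPLE = f"""
<epdcx:descriptionSet xmlns:epdcx="{EPDCX}">
  <epdcx:description epdcx:resourceId="sword-mets-div-3"
                     epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
    <epdcx:statement epdcx:propertyURI="{AVAILABLE}"
                     epdcx:valueRef="http://purl.org/eprint/accessRights/RestrictedAccess">
      <epdcx:valueString epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">
        2013-05-29
      </epdcx:valueString>
    </epdcx:statement>
  </epdcx:description>
  <epdcx:description epdcx:resourceId="sword-mets-div-4"
                     epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
    <epdcx:statement epdcx:propertyURI="{AVAILABLE}"
                     epdcx:valueRef="http://purl.org/eprint/accessRights/RestrictedAccess">
      <epdcx:valueString epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">
        2013-05-29
      </epdcx:valueString>
    </epdcx:statement>
  </epdcx:description>
</epdcx:descriptionSet>
"""


def embargo_dates(description_set):
    """Return {resourceId: datetime.date} for each description with an availability date."""
    dates = {}
    for description in description_set.iter(Q + "description"):
        for statement in description.findall(Q + "statement"):
            if statement.get(Q + "propertyURI") != AVAILABLE:
                continue
            text = statement.findtext(Q + "valueString")
            if text and text.strip():
                # The sample pads the date with whitespace, so strip before parsing.
                resource_id = description.get(Q + "resourceId")
                dates[resource_id] = datetime.date.fromisoformat(text.strip())
    return dates


embargoes = embargo_dates(ET.fromstring(SAMPLE))
```

Comparing each date with datetime.date.today() would then tell a repository whether a given document is still under embargo.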

fileSec

This is the METS section that details the files.

Each file has its own record:

<file ID="eprint-191-document-123-0" GROUPID="sword-mets-fgid-123" 
	  SIZE="3670383" OWNERID="http://devel.edina.ac.uk:1203/191/"
	  MIMETYPE="application/gif">
  <FLocat LOCTYPE="URL" xlink:type="simple" 
		  xlink:href="123/Spectator_safety.gif"/>
</file>

FLocat gives the location of the file within the .zip archive (e.g. the file “Spectator_safety.gif” within the folder “123” within the archive).

Sample record:

<fileSec ID="sword-mets-file-1" LABEL="files">
  <fileGrp ID="sword-mets-fgrp-1" USE="CONTENT">
    <file ID="eprint-191-document-123-0" GROUPID="sword-mets-fgid-123" 
          SIZE="3670383" OWNERID="http://devel.edina.ac.uk:1203/191/"
          MIMETYPE="application/gif">
      <FLocat LOCTYPE="URL" xlink:type="simple" 
              xlink:href="123/Spectator_safety.gif"/>
    </file>
    <file ID="eprint-191-document-456-0" GROUPID="sword-mets-fgid-456"
          SIZE="109601" OWNERID="http://devel.edina.ac.uk:1203/191/" 
          MIMETYPE="application/zip">
      <FLocat LOCTYPE="URL" xlink:type="simple" 
              xlink:href="456/Broker_imported.zip"/>
    </file>
    <file ID="eprint-191-document-789-0" GROUPID="sword-mets-fgid-789"
          SIZE="11083" OWNERID="http://devel.edina.ac.uk:1203/191/" 
          MIMETYPE="application/pdf">
      <FLocat LOCTYPE="URL" xlink:type="simple" 
              xlink:href="789/pdf1.pdf"/>
    </file>
    <file ID="eprint-191-document-789-1" GROUPID="sword-mets-fgid-789"
          SIZE="11278" OWNERID="http://devel.edina.ac.uk:1203/191/" 
          MIMETYPE="application/pdf">
      <FLocat LOCTYPE="URL" xlink:type="simple" 
              xlink:href="789/pdf2.pdf"/>
    </file>
    <file ID="eprint-191-document-789-2" GROUPID="sword-mets-fgid-789"
          SIZE="11323" OWNERID="http://devel.edina.ac.uk:1203/191/" 
          MIMETYPE="application/pdf">
      <FLocat LOCTYPE="URL" xlink:type="simple" 
              xlink:href="789/pdf3.pdf"/>
    </file>
    <file ID="eprint-191-document-789-3" GROUPID="sword-mets-fgid-789" 
          SIZE="10752" OWNERID="http://devel.edina.ac.uk:1203/191/"
          MIMETYPE="text/xml">
      <FLocat LOCTYPE="URL" xlink:type="simple" 
              xlink:href="789/mets.xml"/>
    </file>
  </fileGrp>
</fileSec>
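
The file records can be turned into a lookup from file ID to archive path with a short Python sketch like the one below. It assumes the METS elements are in the standard METS namespace (declared here as the default namespace) and that xlink is the standard XLink namespace, since the samples in this guide omit the declarations; file_locations is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace bindings; the samples in this guide do not show them.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Trimmed copy of the sample fileSec with the assumed namespace declarations added.
SAMPLE = """
<fileSec xmlns="http://www.loc.gov/METS/"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         ID="sword-mets-file-1" LABEL="files">
  <fileGrp ID="sword-mets-fgrp-1" USE="CONTENT">
    <file ID="eprint-191-document-123-0" GROUPID="sword-mets-fgid-123"
          SIZE="3670383" OWNERID="http://devel.edina.ac.uk:1203/191/"
          MIMETYPE="application/gif">
      <FLocat LOCTYPE="URL" xlink:type="simple"
              xlink:href="123/Spectator_safety.gif"/>
    </file>
    <file ID="eprint-191-document-789-0" GROUPID="sword-mets-fgid-789"
          SIZE="11083" OWNERID="http://devel.edina.ac.uk:1203/191/"
          MIMETYPE="application/pdf">
      <FLocat LOCTYPE="URL" xlink:type="simple"
              xlink:href="789/pdf1.pdf"/>
    </file>
  </fileGrp>
</fileSec>
"""


def file_locations(file_sec):
    """Return {file ID: (path inside the .zip, size in bytes, MIME type)}."""
    locations = {}
    for file_el in file_sec.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is None:
            continue
        # xlink:href is namespace-qualified, so it needs the full XLink URI.
        href = flocat.get(f"{{{NS['xlink']}}}href")
        size = int(file_el.get("SIZE", "0"))
        locations[file_el.get("ID")] = (href, size, file_el.get("MIMETYPE"))
    return locations


locations = file_locations(ET.fromstring(SAMPLE))
```

The path in each entry is relative to the root of the .zip archive, so it can be passed directly to an archive reader to extract the file.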

structMap

This is the section that shows how the files relate to each other.

As mentioned above, a document may consist of multiple files, so each document is held in a separate directory. Embargoes are applied at document level, so the embargo date is given as an attribute of the document's div.

Sample record:

<structMap ID="sword-mets-struct-1" LABEL="structure" TYPE="LOGICAL">
  <div ID="sword-mets-div-1" DMDID="sword-mets-dmd-eprint-191" 
       TYPE="SWORD Object">
    <div ID="sword-mets-div-2">
      <fptr FILEID="eprint-191-document-123-0"/>
    </div>
    <div ID="sword-mets-div-3" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-456-0"/>
    </div>
    <div ID="sword-mets-div-4" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-789-3"/>
      <fptr FILEID="eprint-191-document-789-0"/>
      <fptr FILEID="eprint-191-document-789-1"/>
      <fptr FILEID="eprint-191-document-789-2"/>
    </div>
  </div>
</structMap>

This sample shows three useful things:

  1. The first document, sword-mets-div-2, is not embargoed; the other two are. This means the record will be “Restricted Access”.
  2. The third document, sword-mets-div-4, has four files whose order is significant, so the first file listed, eprint-191-document-789-3, is considered the primary one.
  3. The FILEID attribute of each fptr element refers to the ID attribute of the appropriate file element in the fileSec section.
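
The three points above can be checked programmatically. The Python sketch below reads the sample structMap into a list of documents, each with its embargo date and ordered file IDs. It assumes the standard METS namespace is declared as the default namespace (the samples in this guide omit it); read_documents is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed; not shown in the samples

SAMPLE = """
<structMap xmlns="http://www.loc.gov/METS/"
           ID="sword-mets-struct-1" LABEL="structure" TYPE="LOGICAL">
  <div ID="sword-mets-div-1" DMDID="sword-mets-dmd-eprint-191"
       TYPE="SWORD Object">
    <div ID="sword-mets-div-2">
      <fptr FILEID="eprint-191-document-123-0"/>
    </div>
    <div ID="sword-mets-div-3" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-456-0"/>
    </div>
    <div ID="sword-mets-div-4" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-789-3"/>
      <fptr FILEID="eprint-191-document-789-0"/>
      <fptr FILEID="eprint-191-document-789-1"/>
      <fptr FILEID="eprint-191-document-789-2"/>
    </div>
  </div>
</structMap>
"""


def read_documents(struct_map):
    """Return a list of (div ID, embargo date string or None, [file IDs in order])."""
    top = struct_map.find("mets:div", NS)  # the outer "SWORD Object" div
    if top is None:
        return []
    documents = []
    for div in top.findall("mets:div", NS):
        file_ids = [fptr.get("FILEID") for fptr in div.findall("mets:fptr", NS)]
        # oarj_embargo is an un-prefixed attribute, so no namespace is needed.
        documents.append((div.get("ID"), div.get("oarj_embargo"), file_ids))
    return documents


documents = read_documents(ET.fromstring(SAMPLE))
embargoed = [div_id for div_id, embargo, _ in documents if embargo is not None]
primary_file = documents[2][2][0]  # first fptr of the third document
```

Each file ID returned here can then be looked up in the fileSec to find the path of the file inside the .zip archive.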