National (PID) Graph

The National Graph project is a collaborative approach to building a national-level graph of persistent identifiers. This new capability provides insights into the collaborations between research institutions, industry, and international partners.

You can contribute to this project by joining the National Graph working group.

Australian National Graph

The Australian National Graph is the first effort to build a national-scale PID graph database. Based on the Research Graph schema, National Graph links over a thousand research organisations to their research outputs.

117,845 Australian Researchers
26,937 Datasets
10,027,055 Publications
61,391 Grants
1,571 Australian Organisations

Version 1.1.0
Last update: 25 April 2023

Neo4j Technology

The National Graph project uses Neo4j to build and deploy its graph database. Neo4j is known for its scalability, performance, and user-friendly interface, and it lets us develop a native graph database, based on the Research Graph schema, that is designed and optimised for high-performance graph queries.

Nodes are grouped by labels

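// Nodes that carry the researcher label
MATCH (n:researcher) RETURN n LIMIT 10

// Nodes that carry the orcid label
MATCH (n:orcid) RETURN n LIMIT 10

// Nodes that carry both the researcher and orcid labels
MATCH (n:researcher:orcid) RETURN n LIMIT 10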

Neo4j nodes are grouped by labels. Labels make information retrieval more efficient and give the graph a logical structure. In our graph, labels represent metadata sources or data types, which allows targeted searches on specific metadata, such as finding researchers with ORCID profiles.
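As a rough illustration of how label combinations map onto queries, the Python sketch below builds a Cypher MATCH statement from a list of labels. The helper name build_label_query is hypothetical and is not part of National Graph or any Neo4j library; it only assembles query text and does not connect to a database.

```python
import re

# Hypothetical helper for illustration only; not part of National Graph or Neo4j.
_LABEL_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_label_query(labels, limit=10):
    """Return a Cypher query matching nodes that carry every label in `labels`."""
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        # Labels are interpolated into the query text, so accept only
        # plain identifiers.
        if not _LABEL_PATTERN.match(label):
            raise ValueError(f"unsupported label: {label!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Chaining labels as :a:b matches nodes that have all of them.
    label_expr = "".join(f":{label}" for label in labels)
    return f"MATCH (n{label_expr}) RETURN n LIMIT {limit}"


print(build_label_query(["researcher", "orcid"]))
# prints: MATCH (n:researcher:orcid) RETURN n LIMIT 10
```

Chaining labels narrows the match to nodes that carry all of them, which is how a search for researchers with ORCID profiles can be expressed.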

For more information on graph queries, refer to https://neo4j.com/docs/cypher-cheat-sheet/current/

Multilayer Data Architecture

National Graph uses a three-layer architecture to compose, optimise, and disseminate the graph to our project partners. Separating the major components into layers makes each one easier to operate and maintain.

PID Graph

The data pipeline extracts data from major persistent identifier (PID) providers such as Crossref, ORCID, and DataCite. Microservices within this layer consolidate the data into a connected Neo4j graph. The layer uses the Research Graph schema and technology to connect PIDs by applying metadata, text mining, and entity resolution algorithms.
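The consolidation step can be pictured with a minimal Python sketch. It is not the project's pipeline: the record layout, the AUTHOR_OF relationship name, the publication label, and all identifiers are assumptions made up for illustration. The sketch merges records from different providers into one node per PID, treating DOI variants as equal once normalised, which is a very simple form of entity resolution.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Lower-case a DOI and strip resolver prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def consolidate(records):
    """Merge provider records into nodes keyed by PID and collect authorship edges."""
    nodes = {}
    edges = set()
    for record in records:
        doi = normalise_doi(record["doi"])
        work = nodes.setdefault(doi, {"labels": {"publication"}, "sources": set()})
        work["sources"].add(record["source"])
        for orcid in record.get("author_orcids", []):
            person = nodes.setdefault(
                orcid, {"labels": {"researcher", "orcid"}, "sources": set()}
            )
            person["sources"].add(record["source"])
            # Assumed relationship name, for illustration only.
            edges.add((orcid, "AUTHOR_OF", doi))
    return nodes, edges


# Made-up sample records; the DOI and ORCID iD are placeholders.
records = [
    {"source": "crossref", "doi": "https://doi.org/10.1234/EXAMPLE",
     "author_orcids": ["0000-0000-0000-0001"]},
    {"source": "orcid", "doi": "10.1234/example",
     "author_orcids": ["0000-0000-0000-0001"]},
]
nodes, edges = consolidate(records)
print(sorted(nodes["10.1234/example"]["sources"]))
# prints: ['crossref', 'orcid']
```

Both sample records collapse into a single publication node and a single researcher node joined by one edge, which is the sense in which separate provider feeds become a connected graph.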

Optimisation Layer

This layer constructs a streamlined graph of Australian research output. A fundamental part of this work is identifying and establishing links between organisation names and PIDs, including ISNI, ROR, and Wikidata identifiers.
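One simple way to picture the name-to-PID linking is exact matching on normalised names, sketched below in Python. This is an assumption about how such a step could work, not a description of the project's algorithms. The registry contents, organisation names, and identifiers are made-up placeholders, and a production system would likely need fuzzy matching and disambiguation as well.

```python
import re
import unicodedata


def normalise_name(name):
    """Reduce an organisation name to a comparison key."""
    # Strip accents, lower-case, and turn punctuation into spaces.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    # Drop a leading article so "The Example University" matches
    # "Example University".
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def build_index(registry):
    """Map every normalised name variant to its PID."""
    index = {}
    for pid, names in registry.items():
        for name in names:
            index[normalise_name(name)] = pid
    return index


def link_organisation(name, index):
    """Return the PID for `name`, or None when no variant matches."""
    return index.get(normalise_name(name))


# Placeholder registry; these are not real ROR identifiers.
registry = {
    "ror:placeholder-0001": ["Example University", "Example Uni."],
    "ror:placeholder-0002": ["Institute of Sample Research"],
}
index = build_index(registry)
print(link_organisation("The EXAMPLE University", index))
# prints: ror:placeholder-0001
```

Names that do not match any known variant return None, so they can be set aside for manual review or a more tolerant matching pass.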

Access Layer

The data access layer comprises a distributed network of Neo4j databases that provide access to National Graph content through the Neo4j graph interface. This network’s decentralised architecture lets Australian researchers use their own cloud infrastructure to manage and coordinate access to the database, and lets universities allocate resources as needed.

The data from National Graph is accessible to Australian researchers under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).