National (PID) Graph
The National Graph project is a collaborative approach to building a national-level graph of persistent identifiers. This new capability provides insights into the collaborations between research institutions, industry, and international partners.
You can contribute to this project by joining the national graph working group.
Australian National Graph
The Australian National Graph is the initial effort to build a national-scale PID graph database. Based on the Research Graph schema, it links over a thousand research organisations to their research outputs.
Last update: 5 Jan 2023
The National Graph project uses Neo4j to build and deploy its graph database. Neo4j is known for its scalability, performance, and user-friendly interface, and it lets us develop a native graph database, based on the Research Graph schema, that is designed and optimised for high-performance graph queries.
Nodes are grouped by labels
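// Up to 10 nodes labelled researcher
MATCH (n:researcher) RETURN * LIMIT 10
// Up to 10 nodes labelled orcid
MATCH (n:orcid) RETURN * LIMIT 10
// Up to 10 nodes carrying both labels, i.e. researchers with ORCID profiles
MATCH (n:researcher:orcid) RETURN * LIMIT 10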
Neo4j nodes are grouped by labels. Labels let the database narrow a query to the relevant nodes, and they give the graph a logical structure. A node can carry more than one label. In our graph, labels represent either a metadata source or a data type, so combining them gives targeted searches: the pattern (n:researcher:orcid) matches only nodes that have both labels, that is, researchers with ORCID profiles.
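To make the multi-label behaviour concrete, here is a minimal Python sketch. It is not part of the National Graph tooling: the helper names `label_query` and `select`, the sample nodes, and the `publication` and `crossref` labels are hypothetical. It builds the same kind of Cypher text as the queries above and mimics, in memory, how a pattern with several labels matches only nodes that carry all of them.

```python
import re

# Labels are interpolated into the query text, so only plain identifiers are accepted.
_LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def label_query(labels, limit=10):
    """Build a Cypher query matching nodes that carry every label in `labels`."""
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        if not _LABEL_RE.fullmatch(label):
            raise ValueError(f"unsafe label: {label!r}")
    return f"MATCH (n:{':'.join(labels)}) RETURN n LIMIT {int(limit)}"


def select(nodes, required, limit=10):
    """In-memory stand-in for the query: keep nodes that have all required labels."""
    wanted = set(required)
    return [n["key"] for n in nodes if wanted <= n["labels"]][:limit]


# Hypothetical sample nodes, for illustration only.
nodes = [
    {"key": "r1", "labels": {"researcher", "orcid"}},
    {"key": "r2", "labels": {"researcher"}},
    {"key": "p1", "labels": {"publication", "crossref"}},
]

print(label_query(["researcher", "orcid"]))
print(select(nodes, ["researcher"]))
print(select(nodes, ["researcher", "orcid"]))
```

The generated text can be pasted into the Neo4j interface; the `select` helper only illustrates the matching rule and does not talk to a database.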
For more information on graph queries, refer to the Cypher cheat sheet: https://neo4j.com/docs/cypher-cheat-sheet/current/
Multilayer Data Architecture
National Graph uses a three-layer architecture to compose, optimise, and disseminate the graph to our project partners. Separating these major components into layers makes each one easier to operate and maintain on its own.
The data pipeline extracts data from major persistent identifier (PID) providers such as Crossref, ORCID, and DataCite. Microservices within this layer consolidate the data into a connected Neo4j graph. The layer uses the Research Graph schema and technology to connect PIDs using metadata, text mining, and entity resolution algorithms.
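As an illustration of the kind of metadata-based linking the data pipeline performs, the sketch below connects researcher records to publication records through a shared DOI. It is a simplified assumption, not the project's actual algorithm: the record fields (`orcid`, `works`, `doi`), the function names, and all identifiers in the sample data are made up for the example.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Lower-case a DOI and strip a resolver or 'doi:' prefix so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def link_researchers_to_works(orcid_records, crossref_records):
    """Return sorted (orcid, doi) edges for works that appear in both sources."""
    known = {normalise_doi(rec["doi"]) for rec in crossref_records}
    edges = set()
    for rec in orcid_records:
        for doi in rec["works"]:
            key = normalise_doi(doi)
            if key in known:
                edges.add((rec["orcid"], key))
    return sorted(edges)


# Made-up sample records; the field names are assumptions for this sketch.
crossref_records = [
    {"doi": "10.1234/ABC.1", "title": "Example A"},
    {"doi": "https://doi.org/10.1234/abc.2", "title": "Example B"},
]
orcid_records = [
    {"orcid": "0000-0000-0000-0001", "works": ["doi:10.1234/abc.1", "10.9999/unknown"]},
    {"orcid": "0000-0000-0000-0002", "works": ["HTTPS://DOI.ORG/10.1234/ABC.2", "10.1234/abc.1"]},
]

edges = link_researchers_to_works(orcid_records, crossref_records)
for orcid, doi in edges:
    print(orcid, "->", doi)
```

Each resulting pair could become a relationship between a researcher node and a publication node. Real records would also need text mining and entity resolution for cases where no shared identifier exists.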
The second layer constructs a streamlined graph of Australian research output. A fundamental part of its work is identifying and establishing links between organisation names and PIDs, including ISNI, ROR, and Wikidata.
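Linking organisation names to PIDs can be sketched as a lookup over normalised names. The example below is an assumption about one simple approach, not a description of the project's method: the registry structure, the function names, and the placeholder identifiers (`ror:example-0001` and so on) are invented for illustration and are not real ROR records.

```python
import re
import unicodedata


def normalise_name(name):
    """Reduce a name to lower-case ASCII words so spelling variants compare equal."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    words = text.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def build_index(registry):
    """Map every normalised name and alias in the registry to its PID."""
    index = {}
    for entry in registry:
        for name in entry["names"]:
            index[normalise_name(name)] = entry["pid"]
    return index


def resolve(name, index):
    """Return the PID for an organisation name, or None when there is no exact match."""
    return index.get(normalise_name(name))


# Placeholder registry; names and identifiers are hypothetical.
registry = [
    {"pid": "ror:example-0001", "names": ["Example University", "Univ of Example"]},
    {"pid": "ror:example-0002", "names": ["Institut Exemplé"]},
]
index = build_index(registry)

print(resolve("The  Example University, ", index))
print(resolve("Univ. of Example", index))
print(resolve("Unknown Institute", index))
```

Exact matching on normalised names only catches the easy cases. Names that differ by more than punctuation, case, or accents would need fuzzy matching or manual curation.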
The data access layer comprises a distributed network of Neo4j databases that provide access to National Graph content through the Neo4j graph interface. Because the network is decentralised, Australian researchers can use their own cloud infrastructure to manage and coordinate access to the database, and universities can allocate resources as needed.