Data Blog

Highlighting GeoNet's GitHub data repository - where our homeless data live

Published: Mon Mar 24 2025 10:14 AM

Welcome, haere mai to another GeoNet Data Blog. Today’s blog is about our GitHub data repository and how it hosts data that have nowhere else to live.

If you are at all familiar with GeoNet data, you'll realise most of them fall into one of a limited number of categories that determine how the data are stored and how we make them available to you. Those categories are largely decided by the kind of files the data come in: think RINEX, miniSEED, CSV, JPG, etc.

For example, whether the JPG files (the usual format for photographs) are of volcanoes or views of the rebuild of Christchurch after the 2011 earthquake, we make them available the same way, via our data webpage. The same applies to miniSEED, a format commonly used for seismic data. We use the same data collection systems for all our miniSEED format data, so you can find all the data in one spot - our FDSN webservice. In addition to seismic data, this provides infrasound-acoustic data (transient air pressure changes), coastal tsunami gauge data, and data we collect to measure the Earth's magnetic field, as they are all in miniSEED format.

But, of course, there are exceptions: what we are going to call our homeless data. GeoNet collects some datasets that don't fall into the common data categories, so we can't make them available using one of the main data delivery applications. As most of these datasets are small, we can't invest the time and effort to build them their own data delivery application. Yet we are still required to make them available to our data community, ideally in a way that avoids "phone a friend", as that requires us to deal with every request manually.

The way we solve this problem is by using GeoNet's data repository on GitHub. GeoNet uses GitHub for all of its application development and for our Delta metadata database, so it isn't too much of a stretch to use it as a home for our otherwise homeless data.

A screenshot of the top of the page from GeoNet’s data repository on GitHub.

The data GitHub repository


Our data repository on GitHub is a public repository, which means anyone can access the data, and make a copy of the complete repository. It currently contains 10 datasets. The repository has a file README.md, which explains the repository’s purpose and the datasets it contains. The file formats in the repository are universally CSV (comma-separated values) as these are easily read and used by almost every computer application.

| Folder Name | Description | Updates |
| --- | --- | --- |
| MLNZ20* | New Zealand MLNZ20 event local magnitude evaluation dataset | Will be superseded once MLNZ20 is fully implemented into the GeoNet Rapid Earthquake Catalogue Production system |
| historic-volcanic-activity | Catalogues of historic volcanic activity at New Zealand volcanoes | Unlikely to be updated until new activity datasets are added |
| moment-tensor | Moment tensor solutions for earthquakes in New Zealand (generally magnitude above 4) | At least monthly, depending how often new solutions are calculated |
| nzsmd-flatfiles** | Compilation of specially processed strong motion data and associated metadata | Static dataset, not updated since 2018 |
| rupture-models | Seismic and geodetic fault models for significant earthquakes | When new models are created. Last updated in 2023 |
| site-class | Information on site class/characteristics for GeoNet strong motion sites | Static dataset, not updated since 2018 |
| soil-gas | Soil gas observations from selected New Zealand volcanoes | When new data are collected, typically once or twice a year |
| strong-motion-peaks | Summary files of peak strong motion measurements | Static dataset, not updated since 2018 |
| volcanic-alert-levels | Changes in Volcanic Alert Level (VAL) at New Zealand volcanoes | When a VAL changes, depending on volcanic activity |

*Dataset name is NZ local magnitudes. ** Dataset name is Strong motion flatfiles.

Each dataset has a folder with a README.md file that explains the dataset and its file formats. Some datasets will be stored in one file, others in several files, and some require sub-folders. It all depends on the requirement of the dataset.

Updating the data


When describing the datasets in the repository, the main README file says, "These are generally updated on an infrequent or irregular basis". This is because data updates are done manually, and each update has to be reviewed and approved by a member of GeoNet's Science Operations and Data team. All datasets are therefore updated only when required, and for some that isn't very often. Three of the datasets related to seismic strong motion have not been added to since 2018 and should be considered "static". The historic volcanic activity dataset is in the same category and hasn’t been updated since 2021. The dataset updated most frequently is moment tensors, typically more than once a month, though the exact frequency depends on how often moment tensors are calculated.

Accessing the data


You have two options for accessing data in the repository. In one you use your web browser, and in the other you need to know the URL (web address) of the file you want. You can also copy the complete repository using a process called “cloning”.

Web browser file access


Navigate to the data repository and then to the folder containing the dataset you want, e.g., the dart-triggers data. The dart-triggers folder contains just one data file, DART-trigger-catalogue.csv. Click on the filename to see the file. GitHub is pretty smart and displays the file nicely, with column labels, row numbers, and a search capability. GitHub does this for most files, provided they aren't too long. For the files that are too long, you'll just see what GitHub calls the raw version, without the nice formatting. To get the data you can click on the “Copy raw file” logo and then paste the file contents wherever you wish or click the “Download raw file” logo to download the file. It’s that simple.

URL file access


If you want to write a computer program to use data from a file in the data repository, it is more convenient to have your program retrieve the file you need when you need it. This is particularly useful if you are using one of the files that is updated relatively often, the moment tensor file being the best example. To find the URL, you need to first use your web browser to navigate to the file you want, then click on the “Raw” icon above and to the right of the file display. This takes you to the unformatted, raw version of the file, which will be in CSV format. Copy the URL and paste it into your program, or wherever you need to use it.

For the moment tensor data, the URL is:


https://raw.githubusercontent.com/GeoNet/data/refs/heads/main/moment-tensor/GeoNet_CMT_solutions.csv
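
Raw-file addresses in the repository appear to follow a regular pattern: a fixed base, then the folder name, then the filename. The Python sketch below builds a URL that way. It assumes the pattern holds for every file on the main branch, and `raw_url` is a hypothetical helper of our own naming, not something GitHub or GeoNet provides.

```python
from urllib.parse import quote

# Base of the raw-file address, taken from the moment tensor URL.
# Assumption: other files on the main branch follow the same pattern.
RAW_BASE = "https://raw.githubusercontent.com/GeoNet/data/refs/heads/main"


def raw_url(folder, filename):
    """Build the raw-file URL for a file in a dataset folder (hypothetical helper)."""
    # quote() percent-encodes characters such as spaces; "/" is left as-is
    return f"{RAW_BASE}/{quote(folder)}/{quote(filename)}"


print(raw_url("moment-tensor", "GeoNet_CMT_solutions.csv"))
```

If a URL built this way does not work, fall back to the “Raw” icon in your web browser and copy the address from there.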

There are a couple of common ways to get data using its URL. If you use the command line then the “curl” command (available for Windows, Mac, and Linux) can be used to retrieve a file, e.g. the code below will download the moment tensor file to a file called myfile.csv on your computer.


curl "https://raw.githubusercontent.com/GeoNet/data/refs/heads/main/moment-tensor/GeoNet_CMT_solutions.csv" -o myfile.csv

If you use Python for data analysis, you can read the GitHub file directly into a pandas dataframe using the URL. There is an example in one of our data tutorials.
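
If you would rather not install pandas, a minimal sketch using only the Python standard library looks something like this. The function names `parse_csv` and `fetch_csv` are our own, chosen for illustration, and the code assumes the file is UTF-8 text with a header row.

```python
import csv
import io
import urllib.request

MOMENT_TENSOR_URL = (
    "https://raw.githubusercontent.com/GeoNet/data/refs/heads/main/"
    "moment-tensor/GeoNet_CMT_solutions.csv"
)


def parse_csv(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url, timeout=30):
    """Download a CSV file from a URL and return its rows as dicts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # "utf-8-sig" also copes with a byte-order mark, if one is present
        text = response.read().decode("utf-8-sig")
    return parse_csv(text)
```

Calling `rows = fetch_csv(MOMENT_TENSOR_URL)` gives one dictionary per row, with all values as text. If you do have pandas, `pandas.read_csv(MOMENT_TENSOR_URL)` downloads and parses the file in a single call.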

Cloning the repository


If you are a regular GitHub user, you might prefer to clone the complete repository. In your web browser, navigate to the data repository. There should be a green icon labeled “Code”. Click on the inverted triangle, select “SSH” and then copy the URL. You can clone the repository if you have git software installed on your computer (available for Windows, Mac, and Linux), using the command:


git clone git@github.com:GeoNet/data.git

From there you have a copy on your computer and can use the data in any way you wish.
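
Once you have a local copy, a short Python sketch like the one below can list what is in it. It relies on each dataset having its own folder with a README.md file, as described earlier. The helper names `list_datasets` and `list_csv_files` are hypothetical, made up for this example.

```python
from pathlib import Path


def list_datasets(repo_dir):
    """Return names of top-level folders in a local clone that contain a README.md."""
    repo = Path(repo_dir)
    return sorted(
        entry.name
        for entry in repo.iterdir()
        if entry.is_dir() and (entry / "README.md").is_file()
    )


def list_csv_files(repo_dir, dataset):
    """Return CSV paths inside one dataset folder, including any sub-folders."""
    return sorted((Path(repo_dir) / dataset).rglob("*.csv"))
```

The clone command above creates a folder called `data` by default, so `list_datasets("data")` run from the same location should return the dataset folder names, and `list_csv_files("data", "moment-tensor")` the CSV files in that dataset.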

Acknowledging the datasets


Although the datasets in GeoNet’s data GitHub repository are freely available, and we encourage you to use them, please acknowledge the datasets you use by citing them, and do so in a way that makes it easier for us to find those citations. Citations help us show the agencies that fund GeoNet that you are using the data.

Most of the datasets in our GitHub data repository have entries in GNS Science’s dataset catalogue, for example, the volcanic alert level dataset. The correct way to cite that dataset is shown at the bottom of the dataset’s summary.


Cite as:
GNS Science. (1994). GNS Science Aotearoa New Zealand Volcanic Alert Level Datasets [Data set]. GNS Science. [https://doi.org/10.21420/we5s-1n52](https://doi.org/10.21420/we5s-1n52)

Data tutorials are coming


Data tutorials are how we help the more technical users in our data community access and use individual datasets through basic Python code examples. While we’ve shown historic volcanic activity and volcanic alert levels in previous blogs, we haven’t had tutorials for those or any of the other datasets in the data repository. That is about to change. We are preparing tutorials for the data repository datasets and those will be available as soon as they are ready. In the meantime, if you are having difficulties accessing or using any of the data repository datasets, please get in touch with us at info@geonet.org.nz.

That’s it for now


Our main datasets, those that are larger in volume and are collected in one of the more common formats, are well catered for by GeoNet’s core data delivery applications. For the datasets that aren’t catered for by those applications, the GeoNet GitHub data repository plays a critical role in providing a home for the data, so our data community members can access and use those smaller, but still valuable datasets.

You can find our earlier blog posts through the News section on our web page - just select the Data Blog filter before hitting the Search button. We welcome your feedback on our data blogs. If there are any GeoNet data topics you’d like us to talk about, please let us know!

Ngā mihi nui.

Contact: info@geonet.org.nz