ARCI, archaeology eResearch collaboration initiative

Joshua Emmitt, Graduate Teaching Assistant; Dr Rebecca Phillipps, Research Fellow; Prof Simon Holdaway; School of Social Sciences; Sina Masoud-Ansari, eResearch Support; Prof Mark Gahegan, Centre for eResearch

Figure 1: Excavation on the Ahuahu Great Mercury Island Project, New Zealand. Artefacts are measured in space with a total station before being registered into a database in-field with a tablet.

Background

Archaeologists are interested in human-environment interrelationships over long spans of time and often engage in comparative analyses of these relationships. Data are compiled from a range of sources. The Archaeology eResearch Collaboration Initiative (ARCI) is a research group specialising in the management and analysis of data-intensive archaeology. The project currently works with data from field projects in Egypt, New Zealand, Saudi Arabia, and Australia. These projects generate large amounts of data that need to be shared regularly between researchers around the globe.

Data collection

Data acquired in the field consist of high-resolution samples of archaeological phenomena, recorded as geographic information with corresponding attribute information. It is not unusual for tens of thousands of observations to be collected during a field season. In addition, survey data can include photographs of archaeological features, including high-resolution GigaPan imagery, and LIDAR point data. These data often involve complex file structures or large file sizes and must be read with specialist software such as ESRI’s ArcGIS, so they are not easily shared in a structured way amongst researchers, whether domestic or international. The Centre for eResearch provided access to the NeSI Data Fabric and customised servers, which facilitated data sharing and collaboration and greatly increased the productivity of the projects involved.

Methods for collecting and recording archaeological data can vary between projects to accommodate different research interests. However, projects usually share the requirement that recorded artefacts and features show both geographic and archaeological context. The question is at what scale different researchers record this information, and how the resulting data can be meaningfully related. To facilitate this, each data point is given a unique identifier (UNID), and each project is given a unique project code (PC). UNIDs are unique within a project, and the PC is never repeated. Together the PC and UNID form a dual-identifier system (PCUNID) which is unique across all projects. This enables the comparison of data between projects.
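The dual-identifier idea can be sketched in a few lines of Python. This is a minimal illustration, not the ARCI implementation: the project codes ("AHU", "FAY"), the record contents, the `PC:UNID` string layout, and the function name `merge_projects` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PCUNID:
    pc: str    # project code, never reused across projects
    unid: int  # identifier, unique only within its own project

    def __str__(self) -> str:
        # The separator is illustrative, not the ARCI format.
        return f"{self.pc}:{self.unid}"


def merge_projects(*projects):
    """Combine (pc, records) pairs into one dict keyed by PCUNID."""
    combined = {}
    for pc, records in projects:
        for unid, attributes in records.items():
            key = PCUNID(pc, unid)
            if key in combined:
                raise ValueError(f"duplicate identifier {key}")
            combined[key] = attributes
    return combined


# Hypothetical records: both projects use UNID 1, yet the keys stay distinct.
ahuahu = ("AHU", {1: "obsidian flake", 2: "basalt adze"})
fayum = ("FAY", {1: "pottery sherd"})
combined = merge_projects(ahuahu, fayum)
```

Because the key is the pair rather than the UNID alone, records from different projects can sit in one collection without renumbering.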

With the help of the Centre for eResearch, the ARCI team further developed a pre-existing data representation schema so that it could represent geographic and archaeological data across a wide range of physical environments. This improved schema supports both excavation and survey-based data collection techniques, and can incorporate data collected using different instruments and methods. It also enables the recording of metadata about the way the data were recorded. The schema was built into a PostgreSQL relational database, with PostGIS used to link spatial attributes based on the PCUNID. This allowed both geographic and descriptive data to be stored in the same database and created a centralised, authoritative source for project data that could be accessed concurrently through various desktop and web-based applications. Using a database also had the benefit of providing fine-grained access control, so that access could be given to students who only needed to edit part of the dataset. This further aided the research workflow, as it minimised problems in sharing and integrating changes made by different team members.
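As a rough sketch of how descriptive and spatial records can be linked through the PCUNID in a relational database, the example below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and plain `x`, `y`, `z` columns as a stand-in for a PostGIS geometry column. All table names, column names, and values are assumptions made for illustration; the actual ARCI schema is not described in this case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Hypothetical tables: the (pc, unid) pair is the shared key throughout.
conn.executescript("""
CREATE TABLE project (
    pc TEXT PRIMARY KEY
);
CREATE TABLE artefact (
    pc       TEXT    NOT NULL REFERENCES project(pc),
    unid     INTEGER NOT NULL,
    material TEXT,
    PRIMARY KEY (pc, unid)
);
CREATE TABLE location (
    pc         TEXT    NOT NULL,
    unid       INTEGER NOT NULL,
    x REAL, y REAL, z REAL,   -- stand-in for a PostGIS geometry column
    instrument TEXT,          -- metadata about how the point was recorded
    PRIMARY KEY (pc, unid),
    FOREIGN KEY (pc, unid) REFERENCES artefact(pc, unid)
);
""")

conn.execute("INSERT INTO project VALUES ('AHU')")
conn.execute("INSERT INTO artefact VALUES ('AHU', 1, 'obsidian')")
conn.execute(
    "INSERT INTO location VALUES ('AHU', 1, 10.5, 20.25, 1.75, 'total station')"
)

# Descriptive and spatial attributes come back together via the shared key.
rows = conn.execute(
    "SELECT pc, unid, material, x, y, z "
    "FROM artefact JOIN location USING (pc, unid)"
).fetchall()
```

The foreign key means a position cannot be stored for an artefact that has no descriptive record, which is one way a single database can act as the authoritative source.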

 

Figure 2: In-field data collection using notebooks in the Fayum, Egypt.

Workflows and tools

Projects which use the ARCI schema may not have started their data collection with it, meaning that data management and cleaning are required. The Centre for eResearch provided software development support, and a number of programs have been written to help manage and clean the data. Examples include tools that synchronise a collection of photos with their UNIDs in a photo database, and software that creates attribute fields for geospatial data files based on their location in the filesystem. Some of the most important tools have been those that merge the attributes from a number of spatial data files into a standardised schema. In one project, this resulted in 1,200 shapefiles with different attributes being merged into three, each with a common set of attributes that reflects the ARCI schema.
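The two ideas in this paragraph, deriving attributes from a file's location and mapping differing attribute names onto a common set, can be sketched as follows. This is an illustration under stated assumptions rather than the ARCI tooling: the directory layout, the field names in `FIELD_ALIASES` and `STANDARD_FIELDS`, and the sample records are hypothetical, and plain dictionaries stand in for shapefile attribute rows.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from legacy field names to standardised ones.
FIELD_ALIASES = {
    "id": "unid", "ID_NO": "unid", "unid": "unid",
    "mat": "material", "Material": "material",
}
STANDARD_FIELDS = ("pc", "unid", "material", "area")


def attributes_from_path(path):
    """Derive project code and area from a path like 'AHU/area_3/finds.shp'."""
    parts = PurePosixPath(path).parts
    return {"pc": parts[0], "area": parts[1]}


def standardise(record, path):
    """Rename known fields, add path-derived ones, fill gaps with None."""
    renamed = {
        FIELD_ALIASES[name]: value
        for name, value in record.items()
        if name in FIELD_ALIASES
    }
    renamed.update(attributes_from_path(path))
    return {field: renamed.get(field) for field in STANDARD_FIELDS}


# Two hypothetical source files with different attribute names.
legacy = [
    ("AHU/area_3/finds.shp", {"ID_NO": 12, "mat": "obsidian"}),
    ("AHU/area_4/lithics.shp", {"id": 13, "Material": "basalt", "notes": "x"}),
]
merged = [standardise(record, path) for path, record in legacy]
```

In this sketch, fields with no known alias (such as `notes`) are dropped silently; a real cleaning tool would more likely report them for review.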

Data collected in the field are piece-provenanced, meaning that the 3D position of each artefact is recorded (Figure 1). Attribute data are also collected for each artefact, either in the field (Figure 2) or in the lab. While the recording of information is systematic, errors can occur, so collected data are checked before integration into the main database. With this in mind, a series of programs and database workflows were created to help clean the data before integration. These workflows identify duplicate, conflicting, or incomplete records. For example, one of the workflows compares the PCUNIDs between the descriptive and geospatial datasets to identify records that lack corresponding entries.
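A check of the kind described above could look like the following sketch. It assumes each dataset has been reduced to a list of `(PC, UNID)` tuples; the function names and sample identifiers are hypothetical and are not taken from the ARCI workflows.

```python
from collections import Counter


def compare_identifiers(descriptive, geospatial):
    """Report identifiers present in one dataset but missing from the other."""
    desc_ids = set(descriptive)
    geo_ids = set(geospatial)
    return {
        "missing_geometry": sorted(desc_ids - geo_ids),
        "missing_description": sorted(geo_ids - desc_ids),
    }


def find_duplicates(identifiers):
    """Return identifiers that occur more than once in a dataset."""
    counts = Counter(identifiers)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical identifiers as (project code, UNID) pairs.
descriptive = [("AHU", 1), ("AHU", 2), ("AHU", 2), ("FAY", 7)]
geospatial = [("AHU", 1), ("AHU", 3), ("FAY", 7)]

report = compare_identifiers(descriptive, geospatial)
duplicates = find_duplicates(descriptive)
```

Here `report` lists the records that need attention before integration, and `duplicates` flags identifiers entered more than once in the descriptive data.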

What’s next

The ARCI project plans to roll out several of its interfaces over the next year. These include a website which will make several of the workflows developed by the project available for other researchers to use. The website will also serve as a front end for our online database, which will at first be accessible to researchers involved in the individual projects, with the future goal of facilitating open access to archaeological data, something that is increasingly required by funding agencies.

With increased public outreach, as well as numerous articles being published in academic journals, it is hoped that more researchers will show interest in incorporating their datasets into our schema and database model. For more information, please email the ARCI team at arci@auckland.ac.nz.
