1)     GAGA is non-profit and non-commercial. However, GAGA-supported subprojects can benefit from the reduced costs granted to GAGA by Novogene and other companies for library preparation and sequencing.

2)     Subprojects need to be approved by the governing board of GAGA and announced on the website antgenomics.dk. To propose a subproject, please fill out the Word document available at GAGA’s project resources and send it to a member of the governing board.

3)     The sequencing company Novogene has agreed to provide PacBio sequencing (15 GB, ~8-12 kb reads, ~50x coverage) for GAGA genomes and to deliver the raw data within four weeks of receiving samples of sufficient quality and quantity. The same service will be available to independent PIs running subprojects under the GAGA umbrella to sequence additional ant genomes.

4)     For each PacBio-sequenced genome, additional resequencing data (Illumina 150 bp paired-end, 350 bp insert library, ~15 GB, ~50x coverage) might be necessary for read correction to yield a high-quality assembly (see Table 1). Library preparation and resequencing will be available through Novogene for GAGA subprojects at a low price.

5)     One of GAGA’s main projects is to study mutation rate evolution in ants. For this, we aim to resequence individual queens and their male offspring for as many species as possible. If such samples can be provided for a given species through a subproject, GAGA will fund the resequencing for that species. These data will then be used in the mutation rate project as well as in the genome assembly. The resequencing requires 2 x ~500 ng of input DNA, from a single queen and from one or more male offspring of that queen, which presently makes this option infeasible for rather small ants. These samples should be shipped to CSE for coordination of sequencing.

6)     Transcriptomic data will be necessary to produce high-quality gene annotations, and we highly recommend that gene expression data be generated for any species targeted for genome sequencing in subprojects. Library preparation and sequencing for ~6 GB of transcriptomic data (150 bp paired-end, 350 bp insert library) will be available through Novogene for GAGA subprojects at a low price.

7)     Unfortunately, GAGA cannot offer to extract the DNA/RNA for subprojects, for administrative and bureaucratic reasons. However, we will be happy to provide guidance and extraction protocols for subprojects. Extracted DNA will have to be shipped directly to the sequencing company in China (Novogene).

8)     GAGA can provide a quote for the sequencing costs through Novogene or any other associated company.

9)     Invoices will be sent directly from Novogene to a subproject PI’s institute, without the involvement of GAGA.

10)   GAGA will coordinate the assembly and annotation using standardized pipelines for all genomes generated as part of GAGA subprojects, unless PI groups want to do this faster themselves (see also below).

11)   All genomic or transcriptomic data generated for subprojects under GAGA will be used in GAGA’s core projects to be published in 2021, but this does not preclude earlier publication (see below).

12)   Whenever feasible, it will be possible to use genome sequences of the 200 GAGA core species as outgroups in a given subproject. This will require that some members of the GAGA consortium be included as co-authors (e.g. those who collected the samples for a given outgroup species).

Data production/quality for GAGA core species and sub-project species

For the 200 core species targeted by GAGA, we aim to produce genomic, transcriptomic and metagenomic data for each species (Table 1). We recommend generating similar data for species that are part of subprojects.

We ask that subprojects contribute the samples required for GAGA’s main projects whenever possible, because these samples are planned to be used in the overarching analyses in the key GAGA papers to be published in 2021. These samples will yield data required for analyzing ant-microbe symbioses, queen/worker differential gene expression, mutation rate evolution, and genomic correlates of macro-evolutionary and life-history trends. In addition, we request samples that can be stored as morphological/genomic voucher specimens.

Table 1: Sequencing strategy for GAGA 200 core species.

| Purpose | Min. required input | Technology | Expected output | Sample notes |
|---|---|---|---|---|
| Genome sequencing | 10 μg DNA (high molecular weight DNA) | PacBio | 15 GB, 5-10 kb read length | Can be a pool of multiple individuals |
| Genome sequencing | ~1 μg DNA | Short reads, DNAseq | 15 GB, 200 bp paired end | Can be a pool of multiple individuals |
| Gut microbiomes | NA | Short reads, 16S sequencing | 2 GB | Pooled samples from the same colony |
| Queen/worker gene expression, gene annotation | ~1 μg RNA | Short reads, RNAseq | 6 GB (queens/workers/etc.), 200 bp paired end | Pooled samples from different castes, developmental stages |
| Mutation rate project | ~500 ng DNA for one library | Short reads, DNAseq | 8 Gb, 200 bp paired end for each library | Two libraries will be needed: one for the single queen, the other for the pooled males from the same sequenced mother queen. |
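
The output targets in Table 1 can be related to expected coverage with simple arithmetic. The following is a minimal sketch, not part of any GAGA pipeline. It assumes that "GB" in the table denotes gigabases of sequence, and it uses a hypothetical genome size of 300 Mb, a value not stated in this document and chosen only because 15 GB at ~50x coverage implies it. The function names are illustrative.

```python
def expected_coverage(output_gb: float, genome_size_mb: float) -> float:
    """Return the expected mean fold-coverage for a sequencing run.

    output_gb      -- total sequence output in gigabases (assumed meaning of "GB")
    genome_size_mb -- haploid genome size in megabases (hypothetical input)
    """
    if genome_size_mb <= 0:
        raise ValueError("genome_size_mb must be positive")
    # 1 gigabase = 1000 megabases
    return (output_gb * 1000.0) / genome_size_mb


def required_output_gb(target_coverage: float, genome_size_mb: float) -> float:
    """Return the output in gigabases needed to reach a target fold-coverage."""
    if genome_size_mb <= 0:
        raise ValueError("genome_size_mb must be positive")
    return target_coverage * genome_size_mb / 1000.0


if __name__ == "__main__":
    # Hypothetical 300 Mb genome with the 15 GB target from Table 1
    print(expected_coverage(15, 300.0))
    # Output needed for ~50x coverage of a hypothetical 500 Mb genome
    print(required_output_gb(50, 500.0))
```

For species with genomes much larger than the assumed 300 Mb, the same 15 GB of output yields proportionally lower coverage, which may be worth checking before requesting a quote.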



GAGA – Subproject association

There will be different possible levels of association between subprojects and GAGA. Here are three example scenarios for independent, collaborative and integrated subprojects. Intermediate solutions between these three scenarios will of course be possible. For every subproject, a detailed outline of the association with GAGA will be developed by the subproject’s PI and members of the GAGA governing board.

Scenario 1: Maximal independence from GAGA

Under this scenario, the PacBio data will not be processed by the GAGA consortium. The raw data will be delivered to the subproject group, which will do the assembly and annotation itself. This would likely allow faster analysis and publication. However, once all standard GAGA pipelines are established and running, GAGA will redo the assembly and annotation. The aim is not to challenge any earlier analyses, but to make sure that these (possibly already published) genomes are maximally comparable to the 200 core GAGA genomes, and to maintain uniform quality standards in the final comparative analyses that are compatible with any minimal requirements journals might define in the coming years.

Scenario 2: Basic Collaboration with GAGA

Under this scenario, the raw data will be processed and the genomes assembled and annotated by the GAGA consortium in coordination with the subproject group. The GAGA researchers involved should be acknowledged as co-authors (middle authorships) on publications according to their contribution, which will depend on the workload (e.g. number of species). This will usually involve one or two junior researchers and one senior researcher.

Scenario 3: Full integration with GAGA

Under this scenario, the subproject will become fully integrated into GAGA and receive priority resources from Guojie’s bioinformatics group in Shenzhen. This implies coordinated publication of the subproject paper, so that it becomes part of the “2021 package” to be distributed across a few journals with which GAGA hopes to negotiate special issues. We aim to organize at least one major flagship paper with all GAGA data (see below for authorship rules for collectors on the main paper) and a series of accompanying papers encompassing the major GAGA-initiated projects and the most incisive subprojects. Subproject PIs will lead their own projects and will share authorship on the flagship paper as co-coordinators. Their subproject paper(s) will also include a number of GAGA consortium members (not all; depending on real contributions) as co-corresponding authors. It will always be up to subproject PIs to weigh the pros and cons: publishing later, with the possibility that their paper becomes wide-ranging enough to join the final set of GAGA papers submitted to special issues in major journals, or going ahead faster according to scenario 1 or 2.

We also welcome consortium members with specific expertise to propose new cutting-edge comparative programs to be initiated with GAGA data. Such PI groups will receive leading authorships for projects they propose and spearhead. Framework agreements on this can be made at the start and adjusted as we go along, so that the final authorships always reflect the volume of hands-on work by the junior authors and the weight of coordination efforts by the senior authors.