Besides uncovering the gaps in healthcare infrastructure, the pandemic has exposed the lack of a data-driven ecosystem and culture at various levels of the country’s healthcare system. Today, as much as we appreciate the potential of genome science, we need to recognise that it is resource-intensive work built around quality data (input, storage, and analysis). Over the last few decades, the strong ties between DNA research and computer science have revolutionised genomics technologies. At this juncture, we must recognise that Covid genome sequencing is taking place in an era of rapid (and cheaper) sequencing. But the ability to determine DNA sequences is starting to outrun the ability of researchers to store, disseminate, and analyse the data. Genome studies need pre- and post-sequencing data management to make sense of the genome mapping and to work towards epidemiological goals. Genomic surveillance is the best we can do to track the virus and prepare public health defence measures against it. We ignore this aspect of pandemic management only to our detriment in the face of a third wave or future waves.
Data generated from whole-genome sequencing is huge (in terabytes) and demands computational capabilities to manage it. Analysed genomic information needs to be combined with clinical and epidemiological inputs, which in turn can yield insights on the virus that can be used in public health interventions. The sequencing process needs a high level of laboratory infrastructure, which is expensive. As India spent very little per capita on healthcare before the pandemic, the India genome project needs substantial investment and funding. That funding, I believe, should come through both public and private involvement, considering the immense capability of the Indian private sector in genome informatics compared with the public sector.
The convergence of biology and computing is necessary for this relatively obscure technology. Essentially, a biologist and a programmer should work closely to develop tools and systems that can answer a biological question. Many public health laboratories may not have the right bioinformatics capability (Kelly F. Oakes on Comments to Author, 2017) or the data management resources for large-scale public health projects. Also, database management and big data analytics may not align with the objectives of some public sector institutes, which are mostly centred on teaching and human resource capacity building in biotechnology and microbial research.
As we know, detecting mutations or variations can help identify the cause of outbreaks and characterise the behaviour of the virus (fast-spreading or immune-escaping variants), guide public health policies, and even help find a drug or cure or inform vaccine researchers. To detect genome variations, millions or billions of data points have to be analysed through computational techniques such as pattern-analysing algorithms, mathematical models, and image processing.
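To make the idea of detecting variations concrete, here is a minimal sketch in Python that compares a sample sequence against a reference and lists the positions where a base differs. The function name and the toy sequences are invented for illustration. It assumes the two sequences are already aligned and of equal length, whereas real pipelines align millions of reads first and also handle insertions and deletions.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    pairs = zip(reference.upper(), sample.upper())
    for pos, (ref_base, obs_base) in enumerate(pairs, start=1):
        # 'N' marks a base the sequencer could not call, so it is skipped.
        if obs_base != ref_base and obs_base != "N":
            changes.append((pos, ref_base, obs_base))
    return changes


# Toy sequences invented for illustration, not real viral data.
reference = "ATGGCTAGCTAA"
sample = "ATGGTTAGCNAA"
print(find_substitutions(reference, sample))  # [(5, 'C', 'T')]
```

Even this toy version shows why scale matters: the same comparison has to be repeated across a genome of tens of thousands of bases for every sample, and the results pooled across thousands of samples before a pattern emerges.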
Apart from the public sector regional labs identified by the Genetic Consortium, there are Indian genomics companies in the private sector that have world-class capabilities. In addition, India’s IT giants each have one or two genomics labs with state-of-the-art infrastructure, handling liquid biopsies and working mostly in NGS (next-generation sequencing). These genome science labs of IT corporate houses are adept at preparing data files and applying computational techniques, besides performing the steps of gene or whole-genome sequencing. I believe these capabilities in India’s IT sector can help the country’s Covid response by contributing directly to laboratory research, given the sector’s R&D experience in the field. This, I believe, will enable the delivery of standardised genomic data that meets international quality requirements, helping the country catch up on its required data contribution to GISAID or GenBank.
In a well-designed PPP (public-private partnership) model, these genomics labs in the private and corporate sector will be able to provide not only the required lab infrastructure for genome sequencing (or mapping) but also the strong digital capabilities needed to complement the process, and thereby support the NCDC (National Centre for Disease Control). Some of these well-equipped labs are not licensed for clinical use but have been contributing to high-level research in tandem with renowned cancer hospitals and oncologists. Authorities should find ways to draw on their skills and include them in the genome surveillance effort for the greater public good. The bioinformatics capability of the Indian IT sector can transform the country’s genomic surveillance, thereby helping in pandemic preparedness.
As we know, India should by now have sequenced more than five million samples to have a good understanding of the virus and its strains, but so far only 11,047 have been sequenced (of the 1.4 million samples sequenced worldwide), according to GISAID. Currently, less than 0.05% of positive cases in India are subjected to such mapping, while the recommended proportion is 5% of all samples. By contrast, a few countries (such as the UK, the US, and Belgium) have been doing whole-genome sequencing in real time to inform and update their public health response.
The unavailability of metadata along with Covid samples sent for genome evaluation is another concern, which I believe stems from data privacy or ethics issues. The authorities should address this at the earliest and enable public health workers to collect complete, relevant epidemiological data (demographic, clinical, and laboratory) in the right format, and to share it, either anonymised or as-is with patient consent, with the laboratories where the samples are sent for analysis. At this point, we must also remember that life sciences and healthcare data are mostly unstructured, unlike data in many other branches of science, and data scientists often find biological data technically trickier to organise. Readying the data for research use may itself be a struggle and may necessitate high-end techniques such as natural language processing.
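As a rough illustration of how sample metadata might be prepared before it is shared with a laboratory, the Python sketch below removes direct identifiers, replaces the patient ID with a keyed hash, and coarsens age into a ten-year band. The field names, the record, and the key are all hypothetical, and no real schema is implied. Strictly speaking this is pseudonymisation rather than full anonymisation, so it would complement consent and governance rules, not replace them.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical field names; a real schema would come from the health authority.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymise(record: Dict[str, object], secret_key: bytes) -> Dict[str, object]:
    """Return a copy of the record with direct identifiers removed."""
    shared = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    # A keyed hash maps the same patient to the same token without exposing the ID.
    digest = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    shared["patient_token"] = digest[:16]
    # Replace the exact age with a ten-year band, e.g. 47 becomes "40-49".
    if "age" in shared:
        decade = (int(shared.pop("age")) // 10) * 10
        shared["age_band"] = f"{decade}-{decade + 9}"
    return shared


# Made-up record for illustration only.
record = {
    "patient_id": "P-0001",
    "name": "Example Person",
    "phone": "0000000000",
    "age": 47,
    "district": "Example District",
    "sample_date": "2021-05-01",
}
print(pseudonymise(record, secret_key=b"replace-with-a-real-secret"))
```

Keeping the key with the collecting authority means the laboratory can link repeat samples from the same patient without ever learning who the patient is.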
The Covid-19 pandemic has ushered in a new digital era and is rewiring the world’s perspective on genomic science and its sensibilities about personal data privacy in public health management. Governments around the globe are deploying new digital surveillance tools, both to monitor individuals’ adherence to the new norms of Covid etiquette and to track the virus for variations, in order to bolster defences against it.
The writer is a medical doctor (pathologist) and holds an MA in Creative Writing from the University of London. The views expressed are personal.