I was reading Dr. Seidel and her students’ papers. A lot of it was over my head, but one thing I got to wondering about was how much having to use the “scaffold-level assembly” of the Burmese python genome, with its “discontinuities”, is holding back identification of additional ball python mutations. I gathered that Burmese are like >97% similar to balls, but it sounds like there may be some problems with how the Burmese genome was done, presumably some years back.
How much time/money/work would it take to produce a ball python reference genome with today’s equipment and best practices? How much would that help in the pursuit of new ball python mutation tests if someone was willing to fund it? Maybe we could get a Department of Agriculture or Department of Commerce grant? Or has technology made it GoFundMe level now?
There is an unofficial ball python reference genome out there; it just is not in the public databases.
Is it available, and is it more helpful (better quality?) than the 10-year-old Burmese reference genome for finding ball python mutations?
I was reading about the billions spent on the first human reference genome and was wondering what it would cost to do a new species now.
I know of a couple of snake research folks who have access to it (I am not one of them, though).
The cost to sequence a whole genome is maybe a couple grand, plus or minus… The cost to analyze and annotate it is a fair bit more, and it also depends on the quality of the sequence data.
Still, going from $3,000,000,000 to maybe less than $30,000 in 20 years is pretty impressive. Maybe $3,000 for everything soon.
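For a rough sense of scale, here is a back-of-the-envelope sketch in Python using only the two figures above. Both numbers are loose estimates from this thread, not quoted prices, and the variable names are just made up for illustration. It works out the fold reduction and, assuming a steady exponential decline (which real prices did not strictly follow), how often the cost would have had to halve.

```python
import math

# Rough figures from the discussion above (estimates, not quoted prices).
cost_then = 3_000_000_000  # first human reference genome, in dollars
cost_now = 30_000          # guessed all-in cost for a new species today
years = 20

# How many times cheaper it has become.
fold = cost_then / cost_now

# If the drop were a steady exponential, the cost would halve every
# years / log2(fold) years. This is an assumption for illustration only.
halving_years = years / math.log2(fold)

print(f"{fold:,.0f}-fold cheaper, halving roughly every {halving_years:.1f} years")
```

That comes out to a 100,000-fold drop, or a halving roughly every 1.2 years over the two decades.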
Oh yeah, the cost reduction (and size reduction) in sequencing technology has been fantastic. Really, the largest issue hampering things now is computation.
The raw computer power or the expert interpretation etc? Any hope of AI or just better computer programming helping on that side?
Little from column A, little from column B
It takes a fair amount of computer power to take all the sequence data, sort it, line it up, and then stitch it together. Add in weird genome architectures that require some pretty sophisticated programs to work out what the sequence really is versus what may get spit out by mistake (think of sequences that repeat dozens or hundreds of times: how do you know you have the right number of repeats?).
Once you have all that in place, then comes the ‘making sense’ part: what are genes and what are not genes? Where are the regulatory elements? What is actually just junk? Which genes, if any, can you accurately identify and tie to a specific function/protein/pathway?
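To make the repeat problem concrete, here is a toy Python sketch. Everything in it is hypothetical: the sequence, the "CAG" repeat, and the read lengths are invented for illustration. It also treats reads as error-free and only looks at which reads exist, not how many of each (real assemblers also use read depth as a clue). It builds two made-up genomes that differ only in repeat copy number and checks whether their reads can tell them apart.

```python
def reads_of_length(genome, read_len):
    """Return the set of all substrings of length read_len (idealized reads)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


def make_genome(copies):
    """Hypothetical toy genome: unique flanks around a tandem 'CAG' repeat."""
    return "ATTGCCGA" + "CAG" * copies + "TTCAGGCA"


g10 = make_genome(10)  # 10 copies of the repeat
g12 = make_genome(12)  # 12 copies of the repeat

# Short reads (6 bases) never span the whole repeat, so both genomes
# produce exactly the same collection of distinct reads.
same_short = reads_of_length(g10, 6) == reads_of_length(g12, 6)

# Long reads (40 bases) can cover the 10-copy repeat plus flanking
# sequence on both sides, so the two genomes become distinguishable.
same_long = reads_of_length(g10, 40) == reads_of_length(g12, 40)

print("short reads identical:", same_short)
print("long reads identical:", same_long)
```

With the short reads the two genomes look identical, so nothing in the data says whether there are 10 or 12 copies. Only reads long enough to run from one unique flank across the repeat into the other flank settle it, which is one reason read length and assembly method matter so much for repeat-heavy genomes.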
Bioinformatics is making great strides in this area, but they are also playing a bit of catch-up because the data generation has become insane
As for AI… My personal opinion is that the AI of today is still mostly smoke and mirrors so I do not hold out a lot of hope that it will be able to actually solve anything
I’m assuming the Aiden Lab will have a ball python assembly posted at some point through their DNA Zoo project. They’ve been working with the Houston Zoo to sequence and assemble a lot of different genomes using Hi-C.