The war against cancer is increasingly moving into cyberspace. Computer scientists may have the best skills to fight cancer in the next decade — and they should be signing up in droves.
How can computer scientists help?
First, as The New York Times recently reported, the cost of having a gene sequencing machine produce millions of short reads from one cell is dwarfed by the cost of the data processing needed to turn those reads into a single usable three-billion-base-pair digital representation of a genome. To make personalized medicine affordable for everyone, we need to drive down the information processing costs.
Second, we need to collect cancer genomes in a repository and make them available to scientists and health professionals. The computer scientist David Haussler of the University of California, Santa Cruz, for example, is creating one. The planned five-petabyte (5,000,000,000,000,000 bytes) store will house more than 20,000 genomes.
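To put those figures in perspective, here is a rough back-of-the-envelope sketch of the storage budget per genome. It assumes decimal (SI) units, consistent with the byte count quoted above, and treats 20,000 as the genome count even though the plan is for "more than" that many, so the result is an upper bound. The 2-bits-per-base figure is an illustrative assumption about the most compact encoding of four bases, not a description of how the repository stores its data.

```python
# Back-of-the-envelope storage budget per genome (a sketch, not the repository's design).
PETABYTE = 10**15          # decimal petabyte, matching the byte count in the text
GIGABYTE = 10**9

repository_bytes = 5 * PETABYTE
genome_count = 20_000      # "more than 20,000" in the text, so this is an upper bound

bytes_per_genome = repository_bytes // genome_count
gigabytes_per_genome = bytes_per_genome / GIGABYTE

# Assumption for comparison only: 4 possible bases fit in 2 bits each.
base_pairs = 3_000_000_000
packed_sequence_bytes = base_pairs * 2 // 8

print(f"Budget per genome: {gigabytes_per_genome:.0f} GB")
print(f"Bare sequence at 2 bits/base: {packed_sequence_bytes / 10**6:.0f} MB")
```

Under these assumptions the budget works out to about 250 GB per genome, a few hundred times the size of the bare sequence, which suggests the store is meant to hold much more than the final base-pair string for each genome.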
Third, finding a personalized, targeted therapy for each tumor among myriad possible combinations of drugs is like finding a very small needle in a very large haystack. Where traditional hardware and software are not up to the task, researchers are exploring ways to enlist people to help.
An inspirational example is the Foldit game — developed by the computer scientist Zoran Popovic at the University of Washington — that recently attracted thousands of volunteers to uncover the structure of an enzyme important to H.I.V. research.
Cancer tumor genomics is just one example of the Big Data challenge in computer science. Big Data is unstructured, uncurated and inconsistent, and housing it often requires a thousand-fold increase in size over traditional databases. It is not pristine data that can be neatly stored in rows and columns. YouTube alone holds nearly one exabyte of videos; an exabyte is one trillion megabytes, or 1,000,000,000,000,000,000 bytes.
Session difficulty level: Intro/101