Big Data Zone

Simulated data modeled for the particle detector on the Large Hadron Collider at CERN

Our final zone in the 2014 Big Data Season is funded by the Science & Technology Facilities Council and will explore how computing is applied not only to biology, but also in other areas like astronomy, particle physics or meteorology.

Researchers working at the STFC use computers to store and analyse data to measure atmospheric conditions, develop new weather forecasting models, look out at the stars or understand cancer better.

They work together with scientists from other centres in huge international collaborations, which produce immense amounts of data that lead to groundbreaking discoveries, such as the discovery of the Higgs boson thanks to the data produced at the Large Hadron Collider at CERN.

If you are a scientist working in the field and you’d like to take part in the Big Data Zone, apply here. If you are a teacher and you want to get your students enthused about Big Data, register here.

Big Data Debate Kit is out!

Big Data teaching resource

Our new free debate kit

The Big Data Debate Kit discusses whether we should sequence the genomes of one million people, to find out more about living longer and healthier lives.

The aim of this educational resource is to explore the social, ethical and political issues behind mass genome sequencing. It contains 8 debate cards outlining the opinions of fictional characters with an interest in human genome sequencing, and teacher notes to help you carry out the lesson effectively.

We have sent our first batch of debate kits, but we still have more to be sent to different schools in the UK and Ireland. Click here for more information and to sign up to receive your free copy!

Genomics and Bioinformatics Zones: Evaluation Report

Click on the picture to download the full report


In June 2014, we teamed up with The Genome Analysis Centre (TGAC) to run two Big Data themed zones, the Genomics Zone and the Bioinformatics Zone, each featuring at least two TGAC scientists. These were our main findings:

  • The students were interested in the scientists: Students read the scientist profiles and asked about the research of each individual scientist. They were also very keen to learn more about the scientists’ daily lives and hobbies, as well as about their opinions on topical issues like animal testing.
  • Students were interested in Genomics and Bioinformatics: Two difficult subjects which many of the students might not have been familiar with before the event. In fact, one teacher in the Genomics zone said she was pleasantly surprised with the level of the students’ questions and how the scientists explained what genomics is.
  • The Bioinformatics and Genomics Zones exposed students to the new trends in biology: Students learnt how biology is much more than identifying plants and animals, especially since the radical change that computers and Big Data have generated in the field. As one teacher said:

“The genomics zone attracted us as it was a chance to expose kids to seeing biology as a difficult subject (which normally they don’t) and also speak to people in the research side of the subject. We felt it would link to a lot of topics in our current themes like microbes and reproduction.”

  • Scientists improved their science communication skills: For some scientists, I’m a Scientist, Get me out of here! was their first public engagement activity, and it served as a good pilot experience. They learnt about students’ interests and how to communicate with them more effectively.

If you are curious and want to read the full report, you can download it here.

Bioinformatics and Genomics: two more Big Data zones in June

One of the latest technologies to sequence or “read” all the information contained in the DNA of different organisms. Image by: TGAC

In our event in June we are running two Big Data Zones: the Genomics Zone and the Bioinformatics Zone, co-funded by The Genome Analysis Centre and The Wellcome Trust.

Scientists in the Genomics Zone are studying the genomes of all living things. Our bodies are made of millions of cells, each of them containing a complete instruction manual telling them how to make all the bits that make up that cell, and how to make them work together. This instruction manual comes in the form of DNA and it is called the genome.

In the Genomics Zone we have scientists looking at the content of genomes in everything from tomatoes to sand-hoppers. We also have scientists studying the genomes of viruses or bacteria to learn more about how their living instructions work, and how we can use that knowledge to fight against them…

The scientists in the Bioinformatics Zone use computer science and maths to read and translate biological and medical data. All the information is later stored in giant databases, which are often shared with scientists all around the world. This means that a scientist in the UK might be analysing data generated on the other side of the planet! Looking into these growing collections of data often requires the use of super-computers. In the Bioinformatics Zone you will meet five different scientists, from PhD students to Professors, working in this emerging field!

Care.Data, where Big Data meets health

“Big Data” is a subject gracing the front pages of many newspapers and websites, but how many of us understand just what “Big Data” is, and how it relates to us? Care.Data, the Health and Social Care Information Centre (HSCIC) programme to make medical records available for research and health care use, is an example of a Big Data project and its implications.

Using patient information to support research or plan new health services is nothing new (the NHS has been collecting information from every hospital admission since the 80s), but Care.Data takes it to a whole new level. It will create a complete picture of the care that patients receive, including prescriptions and test results. This information is made available to help specialists see how well different services perform and what improvements to make. The information collected will help to find better ways of preventing and treating illnesses, and guide decisions about how to manage NHS resources.

Of course, the information shared will be regulated by law and very strict confidentiality rules, and if you don’t want your own information to be shared you can always opt out. But this is where ethical questions surrounding Big Data projects and citizens’ medical records appear. The large number of articles, blog posts and comments written about Care.Data is the ultimate proof of the potential of Big Data to rouse public debate.

You can find more information about at:

And our first zone is… the ComputationalBio Zone!

To help you and your students familiarise yourselves with the topic of computational biology, we have prepared a brief summary of the subject. We have added links to websites where you can find more detailed information, and we have also collected some educational resources that you might find useful to support your classes.

Human protein interaction network

Statistics, maths, biology, chemistry, physics, ecology, anatomy, neuroscience, animation… these are just some of the fields included in computational biology. Computational biologists spend their time developing new sophisticated tools to study different biological, behavioural and social systems.

Scientists started to use biological data to develop mathematical relations among a range of biological systems in the early 1970s. However, it was not until a decade later that they started sharing large amounts of data, which required the development of new computational methods. Since the late 1990s, computational biology has become an essential part of the latest advances in biology.

The fields inside the field

Computational biology is a very broad term that can be subdivided into several disciplines. Below we have summarised just some of these disciplines, and we have linked them to educational resources that your students might find useful:

Genome Explorer, one of the tools available at


Computational genomics studies the genomes of different cells and organisms. The best example of this field is the Human Genome Project, which sequenced the entire human genome into a set of data that opened the possibility of personalised medicine. The Wellcome Trust Sanger Institute’s educational website covers this topic.

Computational pharmacology uses genomic data to find links between specific genotypes and diseases in order to assist the screening of drug data. Scientists and pharmaceutical companies are developing new computational methods that will help them compare chemical and genomic data related to the effectiveness of drugs.

Cell Slider, an interactive tool to help scientists identify cancer cells

Cancer computational biology is composed of several areas that include determining tumours’ characteristics and analysing molecules and genomic patterns that relate to the causation of cancer. Cell Slider is an interactive online site in which the public analyse real cancer data. By getting as many people as possible to participate, more samples will be analysed, leaving scientists with more free time to carry out other cancer research.

The tree of life, from the Natural History Museum


Computational evolutionary biology uses DNA data to track or even predict evolutionary changes of species over time, among other purposes. It is often used to draw more accurate evolutionary trees. At this link from the Natural History Museum you will find a simple and interactive Tree of Life.

Eyewire, a game to help scientists understand brain connections.


Computational neuroscience studies how electrical and chemical signals are used in the brain to represent and process information. Today, a large scientific research project called the Human Brain Project, funded by the European Union, aims to simulate the whole human brain on supercomputers in order to gain a better understanding of how it functions. Eyewire is a game in which players help scientists figure out how the brain is wired, starting from the nerves in the back of the eye.

Computational systems biology involves the use of computer simulations of biological systems – including cellular systems, multicellular organisms, ecological models or even models of infectious disease – in order to analyse and visualise the complex connections of the processes that take place within each system. Plague Inc. is a game that uses an epidemic model with a realistic set of variables to simulate the spread of a plague. Playing this game will help your students learn how computer models can be a useful tool to predict the outcomes of certain changes in a given ecosystem. They will also understand the basic concepts of epidemiology.
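For classes that want to see what an epidemic model looks like in code, here is a minimal sketch of the classic SIR (Susceptible, Infected, Recovered) model. It is a textbook model chosen for illustration, not the model used by Plague Inc., and the function name `simulate_sir` and all the parameter values below are assumptions made up for this example.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Run a simple day-by-day SIR epidemic model.

    beta  = assumed infection rate per day
    gamma = assumed recovery rate per day
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections depend on contacts between susceptible and infected people.
        new_infections = beta * s * i / population
        # A fixed fraction of infected people recover each day.
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative values only: 1000 people, 1 initial case.
    history = simulate_sir(1000, 1, beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(history)), key=lambda d: history[d][1])
    print(f"Infections peak on day {peak_day}")
    print(f"People never infected: {history[-1][0]:.0f}")
```

Students can change `beta` and `gamma` and watch how the peak of the outbreak moves, which is the same kind of "what if" question that researchers ask of much larger models.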

If you are aware of any other resources or useful information that we could link here, please contact us. We will be updating the site periodically to include up-to-date information and educational materials.

Contact email:

Behind the scenes on the Big Data debate kit

Today I went to Nottingham University’s Sutton Bonington campus to talk to Prof Richard Emes, a bioinformatician. We sat in the coffee shop there and talked for two solid hours about Big Data, while my boyfriend took a walk in the Arboretum with our baby. (The campus is lovely, definitely worth a visit if you ever get the chance.)

Not only was it a fascinating conversation, but just before we left they started putting reduced stickers on the food in the display cabinet. We got two sets of ‘all-day-breakfast’ sandwiches, and four packs of tropical fruit cocktail, all for 90p! So if you are visiting, my tip would be to make it a Friday afternoon…

I’ve been writing these debate kits for six years now (this is my seventh kit) so you’d think I’d have got the hang of it by now. But each one is a new challenge. The way I do it is to find out as much as possible about the topic and the issues it raises. Then once I’ve got a feel for the topic, I hunt around for a question that a reasonable person could go either way on. Then I know I can make the characters believable and sympathetic.

I do quite a lot of reading on the topic, but there’s no substitute for talking to people who really know the area. Usually I only need to talk to a couple of people before I start hearing the same points raised again.

This was my fourth research interview so far for this kit and we covered completely new ground. Big Data really is a big topic, and I think for this one I’m going to have to do a lot more research before I’ve scoped it out.

I asked Prof Emes what his ‘Big Data in a nutshell’ definition would be, and he said, “When the data you are producing is too big – there’s too much of it – for you to handle it alone.” By that definition, Big Data has been around a long time, certainly since the Human Genome Project (the first draft of the human genome was announced in 2000).

Prof Emes said he used to think people were young if they were born since Live Aid. Now, there are students in Year 9 who were born since the human genome was sequenced.

Some of the issues raised so far:

How are we going to store the massive amounts of data involved? And where will we find enough biologists with the computer skills to manage it?

How do we even work with this level of data? For example, as Julia from the Wellcome Trust Sanger Institute pointed out, one day our genomes will all be easily sequenced. But that’s thousands of genes. How will your doctor look at your test results and take in the relevant information? It’s not like casting their eye down a list of blood test results.

There are all sorts of privacy issues. For a start, genome data can’t truly be anonymised, because once you have someone’s genome, you can infer facts about them.

Also, you share a great deal of your genome with your close relatives. What happens if I want to have my genome sequenced, and find out what ‘dodgy’ genes I have? But my sister doesn’t want to have hers sequenced, and would rather not know. She probably can’t help finding out what I’ve found out from mine. So the privacy of her genome isn’t hers to control any more.

Also, who owns your genetic sequence? Who has the right to publish it? What if your parents got you sequenced as a child, and then someone developed a drug based on your DNA? Who owns that? You can’t take the knowledge back.

Genome sequencing is now so quick and cheap that you can sequence an organism in a couple of weeks. (The Human Genome Project took over ten years.) What if we all got sequenced at birth? Could you then be refused a job in the police, because you had genes associated with aggression? Or automatically put into the bottom set at school because you had genes associated with having difficulty reading?

These are just a FEW of the questions I’ve been asking this week. And so far I’ve only really been talking about genetics. I still need to talk to people about Big Data and neuroscience.

It’s going to be hard pulling it all together into a kit. And of course, one kit won’t be able to cover every issue I’ve been hearing about. But I’m feeling quite excited that it’s a topic with so many sides to it. I think this is going to be a great kit.

Today saw the House of Commons Select Committee grill various witnesses about Care.Data and how medical records appear to have been wrongly given to the insurance industry.

Watching the session and the #CareData twitter stream was depressing. There seems to be a culture that the NHS needs to sell the idea better, that it needs to articulate what it is doing better. The idea of listening to people’s concerns did not feature.

As part of our Big Data Season we’d like to run an I’m a Scientist public Zone (along the lines of our GM Food Zone) on Care.Data. We’d like to give the public a chance to ask 5 experts about the implications of Care.Data, to give NHS England a chance to explain their system, for medConfidential to explain their concerns, for Ben Goldacre to come back to supporting a project that if properly implemented would improve healthcare enormously. But mostly a chance for the public to ask their questions and for the experts to hear those views.

So who should be the five experts? Let us know in the comments or via email

Take part in the ComputationalBio Zone this March

I’m a Scientist, Get me out of here! will be running a series of online events around Big Data. The Big Data Season 2014 will open with the ComputationalBio Zone in March, and we’re looking for scientists to take part.

Do you design the tools used in computational biology? Are you a geneticist, either clinical or academic? Or a statistician? Do you work on policy or ethics about big data and how it is shared? We’re looking for five scientists and experts to take part in I’m a Scientist, Get me out of here!, a two-week long science engagement event that gets scientists interacting with school students online. The ComputationalBio Zone will run between the 10th and 21st March, exploring how computer systems are used to solve complex biological problems.

Five scientists will take part in live chats, answer questions, and show students that there is so much more to biology than dissections and microscopes. Whether you’re a practised communicator or a relative novice, I’m a Scientist is a fantastic opportunity to hone your skills. You’ll get to engage with audiences who might typically be hard to reach, and learn what today’s students think of science and scientists. The students will then vote for the scientist they like the best to win £500 to spend on their own public engagement project.

For more information on I’m a Scientist, Get me out of here!, follow this link:, or email Josh at

If you are interested in taking part, sign up here: by Sunday 16th February 2014.

Big Data

Science used to be so simple.

Gene expression analysis

Physics involved dropping lead weights, and swinging pendulums. Chemistry meant mixing two liquids and measuring the heat rise or change in colour. Biology was about identifying leaves and insects.

But that has all changed.

Physicists now use the Large Hadron Collider at CERN to generate vast quantities of data to model how the universe is constructed. Chemists study molecular structures through crystallography and complex computer transformations. But perhaps the biggest change has come in biology.

Automated DNA sequencing output HGP

Geneticists are working out the details of the building blocks of life through sequencing genes. Epidemiologists are working out how disease spreads using computer models of millions of people. Neuroscientists are embarking on projects to recreate the brain using computer networks.

Computers and the Big Data they generate are radically changing science.

The I’m a Scientist Big Data season in 2014 will explore how computers are used in real science today. We’ll look at the science and scientists at the cutting edge of Big Data and we’ll explore some of the issues that Big Data presents to society.

This year we will run 6 zones for school students to talk with scientists to see how bioscience is done with computers in 2014. We’ll create a debate kit on the wider impact that the data collected by science has on society, and through both online and live events we’ll give the general public the chance to question scientists, ethicists and policy-makers about the modern methods of bioscience.

The Big Data Season is being supported by BBSRC, TGAC, Wellcome Trust, STFC and Marie Curie Fellowships, and we are looking for more funders. Please contact Shane McCracken via email or phone on 01225 326892.

If you are a researcher who would like to take part in this season of events please sign up.