Monday, 12 November 2012

Alfred Henry Allen, an extraordinary chemist

My parents have just written a fantastic paper about A. H. Allen, Sheffield's first Public Analyst. Alfred Henry Allen lived from 1846 to 1904, in a time of gas lamps, horse drawn carriages and dubious Victorian era water quality. His chemical investigations and new methods of analysis shone a light on the practices of careless or unscrupulous food and drink manufacturers.

The paper describes lots of his achievements. For example he investigated why the drinking water in some areas of Sheffield contained harmful lead, which was poisoning the population, while other areas had lead-free water (it turns out that the leaded water came from reservoirs which were found to be acidic, and the acid dissolved some of the lead from pipework, so he proposed that the water be treated with lime and limestone to remove the acidity).

Allen investigated the proportions and effects of the different alcohols found in whisky, and even did some testing on himself, drinking a wine glass full of whisky every evening for 3 weeks in order to show that amyl alcohols did no harm. You'll also want to read about his concerns about the "slovenly and ignorant" production of cider, where the cider makers reused manure carts to carry apples.

He didn't just do the science, but also communicated it to a wider audience. He gave public talks and "entertainments", such as Alchemy and the Alchemist, Chemistry of ExplosivesVisible Sound, and Artificial Light. The paper points out that "People in Victorian times had a thirst for entertainment of a scientific and paranormal nature", and I imagine his lectures would have been very popular. Would his talk on Chemistry of Explosives have featured exciting demonstrations that would be impossible today for health and safety reasons? What magic would the lecture on Alchemy have shown?

Allen published over 150 papers and many books on methods of chemical analysis. He wrote 13 volumes of a book called "Commercial Organic Analysis", which needed continual updating as science progressed. He was a founder member of the Society of Public Analysts. He died of diabetes, a disease that had no cure or effective treatment at the time (though Allen had published papers, a book and chemical analysis methods for the determination of sugars in urine, so he would have been well able to measure his condition). During his life, he developed a business of consulting chemists (A.H. Allen & Partners), at which, after many years, my parents worked, and they have now honoured him by documenting his place in history.

Friday, 17 August 2012

Turing sunflowers are beginning to flower

0 has opened her flower. 1 broke in the wind and had to be replaced, but now she's catching up.

And 0 and 1 now have a sister sunflower: 10, who is leaning a little, losing her petals, and starting to set seeds.

Looking forward to counting!


Science galleries

There is a growing awareness that scientists and engineers should be better at communicating their work to the public (and to policy makers). It's not enough just to do good science if no one knows about it.

The excellent Sixty Symbols project has scientists at Nottingham Uni making videos about interesting aspects of physics and astronomy. There are Science Cafes (or Cafe Scientifiques) around the country to chat about science in a mixed audience. As part of our current research project we're going to make an online bioinformatics activity that can be used in schools to teach children about binary numbers, rules and about how genome translates into phenotype. BCS Mid-Wales have recently had two successful show-and-tell events where robots, games, and technology were proudly shown and enjoyed. And we've just had a student develop a fantastic HTML5/Javascript game to demonstrate evolution in partially selfing fish populations (you have to breed a bigger population than your competitors and not get infected).

In every reasonably-sized town there will an art gallery (where the public can go to see art), a museum (where the public can learn about history), and a library (where the public can go to find literature). is there an equivalent for science? There are science museums/education centres in certain towns, but they aren't as widespread as art galleries/museums/libraries by a long way. And they're often aimed at educating children and teenagers rather than aiming at inspiring adults. Art galleries are rarely aimed at children and teenagers.

Here in Aberystwyth we have an Arts Centre with multiple galleries where I can see stunning photography, paintings and prints, Ceredigion museum which shows me what amazing outfits people used to wear when bathing, and how huge telephone cables used to be, and we have more libraries than most towns, including the university libraries, a town library and the National Library of Wales. But we don't have a place I can go on a rainy Saturday afternoon to browse science. I can read about science in the library, but that's not the same thing. I'd like to see Science Galleries in every town.

Tuesday, 26 June 2012

Day 1 of The Alan Turing Centenary Conference in Manchester

I've been attending the Alan Turing Centenary Conference in Manchester. The conference was in the huge and beautiful Manchester Town Hall, and it feels very appropriate to hear these talks in Manchester where Turing worked. The conference was over a weekend, including Saturday 23rd June, which was Turing's birthday 100 years ago. There were too many talks to write about all of them, but here are a few comments on a few of the talks from Day 1.

Vint Cerf is Chief Internet Evangelist for Google, one of the creators of the internet and of TCP/IP. He described the early days of the internet, how they had to make decisions about the size of the address space they would need in order to address each of the devices connected to the internet. Imagine their problem in the 70s, trying to work out how many computers there would be in the future. Now there are about 2-3 billion internet devices connected at any one time, and a load more devices only intermittently connected. Everything is now becoming networked, including Cerf's wine cellar, which of course has a sensor network measuring temperature, humidity and light conditions, and will send an alert to his mobile phone if his wine becomes too warm. See Vint Cerf's talk.

Yuri Matiyasevich talked about number theory, which was yet another of Turing's many fields of work. Turing wrote a few papers in number theory, looking at the Riemann hypothesis. During telling us about the Riemann hypothesis, Matiyasevich told us about Skewes number, "the largest number that served any definite purpose in mathematics" according to Hardy, which was a massive lower bound of 10^10^10^34. In 1939, Turing applied for a grant from the Royal Society for just £40 to calculate information about zeros in the Riemann zeta function. This rather small amount of money was to buy parts in order to adapt some machines for predicting tides to the purpose he had in mind. Matiyasevich showed a picture of a tide prediction machine, and it instantly gives an impression of the "technology" that Turing was working with at the time. Apparently, Turing was fine with the fact that his machine gave its output in base 32, stating that he was "sufficiently familiar" with reading numbers in this form that it wasn't a problem. See Yuri Matiyasevich's talk.

David Ferrucci is the main man behind the Watson system that beat humans at the American quiz game 'Jeopardy' talked about the creation of Watson. Although Jeopardy isn't a Turing test, it is a test of general knowledge, and in this case a machine was competing against humans. His team of about 20 people put together many components (parsers, entity and relation extraction, knowledge base and search techniques etc) and then tested the system as a whole as the primary metric, not any one component. Initially it took 2 hours to answer a question, and some serious parallelisation of tasks was needed to make it fast enough for the game show. He gave lots of examples of questions and answers in his talk, including a question that asked for the common theme between "shirt", "telephone" and "TV remote control", to which the answer was "buttons", though "made with petrochemicals" and "made by humans" were also valid answers suggested by Watson. See David Ferrucci's talk.

Fred Brooks talked about Turing's Pilot ACE, a prototype of the ACE computer that Turing designed and wanted to build. Why did Turing's computer not take off? He suggested 3 main reasons: that he didn't publish his 1945 proposal, the machine was too late by the time it finally got built, and he didn't make it easy to program (didn't foresee how important that would be). Again we get the impression of Turing's amazing ability with numbers. Using backwards binary with the lower order bits on the left was no problem to him, but was not so easy for others. ACE also had other peculiarities: no accumulator, no general multiply, no straightforward conditional branch. Brooks recommended a new book by Simon Lavington, "Alan Turing and his contemporaries: Building the world's first computer", and he recommended everyone read von Neumann's EDVAC report on a much more practical machine. You can now see the Pilot ACE in the Science Museum in London. See Fred Brooks' talk.

The conference even after day 1 was already giving a wonderful picture of how diverse Turing's contributions to computer science and mathematics were, and how far reaching his ideas would be.

See also my writeup of Day 2 and Day 3 of the conference, and the online video recordings of the talks.

Day 2 of the Alan Turing Centenary Conference in Manchester

On the second day of the Turing Centenary Conference in Manchester there were more great talks. The screen showing the presentations was flanked by a large image of Alan Turing, reclining in a chair, reading a book. It was rather like having him present, listening to the talks.

Tony Hoare asked the question "Can computers understand their own programs?" and proceeded to propose to answer it with an alternative to the Turing test. In the same way that "Can a machine be intelligent" cannot be answered without a definition of "intelligent", Hoare's question cannot be answered without a definition of "understand". Instead, if a computer can answer questions posed to it about its own program, and give correct and useful answers, and explain its reasoning, then we may conclude that it understands its program. These questions might be, for example: "Will this change make the program run more slowly". He also finished his talk with a hint at his current work on algebras for programming, by asking whether programming will become part of recognised formal logic in the future. See Tony Hoare's talk.

Manuela Veloso is a robotics researcher who had 116 robots in her lab at a count in 2011. Her talk gave a tour of her work. This included planning and replanning in domains like robot football, purposeful perception (we only see/perceive what we need to deal with the task in hand, and if you arrive at an airport, you won't notice what colour the seats in the lounge are, because you're busy looking for a sign saying "Exit" or "Baggage"), robots that cohabit with humans and ask humans for help, and discussions about whether both robots and humans now use off-board AI. If Veloso and her robots had come to my school when I was young,  I'm sure I would have wanted to be a roboticist immediately. In one visit to a school, a 6-year old asked "Did the robots wonder why you picked them up?"

Donald Knuth gave a pre-dinner speech. This was actually just a short speech followed by question-and-answer session. He said that he was proud to be a geek and he believed that Alan Turing was 100% geek. He handled a wide variety of questions, such as "What advice would you give to a young researcher" (follow your muse), "Are your children and grandchildren programmers?" (my son is a maths teacher, my grandchildren are too young yet to know), "Are objects the same as co-routines (that question was from Bertrand Meyer, the creator of the Eiffel programming language, and the answer was an involved characterisation of their similarities and differences), "Whats your next book" (my life's work is the Art of Computer Programming, I'm working on the new volume, it's slow progress because I want it to be right).

During his short speech, Knuth said that he'd been on Friday before the conference to see the Alan Turing statue in Sackville Gardens, near Manchester Piccadilly. He'd sat there on the bench with Turing's statue, and he and Turing both sat there and thought for about 10 minutes, and he found it wonderful, and he recommended it to us all. When I left the conference on Monday lunchtime, I went via Turing's statue and did the same. The statue is life-size and feels strangely human, especially when you sit on the bench next to him. I'll never get to meet Turing, but I think that attending a 3 day conference all about his work, followed by sitting next to his life-size statue on a bench is probably the next best thing.

See also my writeup of Day 1 and Day 3 of the conference, and the online video recordings of the talks.

Day 3 and thoughts on the Alan Turing Centenary Conference in Manchester

On day 3 of the Alan Turing Centenary Conference in Manchester I had to leave after lunch and so I missed the afternoon sessions. However, the morning session was excellent.

Garry Kasparov: The world chess champion took us through the story of Turing's world-first chess playing program. Turing didn't have a computer to execute the algorithm, so he carried out the calculations by hand, taking 15 minutes to execute a move. Turing's algorithm was pitted against humans, and one of the matches was transcribed. Kasparov and colleagues collected the rules of Turing's chess playing program and implemented it. When they tried to reproduce the game, the results deviated from the game transcription 10 times. Presumably, Turing got bored in his hand-calulations, assumed what the answer would be, and just moved the piece. Live at the conference, Gary Kasparov played a game of chess against the implementation of Turing's program, and Kasparov very quickly won in 16 moves, with a commentary as he did so (pointing out moves that were "not the greatest choice!"). See Gary Kasparov's talk.

Rodney Brooks: The roboticist talked about Turing's unpublished 1948 paper "Intelligent Machinery", which discussed embodied robots. They were impractical in Turing's lifetime. In this paper Turing also says that machines that never make a mistakes could not be intelligent, that humans are not machines because they seem to be able to do mathematics despite Godel's famous theorem, and discusses the role of neurons and brains and their resemblance to a universal machine. Brooks showed how the field of robotics had benefitted from the exponential laws about processor speed and availability of sensors, and now there are many things we can do in robotics that Turing would not have thought possible back in 1948. In fact many things that look intelligent when humans do them, turn out to be the result of relatively simple laws, and his group's research can make robots look human-like just by taking care of how they move their gaze to track an object, or how they use prosody and movement to look as if they have emotions. See Rodney Brooks' talk.

Throughout this conference we were reminded of the huge range of topics that Turing worked on in his lifetime, and that this conference covered:
  • The Turing test (or "Imitation game") was repeatedly mentioned in several contexts by several people, including for its primary use - a discussion of whether AI is achievable and how we will know when we've made it.
  • Michael Rabin reminded us of Turing's contribution to decidability, the halting problem and computability
  • Edmund Clarke reminded us that Turing was the first to realise that software verification would be important.
  • Yuri Matiyasevich reminded us of the work Turing did in number theory and that his work is still used in computing zeros for the Reimann zeta function.
  • Adi Shamir reminded us of Turing's work in encryption.
  • Fred Brooks reminded us of Turing's efforts to create early computers (and his Turing machine) and his thoughts about whether intelligent machines were possible.
  • Leslie Valiant and others pointed out that Turing had categorised 3 types of search: intellectual search (search algorithms taught now in all AI courses), genetical search (genetic algorithms, evolution) and cultural search (gaining knowledge as a community by interaction).
  • Garry Kasparov reminded us that Turing made the first automatic chess playing program.
  • Rodney Brooks told us that Turing wrote about the possibilities and problems for embodied robots long before they could be built.
  • Hans Meinhardt told us about Turing's work in understanding the formation of biological structure.
Happy Birthday Alan Turing! Where will computer science be in another 100 years?

See also my writeup of Day 1 and Day 2 of the conference, and the online video recordings of the talks.

Wednesday, 6 June 2012

Alan Turing, the first bioinformatician

This is the Alan Turing centenary year, and Alan Turing would have been 100 years old this month (on 23rd June) if he had lived this long. As well as inventing computers, theories of decidability, computability, computational cryptography and artificial intelligence, just before his death he also studied the relationship between mathematics and the shapes and structures found in biology. How do patterns in plants, such as the spiral packing of the seeds found in the head of a sunflower, come about? This year, in a big experiment, devised by Prof Jonathan Swinton to celebrate his centenary, sunflowers are being grown across the country. The seed heads will be collected, their patterns counted, then hopefully the results will demonstrate the relationship between Fibonacci numbers and biological growth that Turing was investigating. We're growing two sunflowers here in the Dept of Computer Science at Aberystwyth University as part of this experiment. Their names were voted on by the Department, and "one" and "zero" were chosen.

Turing used an early computer at Manchester (the Ferranti Mark 1, the first commercially available general purpose electronic computer) to model the chemical processes of reaction and diffusion, which could give rise to patterns such as spots and stripes. You can play with a Turing reaction-diffusion applet online, which shows how changes to the diffusion equation parameters produce different patterns. Turing wrote, near the end of his 1952 paper The Chemical Basis of Morphogenesis that:
"Most of an organism, most of the time, is developing from one pattern into another, rather than from homogeneity into a pattern. One would like to be able to follow this more general process mathematically also. The difficulties are, however, such that one cannot hope to have any embracing theory of such processes, beyond the statement of the equations. It might be possible, however, to treat a few particular cases in detail with the aid of a digital computer."

He then goes on to elaborate further on how computers have already been extremely useful to him in helping him to understand his models (no need to make so many simplifying assumptions). The fact that he actually used computers to investigate the models underlying biology, makes him the first bioinformatician / computational biologist. The fact that he could see the future, and could see how computers would enable us to model and explore the natural sciences makes him an amazingly visionary scientist.

Extra reading:

Monday, 16 April 2012

Ada Lovelace Colloquium 2012

On 12th April, I went to the BCS Women Ada Lovelace Colloquium for the first time, which is a yearly national event for women undergraduate and masters students in computing. This year's event was held at the University of Bath. Before I went, I'd chatted to one of our male members of staff. I described where I was going and he'd said "Isn't that a bit sexist?" If only! If only it were a level playing field for women in computer science. Most undergrad compsci courses in the UK run at about 10% females, 90% males. If I go to a compsci conference my gender will be hugely in the minority, at or below 10%. At this year's Turing Centenary Conference only 1 out of the 19 invited speakers will be women. The Lovelace Colloquium reverses that gender ratio - there were men in attendance, and they were welcomed, but they were the ones in the minority. And all the speakers were female.

But other than the gender ratio, it looked and felt just like any other academic conference and the standard of work was very high. There were a mix of motivational talks and technical talks, and an excellent poster session. I was one of the judges for the masters poster competition and it was a really difficult decision. Each of the poster presenters was able to tell us in enthusiastic detail about their work and ideas, and several of them had brought along demos. We eventually awarded that prize to a poster from Wuraola Jinadu from Robert Gordon Uni in Aberdeen who was creating an iPad app for designing UML diagrams.

Saturday, 3 March 2012

De Bruijn graphs

De Bruijn graphs are currently a cornerstone of several genome sequence assembly algorithms. Nicolaas de Bruijn was a Dutch mathematician who died this year in February 2012. In 1946, he created the idea of the graph that is now named after him. The picture of him here is Copyright:MFO, and others can be seen at the Oberwolfach photo collection. I like to think he might be drawing graphs while resting on this bench. It looks like there's a football beside him, though he's hardly dressed for a game of football, in a tweed suit and tie.

The nodes of de Bruijn graphs are short sequences. There is an edge between two nodes if you could convert one sequence into another by shifting it along by one character and adding a new character at the end. So the nodes "AACAA" and "ACAAB" are allowed to be joined by an edge. In Haskell, you might say that tail node1 == init node2. In Python you might say node1[1:] == node2[:-1]. A very readable article in Nature Biotechnology describes how these graphs are used in sequence assembly. Basically, a subsequence size is chosen (for example, 32 characters), and all subsequences of this size that are found in your read fragment data then become nodes in a de Bruijn graph. We search for a good path through this graph after cleaning up some of the noise and errors and deciding what to do about loops. The path through the graph will tell you the final sequence.

At the moment, sequence assembly is still a big problem. A new genome is sequenced as a collection of short overlapping sequence pieces (short is usually anywhere between 32 and 500 bases at the moment), which then have to be painstakingly pieced together. There are errors in the sequencing, and repetitive regions, and this complicates the problem. The size of the data is also a problem. Plant genomes can have around 20 billion bases, so working with files and having enough memory to store data structures adds to the problem.

So much must have changed in de Bruijn's lifetime. Back in 1946 the structure of DNA was not clear. Watson and Crick's paper in 1953 was yet to come. Computers were yet to come. He wouldn't have known that more than 60 years later we'd be using his graphs to assemble whole genomes that were sequenced using the detection of fluoresence. What will we be doing 60 years from now that links maths, computer science, physics, chemistry and biology?

Wednesday, 25 January 2012

We need more than one programming language

I teach Haskell as a programming language to our undergraduates. I'm sure it's a continual subject of debate in computer science department coffee rooms up and down the country: "Which programming languages should we teach in our CS degrees?" The module that I teach is called "Concepts in Programming", and the idea is that there are indeed concepts in programming that are fundamental to many languages, that you can articulate the differences between programming languages and that these differences give each language different characteristics. Differences such as the type system, the order of evaluation, the options for abstraction, the separation of data and functions.

"We need more than one" is the title of a paper by Kathleen Fisher, a Professor in Computer Science at Tufts University. Her short, eloquent paper describes why we will never have just one programming language ("because a single language cannot be well-suited to all programming tasks"). She has had a career spanning industry and academia, has been the chair of the top programming language conferences (OOPSLA, ICFP), and has been the chair of the ACM Special Interest Group on Programming Languages. Her paper On the Relationship between Classes, Objects and Data Abstraction tells you everything you ever needed to know about objects. She knows about programming.

Her recent work has been looking at that problem of how to read in data files when the data is in some ad-hoc non-standard representation (i.e. not XML with a schema, or CSV or anything obvious). So we all have to write parsers/data structures to fit these data files. We write a program in a language such as Perl. She says "The program itself is often unreadable by anyone other than the original authors (and usually not even them in a month or two)". I've been there and done that.

And when we've written another new parser like this for the umpteenth time (especially in the field of bioinformatics) we begin to wonder "How many programs like this will I have to write?" and "Are they all the same in the end?" and Kathleen's paper The Next 700 Data Description Languages looks at just that question. What is the family of languages for processing data and what properties do they have? I love the title of this paper because it instantly intrigues by its homage to the classic 1966 paper The Next 700 Programming Languages by Peter Landin. (To see why Peter Landin's work on programming languages was so important, the in memorium speech for Peter Landin given at ICFP 2009 is well worth a listen, and also the last 3 mins of Rod Burstall's speech discusses this paper in particular).

So perhaps, in computer science, we're doomed to keep inventing new programming languages, which wax and wane in popularity over time: Lisp, Fortran, C, C++, Java, Ruby, Haskell, F#, Javascript and so on. But as computer scientists we should be able to understand why this happens. They're all necessary, all useful and all part of a bigger theoretical picture. We need more than one.

This is my entry for the BCSWomen Blog Carnival.

Friday, 13 January 2012

Artificial Intelligence and Microscopes

Artificial Intelligence has always been a branch of Computer Science that really catches the imagination of both scientists and the public. Trying to understand and replicate intelligence in all its different forms (reasoning, creativity, decision making, planning, language, etc) is exciting because it helps us to understand ourselves. Computer scientists such as Alan Turing have been pondering the implications and possibilities of AI since the 1940s and 50s. In 1951, Marvin Minsky built the first randomly wired neural network learning machine. He had studied mathematics and biology and was trying to understand brains. He's now famous for his work in AI, but, back in the 1950s, he wasn't just a mathematician, or just a computer scientist, but also studied optics, psychology, neurophysiology, mechanics and other subjects. Perhaps we pigeonholed people less into disciplines back then? Or maybe he was just amazing. Armed with all this knowledge, and a desire to learn about the brain and to look at neurons, he invented a new type of microscope, the confocal microscope. This gets rid of unwanted scattered light so that he could really focus in detail on a very specific part of the item he was looking at. Now he could see things that had never been seen before. He built the first one, and then patented this microscope in 1961. It would be another 20 years before the idea caught on (what would the research impact monitoring committees of today make of that?). Confocal microscopes are now in every biological lab and are taken for granted.

C. elegans is a 1mm long worm which lives in the soil. It is a very simple creature, easy to grow in the lab and it has a brain. Sydney Brenner (who is 85 years old today, 13th Jan 2012) has a Nobel Prize for introducing C. elegans to biologists as a "model organism": an ideal organism for studying the principles of life. In 1986, John White and his colleagues Southgate, Thomson and Brenner published a paper on the structure of the brain of C. elegans. Each worm has just 302 neurons and this number is the same for any C. elegans worm. They worked out where all the neurons were and what their connections to other neurons were, using a confocal microscope. John White had to make substantial improvements to Minsky's microscope design in order to do this. They took 8000 pictures ("prints", because it wouldn't have been digital back then) with the microscope and annotated them all by hand.

So we now have a complete picture of a simple brain. Other scientists have taken the data from White et al.'s work and created models of the brain. We understand a lot about the behaviour of the worm and which of its 302 neurons are responsible for which behviours. We have the entire C. elegans genome, so we know how many genes it has (approx 20,000), how many cells it has (approximately 1000), and we have a technique (RNA interference) for surpressing the behaviour of any gene we want to investigate. Are we nearly there yet? Are we at that tipping point where we've inspected all there is to inspect and found nothing except complexity? Have we already understood intelligence?

Further reading/viewing: