Putting together the pig Y chromosome

Sex chromosomes

Of all the chromosomes in a genome, the sex chromosomes are among the most interesting (at least for organisms that have them). Our sex chromosome pair, X and Y, evolved from a pair of ordinary chromosomes about 166 million years ago, and is found in two of the three major mammalian groups – the eutherian (placental) mammals and the marsupials. Monotreme mammals, such as the platypus, have a completely different (and extremely cool) set of sex chromosomes, which are more similar to bird sex chromosomes than to those of other mammals.


Dinosaur genomes tasted like chicken

Ancestral bird chromosomes look a lot like chicken chromosomes

A suite of really exciting papers on avian genomics has just come out, describing the sequencing of 48 bird genomes. That's big news – it gives us an annotated bird genome from nearly all of the described major groups of birds, and I'm looking forward to ploughing face first into the data. One of the papers comes from an ongoing collaboration of mine in the bird genomics field [1]. It in turn builds on a paper that came out earlier this year [2], which I had not got around to talking about yet, so I'll treat them together. These pieces of work look across bird species at patterns of chromosomal rearrangements – inversions, translocations, fissions and fusions – which occur when parts of chromosomes break apart and rejoin in a different order. When we line up chromosomes from different species, we see places where the order of the DNA sequences differs between them; see for example the picture below, which compares a chromosome from chicken to a chromosome from turkey, with panel (A) drawing lines between the positions that look like they match.
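To give a feel for how such comparisons work, here is a minimal sketch (not the pipeline used in these papers) that compares the order of shared markers along two chromosomes and reports the adjacencies present in one but broken in the other; a single inversion leaves a breakpoint at each end of the flipped block. The marker names and orders are invented for illustration.

```python
def breakpoints(order_a, order_b):
    """Return marker pairs adjacent in genome A but not adjacent in genome B."""
    # Adjacencies in B, stored orientation-free so a flipped block still
    # counts as intact in its interior.
    adj_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return [pair for pair in zip(order_a, order_a[1:])
            if frozenset(pair) not in adj_b]

# Hypothetical marker orders on a chicken chromosome and its turkey homologue:
chicken = ["m1", "m2", "m3", "m4", "m5", "m6"]
turkey  = ["m1", "m2", "m5", "m4", "m3", "m6"]   # the m3-m5 block is inverted

print(breakpoints(chicken, turkey))
# [('m2', 'm3'), ('m5', 'm6')] -- one breakpoint at each end of the inversion
```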


Mapping chromosome positions with ‘sperm cartography’

This post expands on a poster I presented at the International Chromosome Conference (ICC20) in Canterbury; download the pdf here.

The term ‘nuclear organisation’ refers (in genetics) to the positions that chromosomes adopt in the nucleus of a non-dividing cell. We’ve known for a long time now that chromosome position is often not random, and that it can differ between species, cell types, and even disease states.

Upper: Chromosome territories in green against blue nuclear DNA. Lower: 5-ring shell fitted to the nucleus allows determination of chromosome distributions. Adapted from figure 2 of [1].

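As a rough illustration of the shell idea (not the exact measurement pipeline from the paper), the sketch below uses a distance transform to split a binary nucleus mask into five concentric shells of roughly equal area, and reports the fraction of chromosome-paint signal falling in each; the array names and the toy image are placeholders.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def shell_fractions(nucleus_mask, paint_signal, n_shells=5):
    """Fraction of paint signal in each of n roughly equal-area shells,
    ordered from the nuclear periphery (first) to the interior (last)."""
    dist = distance_transform_edt(nucleus_mask)          # distance from the nuclear edge
    inside = nucleus_mask.astype(bool)
    # Equal-area shells: split nuclear pixels by quantiles of edge distance.
    edges = np.quantile(dist[inside], np.linspace(0, 1, n_shells + 1))
    shell_of = np.digitize(dist, edges[1:-1])             # 0 = outermost shell
    total = paint_signal[inside].sum()
    return [paint_signal[inside & (shell_of == i)].sum() / total
            for i in range(n_shells)]

# Toy example: a circular 'nucleus' with signal concentrated near the centre.
yy, xx = np.mgrid[:101, :101]
mask = (xx - 50) ** 2 + (yy - 50) ** 2 < 45 ** 2
signal = np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / (2 * 15.0 ** 2)) * mask
print(shell_fractions(mask, signal))   # most of the signal falls in the inner shells
```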


Copy number variation in birds: another source of genetic variation to explore


Male Red Junglefowl (Gallus gallus) – the species from which modern chickens were domesticated. Image CC Lip Kee.

I and my collaborators in Darren Griffin’s laboratory recently published a paper in the bird genomics field, looking at a type of genetic variant called copy number variants (CNVs) across a range of bird species (the post-print is also available below [1]).

What we found is that, contrary to our initial expectations, bird genomes appear to harbour numbers of CNVs as high as those seen in mammals. The CNVs are predominantly associated with genes, and their locations within the genome suggest that well-described processes such as non-allelic recombination are generating this variation.


Open access and open data

How we can share our research with the world (and why we should bother)

Copyright Azcolvin429

The open access idea:
All public scientific efforts should be freely available globally.

Today I gave a lunchtime seminar on the topic of Open Access and Open Data. This is something I've been getting increasingly interested in over the last few years, and two events coincided to persuade me to talk about it to the other members of the department. Firstly, I went to an excellent talk by Jelena Aleksic, covering much of the same ground, at a recent Cambridge Open Research group meeting. Secondly, I was handed the keys to my divisional seminar scheduling list in the Department of Pathology and left to go wild. I decided to abuse my power and give myself a slot to talk about something a bit out of the ordinary, but something I think is very important. The following post is basically a summary of the talk I gave (the slides are available from SlideShare).

Why does open access matter?

Traditional publishing has followed a simple model: scientists write papers and send them to a journal. Other scientists review the paper, the journal adds some formatting, and then sells subscriptions so that other people can read it. This model has a problem: it restricts access to results to those who can afford the (very hefty) subscription fees, which affects not just universities but also businesses, interested lay people, and the public at large. Anyone who wants governments to produce evidence-based policy will see the problem when representatives cannot actually see the evidence.

Further, there is the issue of machine readability and text mining. More than 1 million papers are published every year in the biosciences alone; that is a huge torrent of information that, ideally, we would pass on to computers to identify genes, compounds and diseases, and to make links between fields that might otherwise be missed. Mostly we can't do this – not for technical reasons, but because traditional publishers don't allow it.

What are the open access models?

The two main models for open access I described are the gold and green models. Gold access is basically a rearrangement of the traditional model: instead of paying to read a paper, you pay to publish a paper. This means that anyone anywhere can read your research. The downside is that it costs a lot (often >£1000) to publish, and again this restricts scientific publishing to the wealthy. Some open access journals do scale their fees depending on where the submitter is based, but it remains an exclusive means of publishing.

A more appealing option is the green model. In this case publishing is free, and the responsibility for sharing the research falls on the submitter: they are given permission by the publisher to post their pre-print or post-print to a repository or a personal website (and for this purpose, a scientific networking site like ResearchGate or Academia.edu counts as a personal site). The only thing they cannot share is the final formatted PDF, and well-off institutions can maintain subscriptions for access to these better-formatted versions.

While green access gets around many of the issues of gold access, it still has some problems of its own; it relies on author motivation, and some authors are reluctant to use a pre-print server. Fields like physics and maths make far greater use of pre-print archiving than biology (though this may change now that the bioRxiv pre-print server launched last year).

How can we find out what the options are for a journal, and how can we know whether our funding bodies will permit us to publish there? The SHERPA/RoMEO service, maintained by the University of Nottingham, tracks the open access policies of many journals and how they comply with different funding body requirements. It lists whether pre-prints, post-prints or publisher versions of a paper can be self-archived, and saves a lot of time otherwise spent digging through the depths of journal websites.

What other platforms are there to enable data sharing?

Some data types have well-established repositories. Microarray data is widely stored in GEO or ArrayExpress and annotated such that anyone can re-analyse the data and check the authors’ conclusions. Other forms of data do not have these resources, but would be useful if made available. For example, if a phylogenetic tree is given as a figure in a paper, then the original tree file in Newick format or similar would make subsequent analysis easier. For this type of data sharing, generic repositories like Figshare and DataDryad are useful. They are not always free, but many partner with journals to provide storage space when a paper is accepted.
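As a concrete example of the kind of machine-readable file worth depositing alongside a figure, here is a small sketch that reads a Newick tree with Biopython's Bio.Phylo module; the taxa and branch lengths are invented for illustration.

```python
from io import StringIO
from Bio import Phylo

# The same tree that might appear only as a figure, written as a Newick string.
newick = "((Chicken:0.10,Turkey:0.08):0.05,Zebra_finch:0.30);"
tree = Phylo.read(StringIO(newick), "newick")

Phylo.draw_ascii(tree)                # quick text rendering of the topology
for tip in tree.get_terminals():      # the deposited file can be reused programmatically
    print(tip.name, tip.branch_length)
```

With the Newick file in a repository such as Figshare or DataDryad, anyone can rerun or extend the analysis rather than re-digitising the tree from the published image.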

How to promote open access?

Levels of open access by year, redrawn from [1]. Data is pooled from Figures 1 and 2.

At the moment, levels of open access publication are quite low; the figure shows data from [1] indicating that since the late 90s there has been little increase in open access publishing in the biosciences. Some fields, like physics and maths, do better, but still not much over 30%, and many are lower. Effecting change here requires both top-down and bottom-up approaches, and fortunately the main funding bodies in the UK agree.

The Research Councils UK (RCUK), which include the BBSRC, MRC and EPSRC, provide much of the research funding to universities. They have mandated that, since April 2013, any published research they fund must be made open access no later than six months after publication. HEFCE (the Higher Education Funding Council for England) has gone a step further. It administers the Research Excellence Framework (REF), the assessment of universities that, in part, determines how much government funding each university receives. Each university department puts forward its best candidate papers for consideration in the REF, and HEFCE has recommended in a recent consultation that, post-2014, a paper must be deposited in an institutional repository immediately upon acceptance to be admissible.

Such top-down approaches do work. A comparison of individual institutions that mandate open access publication versus those that don’t found that on average, mandating open access led to three times greater open publishing amongst their staff (60% vs 20%).

Other approaches to publishing

Not all open access publishing falls cleanly into the green or gold categories. PLoS sets out a spectrum of criteria for openness, from reader rights to author rights to the machine readability of the data in papers. They also publish the journal PLoS One, which reviews only for technical correctness, not the importance of the research; as a result, it now publishes more than 20,000 papers per year. PeerJ uses a one-off lifetime subscription for authors – after paying it, an author can publish in PeerJ as many times as they like for no further charge. The Faculty of 1000 pushes openness even further by keeping the peer review process open as well: a paper is published immediately upon submission, after a technical check by the editors, and the review comments and subsequent revisions to the manuscript are open for all to see.

Conclusion

Open access is important for everyone, not just scientists. It has taken a long time to get to even this meagre level of OA publishing, and we need to keep pushing onwards. The top-down support for open access in the UK (and other countries) is helping the process immensely, as is the growing recognition of the problem amongst scientists. Inertia takes a lot of effort to overcome, but we are getting there.

[1] Gargouri, Y., Larivière, V., Gingras, Y., Carr, L. & Harnad, S. Green and Gold Open Access percentages and growth, by discipline (2012). http://eprints.soton.ac.uk/340294

Cambridge Open Research: Nowomics – a personal feed for biological data

One of the difficulties of modern science is the continuous torrent of data we have to manage. There are now about 20,000 new abstracts indexed every week, and about 1,500 biological databases storing a huge range of information [1]. How, then, can we keep track of it all?

Richard Smith, the founder of Nowomics, came to talk to the Cambridge Open Research group last Monday about this problem, and about how his tool Nowomics may help. Nowomics is a website on which one can follow a topic of interest – a gene or a gene ontology term, for example. Once you follow a topic, the website shows in your feed all new information relating to it, including new publications, experimental data sets and new annotations in public databases. It handles orthologue and synonym tracking, which makes it easier to keep up with genes that change names, and newly posted abstracts are text-mined to minimise the delay between publication and database updates.

Currently four species are supported: human, mouse, rat and fruitfly. However, there are plans to introduce more, including Arabidopsis in the near future, with the potential for others further on.

This is a commercial effort, currently free to all. Eventually, Richard plans to fund it by, for example, including relevant antibody listings within the feeds or selling subscriptions to the service for commercial users, while keeping it free for academic use.

There is a lot of potential here for keeping scientists up to date on their field of interest, especially compared with weekly PubMed emails about new papers in a field: after a holiday it is easy to overlook an interesting paper because a backlog has built up, whereas with a Nowomics feed the key information would be visible immediately after logging in.

[1] Nucleic Acids Research Molecular Biology Database Collection


CUSPE: From lab to market

On Friday I attended an event organised by the Cambridge University Science and Policy Exchange (CUSPE) entitled ‘From lab to market’. This was a series of four talks on the importance and mechanisms of bringing scientific discoveries into the commercial sector.

Left to right: Kelsey Lynn, Stephen Allott, Alice Frost, Andy Richards

The first speaker was Kelsey Lynn, from Imperial Innovations, the technology transfer office of Imperial College London. She spoke about the different models available for commercialising research, contrasting direct licensing with spinning out a new company, and arguing that the latter is often the easier option. In doing so, though, she emphasised that three key factors are needed: a commercial reason, the IP and ability, and, equally importantly, the desire to work on creating a commercial product. This is something that Andy Richards also mentioned.

Stephen Allott, from the Cabinet Office, spoke about the mechanisms by which science leads to economic growth. His view was that while each field of research contributes to the economy through many different mechanisms, there is an underlying theme: people. He gave as an example Hermann Hauser, who came from Vienna to study for a PhD at the Cavendish Laboratory here in Cambridge, and went on to set up Acorn and ARM, both of which have had a substantial economic impact. The key point is that people such as Hauser come here because there are other smart people with interesting ideas to work with, so any success in commercialising research has to focus on attracting and supporting people. Allott also made the point that the value of an advanced degree is not simply knowledge of the field – it is a broader knowledge of how to solve problems, a skill that is applicable anywhere. Consequently, he suggested we expand and support interactions between industry and academia, via approaches such as supporters' clubs or industrial placements, where people in industry can meet students and build the personal networks needed for commercialisation and entrepreneurship to thrive.

Alice Frost spoke next; she is Head of Knowledge Exchange and Skills at HEFCE (the Higher Education Funding Council for England), the body that also administers the Research Excellence Framework (REF) exercise that helps determine university funding levels. She discussed the history of universities and their impact, pointing out that until the last couple of hundred years pure research was not a university activity: most earlier universities were created with charters describing them as existing for social and economic purposes (furthering education, bettering the welfare of the city and the state, and so on). This was an attempt to head off criticism of the REF's perceived emphasis on applied over basic science, and of its measures of 'impact'. In a session focussed on the commercialisation of research, it naturally presented only one side of the argument.

The final speaker, Andy Richards, is a serial entrepreneur; amongst other things, he is a council member of the Biotechnology and Biological Sciences Research Council (BBSRC) and a founder member of the Cambridge Angels. He echoed Stephen Allott's focus on people as the key factor: if an idea is to be brought to market, it needs the right people to do it. Further, with companies reducing their own spending on research and development, there will be a greater need for spinout companies in the future. One of his concerns is reducing the separation between academia and industry, especially in cases where a potentially useful technology created in academia is not developed further, for lack of time, resources or will, and then languishes.

This event took the view that commercialising research is valuable and that there should be more of it. That is true, so long as pure and basic research do not get lost along the way. I see commercialisation as a great idea should an opportunity arise, but it should not be considered one of the main outcomes of an academic project – and I think the speakers would agree with this. A previous CUSPE event covered the debate over the 'impact' of academic research and how it can – or should – be measured. The arguments presented here were very similar, particularly in Alice Frost's historical point that universities were founded to improve economic and social well-being, and in the research council charters – shown previously by the EPSRC chief executive David Delpy – which emphasise in almost the same terms the impact of invested taxpayer funding on society and the economy.

CUSPE events are always interesting and recommended, especially for anyone with an opinion on how academic science should interact with public policy.

Leptin treatment in rats with undernourished mothers can protect against metabolic syndrome

A new study of the interactions of early life experiences and hormone treatment may be relevant to humans too

We recently published a paper [1], written predominantly by Peter Ellis, aiming to unpick some of the complexities of developmental programming in response to early life environments.

Early life experiences can “program” a later metabolic response

Epidemiological studies have for many years linked low birth weight with higher risk of metabolic syndrome in later life, the symptoms of which include coronary heart disease, hypertension, type 2 diabetes and stroke. This led to the DOHaD (Developmental Origins of Health and Disease) hypothesis: that the trajectory towards disease is “programmed” by early life experiences, both in the uterus and shortly after birth.
The suggestion is that such programming improves the chances of survival in adverse environmental conditions, for example by altering the metabolic state of the developing fetus in response to undernourishment of the mother. However, if the child later encounters a nutritionally rich environment, there is a mismatch with the "expected" nutritional level, resulting in an inappropriate metabolic response that sets them on a trajectory towards disease. We have been studying a rat model of this situation, in which undernutrition of the mothers during pregnancy leads the offspring to develop the classic hallmarks of metabolic syndrome when fed a 'normal' diet [2], presumably due to a programmed metabolic state.

The hormone leptin can sometimes override this programming

Leptin is a key hormone for regulating energy intake and metabolism: one of the ways it acts is by suppressing appetite. In our rat model, giving leptin to the newborn pups from undernourished mothers protects them from the subsequent metabolic problems. In this paper, we looked at the activity of genes in the livers of female offspring of undernourished mothers using microarrays. We found that maternal undernutrition and postnatal leptin treatment separately produce a similar pattern of abnormal gene activity, but in rats with both, the resulting gene activity is more normal.
The model we have suggested to explain this is that there are two competing systems: firstly, the metabolism, which can be set to a thrifty state by maternal undernutrition. Rats with the thrifty metabolism will respond poorly to a ‘normal’ diet, and go on to develop metabolic syndrome. Secondly, a ‘set-point’ for diet and food intake, like a thermostat determining when the animal wants to eat. This set-point is calibrated by leptin; a high dose of leptin shortly after birth will suppress the rat’s subsequent food intake. When the two systems come together, the rats are programmed for thriftiness, and hence would develop metabolic syndrome if they ate a ‘normal’ diet, but have been calibrated by the leptin to eat less overall. Consequently, they stay relatively healthy, and at a slightly lower weight than normal rats.
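As a schematic illustration of that 'masking' pattern (using made-up numbers rather than data from the paper), the sketch below compares, for each gene, the expression shift caused by each treatment alone and by the combination relative to control animals; when the two treatments cancel each other out, the combined group sits back near control levels and the interaction term is large.

```python
# Toy per-gene mean expression for the four groups in the 2x2 design:
# maternal undernutrition (UN) x neonatal leptin. All values are invented.
groups = ["control", "UN_only", "leptin_only", "UN_plus_leptin"]
expr = {"geneA": dict(zip(groups, [5.0, 7.1, 7.0, 5.2])),
        "geneB": dict(zip(groups, [3.0, 1.8, 1.9, 3.1]))}

for gene, m in expr.items():
    un_effect     = m["UN_only"]        - m["control"]
    leptin_effect = m["leptin_only"]    - m["control"]
    both_effect   = m["UN_plus_leptin"] - m["control"]
    # Masking signature: each treatment alone shifts expression the same way,
    # but the combination lands back near the control value.
    interaction = both_effect - (un_effect + leptin_effect)
    print(gene, un_effect, leptin_effect, both_effect,
          "interaction:", round(interaction, 2))
```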

This research has implications for humans

The finding that early leptin treatment can help protect against the symptoms of metabolic syndrome may be useful in understanding disease risks in rural human communities where children are born during alternating seasons of feast and famine. We have previously looked at DNA methylation in a population from The Gambia [3], and the results suggest that a mother's diet even before pregnancy affects her children's metabolism and immune system. More work with this rat model may help target treatment and dietary supplementation to the most important time periods.
More broadly, is the leptin set-point calibration a one-off event, or can it be reset by a leptin treatment later in life? If leptin were administered to a morbidly obese patient following a period of weight loss, would it be able to establish a healthier set-point for the patient, and make it easier for them to maintain a lower weight?

Our discovery of genes involved in the response to programming opens up further questions about the programming itself: which genes are affected directly by the programming, which respond secondarily, and what form does the programming take? Furthermore, there are sex-specific complexities on top of this. We looked here at female offspring; treating male offspring from normally-nourished mothers with leptin programs them to develop obesity in later life, while there is no comparable effect in females.
In order to really understand what’s going on, we are going to have to look at and integrate the interactions of a number of different mechanisms of gene regulation.

[1] Ellis, P. J. et al. Thrifty metabolic programming in rats is induced by both maternal undernutrition and postnatal leptin treatment, but masked in the presence of both: implications for models of developmental programming. BMC Genomics 15, 49 (2014). http://www.biomedcentral.com/1471-2164/15/49

[2] Vickers, M. H., Breier, B. H., Cutfield, W. S., Hofman, P. L. & Gluckman, P. D. Fetal origins of hyperphagia, obesity, and hypertension and postnatal amplification by hypercaloric nutrition. Am. J. Physiol. Endocrinol. Metab. 279, E83–E87 (2000). http://ajpendo.physiology.org/content/279/1/E83.long

[3] Khulan, B. et al. Periconceptional maternal micronutrient supplementation is associated with widespread gender related changes in the epigenome: a study of a unique resource in the Gambia. Hum. Mol. Genet. 21, 2086–2101 (2012). http://hmg.oxfordjournals.org/content/21/9/2086.full