Welcome to Marks & Clerk’s new series of articles on the nature and role of data in the patent system. This first article serves as an introduction to the series, explaining why data matters and giving a taster of what readers can expect from future articles. It is not intended to answer any questions, nor to provide advice – for that, readers will have to wait for the rest of the series! I hope, however, that this article sufficiently discloses the key themes we will be touching on, and allows the reader to look forward with excitement to future articles.
Since the early days of patents, there has been a tension between the grant of limited monopoly rights (seen as necessary “To promote the Progress of Science and useful Arts”, as the US Constitution has it) and the potentially stifling effects of over-rewarding creators and inventors with an excessive monopoly. This tension is usually expressed as the fundamental patent bargain of exchanging disclosure of the invention in return for those monopoly rights.
But what does “disclosure” mean? The words of the relevant statutes (eg, EPC Art 83, or 35 U.S.C. 112) don’t add much clarity: “The European patent application shall disclose the invention in a manner sufficiently clear and complete for it to be carried out by a person skilled in the art”; or “The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same.” In practice, the requirement has come to be understood as meaning that the disclosure must be sufficient, or enabling, across the whole scope of the claims. That is, if the applicant is claiming the moon on a stick, then the disclosure must enable the reader to actually obtain the moon on a stick. Conversely, if the applicant is claiming only the stick, then a source of sticks is all that needs to be disclosed.
Where data comes into play depends on the nature of the invention and, broadly, on the relevant technical field; it is most significant in the life sciences. For example, to claim a simple mechanical invention, it may not be necessary to include data showing that the mechanism works, provided it can be built (subject, of course, to the laws of physics where perpetual motion machines are concerned). On the other hand, if the invention relates to a broad range of molecules asserted to have a particular function, or to a therapeutic drug, data may be needed showing that such molecules can be obtained and that further, undisclosed molecules in the same class would also have that function.
The extent of data required has ebbed and flowed over the years. Originally, the USPTO required inventors not just to file a patent specification, but also a physical model (which may have been functional). Obviously this could not be sustained, both for reasons of space and as technology moved away from the purely mechanical. In 1873, Louis Pasteur obtained a patent covering “Yeast, free from organic germs of disease”. Was the disclosure then sufficient to allow preparation of any type of yeast, rather than just brewing yeast? In the 1990s, the explosion of genomic data led, for a short period, to companies filing patent applications containing thousands of short DNA sequences (expressed sequence tags, or ESTs), with essentially no data showing how these could be used beyond the assertion that they might be useful as probes. Many of these filings went nowhere, but they did highlight key divergences in practice between the USPTO and the EPO.
The recent US Amgen decision highlights another current divergence – the tl;dr is that, in the US, simply disclosing an antibody target and several antibodies against it may not be enough to enable a broad claim to any antibody against that target, even if protocols for generating and selecting further antibodies are also taught. In contrast, the current EPO position is that generating antibodies against a given target is routine, so such a claim would be sufficiently disclosed (although, in the absence of further features, it may not be inventive – a common attack at the EPO, where a claim reformulated around a less ambitious technical problem may escape an insufficiency objection only to fall to an inventive step attack).
Although the fundamental requirement is that the patent or application itself must sufficiently disclose the invention, many jurisdictions allow post-filing data to be submitted to back up the original disclosure. This has of course generated many column inches as the EPO and national courts have grappled in recent years with the so-called plausibility of the invention, culminating in the EPO Enlarged Board of Appeal decision G 2/21. Different national approaches to the acceptance of post-filing data can be a hazard for the unwary. This can also be critical in the assessment of second medical use patents, where a known drug is used to treat a new indication. Clearly there is no issue with being able to make a known drug, but what are the requirements to convincingly demonstrate a new therapeutic effect in return for a patent?
The concept of data does not, of course, embrace only experimental data. In the current era of Big Data and machine learning there is ample scope for novel issues to arise which need consideration. Can an invention be sufficiently disclosed if it is only obtained through mining huge datasets? What role can there be for AI-generated data? With the rise of machine learning models to predict protein folding, and of AI-directed drug discovery, what sort of data will be needed to support such predictions? Can the USPTO maintain its post-Amgen approach to antibody claims when it might be possible to generate thousands of antibody sequences based on the protein sequence of the target alone? To some extent this may be informed by existing practice on hypothetical examples – those based on an expert prediction of future results (one study on “prophetic data” estimated that “at least 17% of experiments” in a dataset of US biological and chemical patents “are fictional”). There is also a murkier question of what happens with wholly invented data, whether arising from incorrect assumptions or from false prophets – while there is clearly a “paper-mill” problem with scientific publications, could patent applications suffer from similar issues?
Our forthcoming article series will touch on all of these issues, and more, so keep checking back for new publications, as we continue to investigate the role that data plays in the world of patents.