Generative ai models can create both text and images. And since text and images are broad concepts, a new infrastructure is waiting around the corner to solve both old and new problems.
The Elicit search engine has changed the way I look for scientific articles in my research.
The large language model GPT-3 made Ny Teknik’s reporter Peter Ottsjö break out in a cold sweat in the summer of 2020, as he recounted in an analysis full of wonder.
Using GPT-3, Elicit generates summaries of the articles that match my searches, compiles other articles that cite them, and also suggests related questions. For me, this has made the research phase both faster and better.
Right now, hardly a day and certainly not a week goes by without me being amazed by new applications of what are called generative ai models. This is technology that takes data and creates something new.
In Elicit’s case, the input is both my original question and all scientific articles in the database, and what is created are the summaries, citation compilations, and supplementary question suggestions.
When Peter broke out in a cold sweat in August two years ago, Open AI had just demonstrated GPT-3. But the technology was still available only to a limited number of developers. Since then, generative ai models have gradually become a tool for more people. In November last year, the api – the programming interface – to GPT-3 was made freely available.
And in parallel with the development of the language models, Open AI has also worked on image creation. About a month ago, the Dall-E 2 api was made freely available. But by then, many developers – and regular users – had already spent a lot of time with Stable Diffusion, a generative ai model for images that Stability AI released as open source.
Many of the early demonstrations of what can be done with GPT-3 were either long texts created from a shorter instruction, what in this context is called a prompt, or dialogues between a human user and the ai model.
But this fall, my corner of the internet has been flooded with articles, blog posts, podcasts, tweets, and chat conversations about how the models are being used to solve narrower and more specific tasks than that.
A short video shows an implementation of GPT-3 in Google Sheets, and how the language model can be used, among other things, to interpret addresses, categorize user feedback, or summarize customer reviews.
This weekend I built =GPT3(), a way to run GPT-3 prompts in Google Sheets.
It’s incredible how tasks that are hard or impossible to do w/ regular formulas become trivial.
For example: sanitize data, write thank you cards, summarize product reviews, categorize feedback… pic.twitter.com/4fXOTpn2vz
— Shubhro Saha (@shubroski) October 31, 2022
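A custom spreadsheet function like the =GPT3() in the tweet above essentially wraps one api call per cell. Here is a minimal sketch of what such a wrapper might send to Open AI’s completions api; the endpoint, model name, and parameters are assumptions for illustration, not Saha’s actual implementation.

```python
import json

# Assumed endpoint of Open AI's completions api.
API_URL = "https://api.openai.com/v1/completions"

def build_gpt3_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON payload for a single GPT-3 completion call."""
    return {
        "model": "text-davinci-002",  # assumed model name
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output suits data-cleaning tasks
    }

# A cell like =GPT3("Categorize this feedback: ...") would translate
# into one such request, sent with the user's api key:
payload = build_gpt3_request(
    "Categorize this feedback: 'The app crashes on startup.'"
)
print(json.dumps(payload, indent=2))
```

The point is less the plumbing than the pattern: once the model sits behind an ordinary formula, tasks like sanitizing data or categorizing feedback become one prompt per row.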
Summaries are also one of several uses described in an article in the scientific journal Nature on how researchers use language models, for example to suggest a possible summary based on loose notes.
A special version of Open AI’s language model is called Codex. It is trained on program code and is used to generate new code based on an instruction expressed in human language. A fascinating example shows how robots are controlled with plain English – and with intricate instructions that follow different types of logical reasoning.
Another demonstration of Codex shows how the model step by step “reasons” its way to a solution, and how it also identifies its own errors.
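The loop in that kind of demo – generate a candidate, check the result, feed the errors back – can be sketched generically. Everything below is a toy stand-in to show the control flow, not Open AI’s actual demo code.

```python
def iterate_until_correct(generate, check, max_rounds=5):
    """Generic generate-check-refine loop of the kind the Codex demo follows.

    `generate` proposes a candidate (optionally using feedback from the
    previous round); `check` returns (ok, feedback). The loop stops when
    the check passes or the round budget runs out.
    """
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(feedback)
        ok, feedback = check(candidate)
        if ok:
            return candidate
    return None

# Toy stand-ins: the first guess is wrong, and the checker's feedback
# (here simply the expected value) lets the next round correct it.
def generate(feedback):
    return 10 if feedback is None else feedback

def check(candidate):
    target = 42
    return (candidate == target, target)

print(iterate_until_correct(generate, check))
```

In the real demo, “generate” is a Codex completion and “check” is running the code and reading the error output; the structure of the loop is the same.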
new OpenAI Codex demo: solving complex problems with multiple iterations, result checking, and thorough comments
programming will be disrupted just as much as image creation! pic.twitter.com/KWNUuP5go6
— nearcyan (@nearcyan) October 31, 2022
For developers, features like these are available as everyday tools, including in the form of Github Copilot which offers a real shortcut in coding.
For everyone who writes “regular” text on a daily basis, a new wave of digital sounding boards is to be expected, with Moonbeam and Lex as two currently hyped services. Until now, word processors have mostly been improved typewriters.
Soon we can count on getting help to move forward when writer’s block sets in, when we want to change the tempo or work on the style, making it a little more formal or informal depending on the context. Specialized models for different types of text are of course waiting around the corner: one for technical manuals, one for project planning, one for research applications, one for legal text, and so on.
But the most exciting thing is probably how the language models are used to take on challenges that you don’t really think of as text processing. Like the researchers at Meta who use language models to predict protein structures. As soon as a problem can somehow be expressed in text, it will often also be open for a generative ai model to tackle.
As far as the models for images are concerned, the corresponding development is underway, and involves, among other things, supporting creative processes or creating large amounts of training data for ai models that will learn to identify cancer tumors or pedestrians around self-driving vehicles.
If you zoom out from the individual applications, it becomes clear that an infrastructure for new services and solutions is currently emerging. This is completely in line with what we have seen for a few years on the 33 list, Ny Teknik’s startup list, where ai in various ways is an important building block for many of the companies that earn a place on the list.
When the generative ai models become widely available, they will lay the foundation for a rapid new innovation cycle, also among companies that are not quite as technologically advanced as those on the 33 list.
Anders Thoresson is a technology journalist and regularly contributes to Ny Teknik. He is also an employee at AI Sweden.