The launch of GPT-4 comes with concrete use cases that are already changing our world. It's time to keep up with its implementation, and to treat LLMs like the tools that they are.
Our “AI” world is here. What remains to be seen, and navigated, is how we’ll integrate large language models (LLMs) into our workflows in the coming months and years. When OpenAI’s GPT-4 launched on March 14, the announcement came with a significant number of use cases that illustrate some of the paths for this technology going forward.
GPT-4 is currently only directly available to ChatGPT Plus subscribers; API access is priced at $0.03 per 1,000 prompt “tokens” (around 750 words) and $0.06 per 1,000 completion tokens, the content the LLM produces. New features for GPT-4 include the ability to use visual prompts (still in research preview), a reduction in “unsafe” answers (harmful advice, hate speech), an uptick in problem-solving and test-taking capabilities, and more intricate, “creative” replies with reduced repetition.
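For readers budgeting around those per-token rates, the arithmetic is straightforward. The sketch below is illustrative only: the token counts are hypothetical inputs, and in practice real counts come back from the API itself rather than being estimated by hand.

```python
# Rough cost estimate for one GPT-4 request, using the quoted rates:
# $0.03 per 1,000 prompt tokens, $0.06 per 1,000 completion tokens.
PROMPT_RATE = 0.03 / 1000      # USD per prompt token
COMPLETION_RATE = 0.06 / 1000  # USD per completion token

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# Example: a 1,000-token prompt (~750 words) with a 500-token reply.
print(round(estimate_cost(1000, 500), 4))  # → 0.06
```

At these rates, a long back-and-forth conversation adds up quickly, since each turn resends prior context as prompt tokens.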
Still plaguing the model, though, are factual errors, many of which it “hallucinates”. The company also notes that its native instance of GPT-4 still has no knowledge of events after September 2021.
But more critically, GPT-4 did not launch alone. Microsoft confirmed that its latest iteration of Bing Chat is already running on GPT-4, and Khan Academy, Duolingo, Be My Eyes, and Stripe have all announced partnerships related to the new technology. These implementations will put GPT-4 to work as learning tools for everyday users.
Frederic Lardinois of TechCrunch provided a thorough early walk-through of Bing’s chatbot, which we now have confirmation was a GPT-4 implementation all along. This implementation still relied on users to flag hate speech and prompts designed to elicit similarly hateful or false content, but Lardinois reported a high success rate in fixing the problem when he reported inappropriate LLM outputs.
As OpenAI boasted with the launch of its latest version, “We spent 6 months making GPT-4 safer and more aligned. GPT-4 is 82 percent less likely to respond to requests for disallowed content and 40 percent more likely to produce factual responses than GPT-3.5 on our internal evaluations.”
Implementation on Bing comes in three parts: a chatbot that shapes its answers in a more conversational manner, the main search results, and a new “copilot” for the Edge browser, to help users sift through other site content faster. Bing Chat is more robust than previous builds when it comes to providing follow-up sourcing, but the sources themselves can still be quite suspect.
Khanmigo, by Khan Academy, is a small pilot program for educators: a new learning “tutor” that can help explain facets of lessons. The nonprofit’s mission has always been to make education free and widely accessible, and in this work the LLM will serve as a kind of “virtual Socrates”, reframing the material as questions to improve student learning. Founder Sal Khan also wrote of GPT-4 having “enormous potential” to “assist teachers with administrative tasks, which saves them valuable time so they can focus on what’s most important—their students.” He also noted the need for caution in its implementation, due to aforementioned weaknesses in its replies.
Duolingo Max is a new subscription tier for the language-learning platform. It offers GPT-4-based assistance on a user’s language-learning journey. “Explain My Answer” will help students grasp the theory behind issues they’re having with a given concept, and “Roleplay” allows for more interactive uses of the target language. In the latter, conversational blocks exist as side quests on the app. After the user has attempted to converse, the chatbot steps in with constructive feedback on their sentences.
The Danish startup Be My Eyes serves blind and low-vision people by connecting them with volunteers for a variety of real-world tasks, like navigating challenging spaces, online and otherwise. The company is using GPT-4 to power a Virtual Volunteer option for both Android and iOS users; the software can now identify and summarize key features on a map, or on complex e-commerce web pages, to assist with everyday activities.
Stripe, the online payment system, is also using GPT-4 to quickly scan web pages, along with Discord forums. The aim here is twofold: to improve efficiency when assessing business operations, and to detect early warning signs of fraud from account activities. GPT-4 is also expected to help Stripe’s developers improve workflow, by reading through and summarizing documentation for its workers.
The above use cases for GPT-4, while not comprehensive, outline the beginnings of a road map for everyday implementation of this new technology. After months of indulging in panic about “AI”, we’re starting to converge on the fact that this is a tool we need to recognize as such, and manage accordingly. The ongoing inability of these LLMs to identify factual errors in their own output is a concern for human developers and users to address, as is the growing capacity of this technology to pose further threats to privacy (what little of it remains, at least).
Deepfakes and doctored media have been with us for years, but this technology will pose added challenges for differentiating what has been constructed from what is true. These will not be easy times for news media and the related information exchanges critical to democratic practice. However, the recently demonstrated abilities of GPT-4 and other LLMs at least illustrate the roles that humans will have to play in cultivating solutions for this brave new world upon us.