The artificial intelligence (AI) landscape continues to heat up as OpenAI announces GPT-4V, a model that adds visual capabilities and multi-modal conversation modes to its ChatGPT system.
With the upgrades unveiled on September 25th, ChatGPT users will now be able to interact with the chatbot conversationally. The models underlying ChatGPT, GPT-3.5 and GPT-4, can now understand spoken queries in plain language and respond in one of five different voices.
According to a blog post from OpenAI, this new multi-modal interface will allow users to interact with ChatGPT in novel ways:
“Take a picture of a landmark while traveling and chat directly about interesting facts about that landmark. When you’re at home, snap a photo of your fridge and pantry to figure out what’s for dinner (and follow up with questions for step-by-step recipes). After dinner, help your child solve a math problem by taking a picture, circling the math problem set, and having it provide hints for both of you.”
The upgraded version of ChatGPT will be rolled out to Plus and Enterprise users on mobile platforms in the next two weeks, with further access for developers and other users “shortly thereafter.”
The multi-modal upgrade of ChatGPT comes shortly after the launch of DALL-E 3, OpenAI’s most advanced image generation system. According to OpenAI, DALL-E 3 also incorporates natural language processing, allowing users to converse with the model to refine results and to use ChatGPT for assistance in crafting image prompts.
In other AI news, OpenAI’s competitor, Anthropic, announced a collaboration with Amazon on September 25th. Amazon will invest up to $4 billion in a deal that includes cloud services and hardware access. In return, Anthropic states that it will provide advanced support for Amazon’s Bedrock AI platform along with “customization and fine-tuning of enterprise-safe models.”