The Age Of Conversational Bots Is Starting In A Few Weeks. These Videos Are Proof It Will Transform How We Learn
Reminder: The fourth Augmented Intelligence class will be today at 11:00am EST. The Zoom and calendar links are at the bottom of this email, along with links to the recordings of the first three classes.
The future is already here—it's just not evenly distributed.
—William Gibson
Imagine having a conversation with an AI that not only understands your words but also picks up on your tone, accent, and even your facial expressions. This isn't some distant future. It's happening now with ChatGPT's new advanced voice mode, built on the GPT-4o model and currently being rolled out to select users.
In the last few weeks, I've scoured the internet, diving deep into demo videos that showcase this new voice version in action. What I found was nothing short of remarkable. The AI responds instantly, adjusts its tone to match yours, and even knows when to pause so you don't end up talking over each other. It’s like having a conversation with someone who’s truly listening and engaging in real time.
But the potential here goes far beyond just better conversations. This shift to advanced voice AI is about more than convenience; it’s a fundamental change in how we interact with technology. It's about feeling like you’re talking to a being rather than a tool.
Personally, I’m excited about the opportunities this opens up for creating conversational chatbots that can help me make better decisions, spark creativity, and facilitate learning. As someone who already spends over an hour a day exchanging voice notes with friends and myself, I see this as a natural extension of how I communicate and process ideas.
But before we dive into the six most fascinating videos I found, let's take a moment to understand why this leap in voice technology is such a big deal—and why it's something you should be preparing for right now.
To Benefit From Technological Shifts, Understand User Interface Moments, According To A Renowned Futurist
When ChatGPT was released in November 2022, it took off immediately and was widely reported at the time to be the fastest-growing consumer application in history.
What many don’t know is that the AI behind ChatGPT had already been available for about two years but had been used almost exclusively by techies.
Let that sink in for a moment. Essentially the same technology went unnoticed by regular people until they got an easy way to use it. The moment they did, it exploded.
The lesson for me was that what makes AI special isn’t just the underlying tech. It’s the user interface. I believe that conversation will be the killer user interface that brings AI to the average person worldwide.
This intuition is informed by the history of technology.
Peter Diamandis is one of the greatest living futurists. Four of the big lessons he has drawn from studying the patterns of technological and media history are particularly relevant:
Technologies go from deceptive to disruptive overnight
‘User interface moments’ are a common inflection point
Richer media leads to richer experiences
Richer experiences lead to more adoption
#1. Technologies go from deceptive to disruptive overnight
In the early years of disruptive technologies, few “normal” people are aware of the technology and its potential. This phase is deceptive because the technology is expensive, hard to use, and not very functional. As a result, it has little adoption and is often seen as a toy.
Then an inflection point arrives: many people adopt the technology at once, and it disrupts the way things have been done.
#2. User Interface moments are a common inflection point
Many major technologies went mainstream when the user interface became easy and intuitive:
Internet Browser. The Internet exploded after the launch of the first easy-to-use browser, Mosaic (whose developers went on to build Netscape). “In 1993, when Mosaic went online, there were 26 websites. By 1994, there were 10,000 websites.”—Peter Diamandis
Fortran Programming Language. “Fortran, one of the first programming languages, allowed average users to use complex IBM computers.”—Peter Diamandis
App Store. “The iPhone’s App Store allowed individuals to write programs that can instantly download into the hands of hundreds of millions of users.”—Peter Diamandis
#3. Richer media leads to richer experiences
Source: OpenAI
With ChatGPT’s new voice mode, the AI can:
Observe our facial expressions and body language
Hear subtle tonality shifts
Change its tone
In Evolution of Your AI Assistant, Diamandis connects the following dots on why this is a big deal:
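An often-cited figure holds that 93% of communication is non-verbal. The number comes from Albert Mehrabian's studies of how people convey feelings and attitudes, so it should be read as a rough pointer rather than a universal rule.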
When AI can hear voice tones and read facial expressions, it gains the ability to be more empathetic.
This empathy reduces miscommunications and leads to conversations that are more emotionally rewarding.
Empathy turns AI from a tool into a companion. This shift means we can talk with AI, not just to it.
We can already see this happening with the rise of AI companion companies and individuals having some of their closest relationships with these companions.
#4. Richer experiences lead to more adoption
Diamandis is not alone in seeing the trend toward richer media. Over the years, Mark Zuckerberg has pointed out that consumers adopt richer and richer media as the technology to support it becomes available:
“If you go back to the Internet 10 years ago, most of what we shared online was text. …Since then, now we all have phones that have cameras. So we take a lot of photos. And now as our Internet and networks are getting better, we can start to support more video. Right? We can upload videos, we can do this live like we’re doing here. And that is just making this experience richer and richer, and you’re getting a better sense of what people are experiencing and feeling around the world.”
—Mark Zuckerberg
With this context set, it’s also worth understanding a few other hidden benefits of voice over text before we jump into actual demo videos that show how AI voice bots can help us learn and grow…
Three Hidden Reasons Why Voice Is A Big Deal Compared To Text
#1. Spoken language is easier than written language
We humans have been using spoken language for at least tens of thousands of years, and perhaps as long as 200,000 years. Written language is only about 5,000 years old. Furthermore, children learn to speak before they learn to write. Finally, brain scans show that spoken language has a lower cognitive load.
When I’m feeling tired from writing articles on my computer, I always get a jolt of energy when I go for a walk and leave voice notes for, or have real-time conversations with, friends who also love learning new things and talking about ideas.
#2. AI is an always-available conversation partner
When we use AI with voice, it stops being a Q&A bot we visit only when we need something specific and becomes a conversation partner with benefits:
No more worrying about schedule coordination.
No more censoring yourself because you don’t want to bore someone, look bad, or hog the conversation.
No more waiting for the right moment: it’s there whenever we feel like talking, reflecting, and thinking out loud.
#3. We can chat with it anywhere
“Voice interaction both expanded where computing could be done, from situations in which you could devote your eyes and hands to your device to effectively everywhere, even as it constrained what you could do.”
—Ben Thompson in Stratechery (my favorite technology analyst)
This means we can talk with AI while we:
Walk
Sit at our computer
Ride in our car
Do chores
Thus, we can have hours a day available to talk with AI if we so choose. And we can shift to a way of interacting with AI that is more relaxing and creative.
Summary
ChatGPT’s new advanced voice mode, which OpenAI says it is releasing to all users this fall, has the chance to introduce a new paradigm for how we interact with computers. And it is fundamentally different from a text interface.
When you add up the individual differences described above (speaking is easier than writing, the AI is always available, and we can talk with it anywhere), you get a difference in kind, not just a difference in degree…
AI Voice Is The Ideal Medium For Augmented Intelligence
As I explain in This Augmented Intelligence Video Changed How I See The World, what gets me most excited about AI is augmentation, not automation.
And now that I’ve built dozens of augmented intelligence bots, I think one of the biggest challenges is simply remembering them all and using them regularly at the right moments.
The potential I see for conversation bots is the future ability to:
Start a conversation when I feel like talking about an idea, a challenge, expressing an emotion, or something else.
Have the relevant bots’ augmentations triggered automatically so that the conversation goes better.
As someone who has been a power user of voice memos, I can see myself easily doing this for more than an hour every day. With text bots alone, however, I see it being something I do only when I am at my laptop and have high energy, intention, and discipline.
With that context set, here are…
6 Demos That Show Amazing Ways To Use ChatGPT Voice To Augment Your Intelligence
Coach
Educational Storyteller
Reader
Co-Reader
Role Player
Feedback Provider
#1. Coach
Curator: AI Safety Memes
Imagine having ChatGPT coach you in real time as you confront a challenge that you’re stuck on.
Imagine customizing the bot with:
A particular coaching methodology
A unique voice and personality
Expertise in certain topics
An understanding of your personality
#2. Educational Storyteller
Source: @CrisGiardina
We humans seem to have an endless appetite for stories. Imagine a conversational bot that is particularly good at creating entertaining stories with deep lessons about life.
#3. Reader
Source: Ethan Mollick
Imagine AI reading any classic book to you in any tone of voice.
Or imagine AI creating a customized version of an existing book in a voice, style, tone, structure, and length that is more amenable to you.
#4. Co-Reader
Source: Dan Shipper of Every
Dan Shipper contextualizes the significance of having ChatGPT as a co-reader with the following quote:
“I started to realize how many things I wonder about, or how many questions I have as I read that I don't follow up on because it's like too taxing. It just takes too much time. And using ChatGPT like this in voice mode, you have this like 24-7 companion with you that lowers the bar for asking questions because it's always there to answer.”
#5. Role Player
Source: Twitter
Imagine role-playing any of the following scenarios so you can practice skills in real time:
Negotiation
Conflict Resolution
Job Interview
Group Conversations
Acting
Public Speaking
Coaching
#6. Feedback Provider
Source: OpenAI
Imagine using AI to get feedback on how you look and communicate. For example, imagine writing a rough draft of an article and having the voice bot read it back to you so that you can notice possible improvements you would’ve overlooked if you were just reading it to yourself in your head.
Counterarguments Against Advanced Voice Mode
While I’m watching this feature very closely, I’m also exploring the best counterarguments so I can stay rational. The best ones I see so far are below:
The new version may be a major leap but still not a big enough improvement to be widely adopted. After all, I haven’t used ChatGPT’s new voice mode myself. I’ve only watched demos, and the real thing is rarely as good as the demo. So it may not yet be at the ease-of-use inflection point I think it could reach. At the same time, that doesn’t change my conclusion that when the voice interface is figured out, it will be a seminal moment in AI history.
AI sounding like a human is weird and perhaps risky. It’s not hard to imagine more and more people turning to AI for conversations they would once have had with friends and family. This could give the feeling of intimacy without the real thing, in the same way that social media gives the feeling of having lots of friends when those connections are something different.
What I’m Doing Now
While I don’t have the advanced voice mode feature yet, there’s still a lot I can do.
At the very least, knowing about “user interface moments” has inspired me to do two things differently:
Watch the full release very closely. As soon as I get the new advanced voice mode, I will be using it heavily. I will also be looking for evidence and counterevidence for my thesis.
Brainstorm conversation bots. I am actively thinking about what conversation bots I can create that would augment human intelligence.