The Skeptic AI Enthusiast

The True “Always On” Devices Are About To Arrive

If You Think Smartphones Are “Always on,” You Have Seen Nothing Yet

Rafe Brena, PhD
May 26, 2025
Image by author with AI

“I am absolutely certain that we are literally on the brink of a new generation of technology…”

This was said recently by Jony Ive, the legendary product designer behind the iPhone, iPad, MacBook, and many other Apple products.

Now on his own, Jony Ive founded the design company “LoveFrom,” and then a mysterious startup called “io.”

But here's the shocking twist: Jony Ive’s startup has just been acquired by OpenAI for an astronomical six and a half billion dollars.

What? $6.5 billion? What could be so valuable that it’s worth that much money?

I recommend you watch the video “Sam & Jony introduce io” on YouTube. The video, of cinematic quality, as if Francis Ford Coppola had shot it, shows a partially staged friendly encounter between Sam Altman and Jony Ive.

Sam starts the video saying:

“I think we have the opportunity here to kind of completely reimagine what it means to use a computer,” and a minute later says: “We are sitting at the beginning of what I believe will be the greatest technological revolution of our lifetimes.”

Boy, both Sam and Jony utter grandiose phrases that usher in the transition to a brave new world. The question remains: Is this just a PR stunt, or is there real substance in what they say? Are we really about to cross the tipping point of a generational technology shift?

My personal take is this:

In the age of AI, the way our devices restrict user interaction begins to feel like a drag. The current generation of devices, laptops and smartphones included, is starting to show its age.

I agree: a generational technological shift is likely brewing, approaching its eruption.

I can feel it.

It’s not just the mysterious statements by Sam and Jony; there are plenty of signs all around us; the difficult part is to recognize them and to separate the signal from the noise (because there is indeed a lot of noise!).

Take, for instance, the recent flood of announcements Google made at its big I/O event. The list is so long I can’t say it all in one breath: Gemini 2.5 Pro & Flash, AI Mode in Search, Gemini in Workspace, Imagen 4, Veo 3, Flow, Project Astra, Google Beam, Stitch, Whisk…

That’s a lot to take in and make sense of. That’s what I’m talking about when I say “separate the signal from the noise.”

But lost in the list of Google announcements are a few new technologies that point to a major transformation in the way we live.

What do Gemini Live, Project Astra, and Android XR have in common? What is the society-level tech transformation they are pointing to?

We’ll get to this after giving some context.

Episodic and continuous-use devices

There has been a long-running trend in the use of communication devices, from the landline phones of the 20th century to our beloved smartphones: we engage with them for more and more of the day.

As I pointed out elsewhere, “People from the old generation used landline phones sparingly,” and “With smartphones, this was no longer the case. We can be texting all day long, with no clear beginning and end of the interaction.”

Except that texting is not continuous. We not only have to take the device out of a pocket or purse, but also have to use one or both hands to handle it (have you ever tried to answer a message while carrying grocery bags?).

A crucial distinction can be made between “episodic” devices and “continuous-use” ones. “Episodic” means you engage with your device (like opening the laptop), interact with it for a while, and then end the episode (like closing the laptop) until the next one.

Continuous-use devices are worn and active all day long, like the smartwatch on your wrist.

To which category does the smartphone belong? The answer to this question is what made me write this article. The insight I had is the following:

Smartphones are episodic devices that are used repeatedly, instead of continuously.

Any interaction with your phone has a beginning and an end; we simply repeat the episode many times throughout the day.

There is a tension, a contradiction, between the smartphone’s episodic nature and its continuous use.

As I pointed out previously:

“From the point of view of integration with our everyday life, smartphones have never been ideal. Their tiny screen makes us squint, causes eye strain and bad cervical posture, promotes isolation, and is inconvenient for text input and multimedia visualization.

Despite all those disadvantages, we accepted smartphones because they are so darn convenient, and we don’t have anything comparable in terms of functionality.”

In the video I mentioned at the top, Sam Altman gives a good example of the limitations of current devices:

“The products that we’re using to deliver and connect us to unimaginable technology [he’s talking about AI], they’re decades old. And so it’s just common sense to at least think, surely there’s something beyond these legacy products. If I wanted to ask ChatGPT something right now about something we had talked about earlier, I would, like, reach down, I would get on my laptop, I’d open it up, I’d launch a web browser, I’d start typing, and I’d have to, like, explain that thing. And I would hit enter, and I would wait, and I would get a response. And that is at the limit of what the current tool of a laptop can do. But I think this technology deserves something much better.”

Sam certainly has a point here. Currently, we have to explain everything to ChatGPT because it doesn’t know the context (for instance, it didn’t hear what you talked about earlier). That seems obvious to you and me, but only because we quickly get used to the limitations of current technology, to the point that it takes a visionary to realize there should be a better way.

Continuous-use devices

Let’s assume there could be devices more amenable than smartphones for continuous use. What would they look like?

One of the most essential features of continuous-use devices is “live scanning.”

  • Video stream scanning: the video feed of whatever is in front of the user is continuously sent to the AI to identify objects, movements, and situations.

  • Sound scanning: the same, but for audio. An always-open mic sends an audio feed to the AI, which identifies the sounds present, particularly spoken phrases and commands.

All conversations and other sounds are registered and interpreted, not only when you ask the device to do it, but all the time, keeping in memory a window of “recent” events.
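
To make the idea concrete, here is a minimal sketch of the kind of loop such a device might run. Everything in it is hypothetical: capture_frame, capture_audio_chunk, and the AI helper functions are stand-ins for the camera, the microphone, and the model, not real APIs. The key piece is the rolling window of “recent” events.

```python
import time
from collections import deque
from dataclasses import dataclass

# Hypothetical stand-ins for the device's camera/mic and the cloud AI.
def capture_frame() -> bytes:
    return b"<jpeg bytes>"                                        # placeholder

def capture_audio_chunk() -> bytes:
    return b"<audio bytes>"                                       # placeholder

def describe_scene(frame: bytes) -> str:
    return "kitchen counter; car keys near the stove"             # placeholder AI output

def transcribe(audio: bytes) -> str:
    return "Alice: it wasn't my fault"                            # placeholder AI output

def answer_from_context(question: str, context: list[str]) -> str:
    return f"(answer derived from {len(context)} recent events)"  # placeholder

@dataclass
class Event:
    timestamp: float
    description: str   # what the AI saw or heard at that moment

class LiveScanner:
    """Keeps a rolling window of 'recent' events the device has seen and heard."""

    def __init__(self, window_seconds: float = 3600.0):
        self.window_seconds = window_seconds
        self.events: deque[Event] = deque()

    def _remember(self, description: str) -> None:
        now = time.time()
        self.events.append(Event(now, description))
        # Drop anything older than the memory window.
        while self.events and now - self.events[0].timestamp > self.window_seconds:
            self.events.popleft()

    def tick(self) -> None:
        """One scan cycle: look, listen, and store what the AI recognized."""
        self._remember(describe_scene(capture_frame()))
        self._remember(transcribe(capture_audio_chunk()))

    def ask(self, question: str) -> str:
        """Answer questions like 'Where did I leave the car keys?' from recent context."""
        context = [e.description for e in self.events]
        return answer_from_context(question, context)

scanner = LiveScanner()
scanner.tick()  # on a real device, this would run continuously
print(scanner.ask("Where did I leave the car keys?"))
```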

Visualize these examples of live scanning in real life:

  • You are listening to the radio, and a moment ago they mentioned the telephone number of a help desk, but you don’t remember it. You ask your AI: “What was the telephone number for help mentioned a moment ago?” and the AI answers, “It was (956)1265–4407 for the Brownsville help desk. Do you want me to call them or write it down in a note?”

  • You ask the AI about the movie you are discussing over lunch. You don’t tell the AI which movie it is, because it gets the context from the conversation. You just ask: “Which actors are in this movie?” And you get your answer.

  • A while ago, there was a heated conversation with friends, and after leaving, you wonder who said it was not their fault. You ask your AI: “Who said earlier today that it wasn’t their fault?” The AI answers, “It was Alice. It was related to the hospital bill of your aunt Mary.”

  • You don’t find your car keys at home. You ask your AI: “Where did I leave the car keys?” The AI answers, “You left them in the kitchen, near the stove.” (This use case was already mentioned in videos by Meta and Google.)

It’s not that only one kind of continuous-use device can exist. There could be variations (see below), but whatever the form, it should be possible to use it all day long.

All-day use rules out “fixed” devices like desktop computers, as well as smartphones (which are, by nature, “episodic,” as I already discussed) and headsets. Headsets are great for immersive video games or content in a living room, but they are not usable while eating with others or when you are out and about walking, driving, or cycling.

To make things very specific, I propose exactly four conditions that a wearable continuous-use device should meet to be usable:

  1. An active camera that points where your head is pointing.

  2. An integrated microphone for sound scanning and verbal commands.

  3. A way of delivering sound. For instance, the Meta Ray-Ban includes open-ear speakers embedded within the frame’s arms, but there are other options, like the ones I mention below.

  4. Hands-free operation. You should have both hands free for whatever else you are doing, not for interacting with the device.

Smartphones usually require one hand to use, and sometimes both hands while typing text. Also, it’s hard to imagine how a smartphone could do a continuous video scan without a hand holding it, so smartphones can’t be continuous-use devices.

Some devices that could be used continuously are:

  • Audio-only smart glasses with a camera and open-ear speakers, like the Meta Ray-Ban, which have been on sale for many months and have earned a decent reception and a loyal customer base. They can interpret the image before you, receive voice commands, recognize songs, and more.

  • Augmented Reality (AR) smart glasses with a camera, like the ones presented a few days ago during the big Google I/O event, where they did a live demo (with some hiccups). They can either have open-ear speakers or rely on Bluetooth earbuds, though I guess open-ear speakers are a more practical option.

  • Earbuds with a camera. Yes, I’m not inventing anything: Apple has registered a patent for earbuds that include a camera. If you think about it, it makes sense, as they would comply with the four requirements (camera, mic, sound, hands-free) for a continuous-use device. This way, you wouldn’t have to wear glasses at all (though if, like me, you have to wear glasses all day long anyway, that advantage plummets).

  • Headbands, either soft or rigid, fitted with a small camera, a mic, and speakers. I know this one sounds even weirder than earbuds with a camera, but with a band, you wouldn’t have to wear glasses or earbuds and would have all the needed functionality (camera, mic, sound, hands-free).

  • An add-on for a cap you’d wear, which is a variation on the headband.

  • Your good old smartphone hanging around your neck with a strap, so the camera points forward. For sound, you either rely on the integrated mic and speakers or use earbuds. This “poor man’s” solution would use an app or a modified operating system. Its practicality is definitely questionable.

Any of these devices (and others I can’t imagine; please help me out in the comments) fit the bill for continuous use. You must understand that the most essential part is not the device itself but the AI it connects to. The AI does the heavy lifting: interpreting images, sounds, voices, and commands, and taking appropriate action.

My personal bet is that AR devices could be a game changer, with use cases that no other device can match.

For instance, any of the continuous-use devices I mentioned above can be used to do the following:

  • You are a tourist walking around. You ask, “What was the name of the avenue I crossed a moment ago?” and get a verbal answer or a text in the AR overlay. You can follow up with “Which building on this street has historic significance?”

  • You are trying to troubleshoot a kitchen appliance, but you don’t know what a blinking light indicates, unless you dig deep into the manual or Google. You can ask the AI “What is this blinking light?” while pointing your head (and the glasses’ camera) at the appliance.

  • Cooking: Instead of only giving you directions for a recipe, the AI looks (with the camera) at what you are doing, estimates the amount of oil relative to the quantity of meat it sees you cooking, judges when those chicken cutlets are golden brown, and then tells you to stop pan-frying.

  • Nutrition: Your smart glasses or equivalent scan what you are eating at your meal. You don’t need to declare what you eat (or cheat about your guilty pleasures!). You just eat while wearing your device with a camera, and the AI continuously scans the image, recognizes what you are eating, and calculates the calories and protein in the portion you just brought to your mouth.

AR glasses can additionally do things like:

  • Help you with navigation while driving or cycling, as they become a true head-up display that adds a guidance overlay to the image of the street. Instead of the annoying “at the roundabout, take the 4th exit” of Google Maps, you’d follow a bright arrow pointing to the right exit.

  • While troubleshooting a device, a bright arrow in the augmented overlay would point to the exact button you have to press, instead of giving you a spoken explanation.

  • If you asked for information about buildings in a city you are visiting, the AR overlay would show you exactly where the office was where Roosevelt worked at the law firm of Carter Ledyard & Milburn in New York City before getting involved in politics.

  • Presenting rolling text as a teleprompter. The speed of the rolling text would adjust to how you speak because the AI follows what you are saying.

Android XR

Now I can answer: What do Gemini Live, Project Astra, and Android XR have in common? Well, Gemini Live is about real-time interaction with AI using voice and image, while Project Astra focuses on understanding the user’s context, such as previous conversations. Finally, Android XR is the software vehicle that will gradually incorporate the other two into real products.

Some people who hated smart glasses (and had hated Google Glass before) are now being convinced by the Android XR proposal.

What does Android XR do better than Apple Vision Pro?

I see two things:

  • Android XR is built around AI, Gemini in particular, while Apple hasn’t been able to deliver a decent AI service so far, much less integrate AI well with the Vision Pro.

  • The Apple Vision Pro isn’t and couldn’t be a continuous-use device, as it’s a headset that can’t be used while driving or walking.

Android XR, by contrast, is not restricted to headsets and encompasses a range of target devices, from audio-only AI smart glasses to VR headsets, including AR glasses, which I think are the most promising continuous-use option.

Android XR aims to bring live scanning as defined above, or, in Google’s words, to “see what you see” and “hear what you hear.” (Is this a promise or a threat?)

Android XR aims to blend the digital and real worlds in a supposedly useful way. Google promises to do it while respecting privacy as well…

It seems to me that Android XR will succeed where Apple Vision Pro failed because of its AI-centered focus and its much broader scope. This is independent of whether the specific Samsung headset “Moohan” with Android XR will be a hit or a flop.

It’s natural to compare the new smart glasses with Google Glass. In an interview, Sergey Brin admitted, “I’ve learned a lot.” He mentioned there was “a technology gap” back then, referring to the fact that current and future smart glasses are all about AI, which wasn’t fully developed at the launch of Glass.

Android XR is not just “Google Glass 2.0” because of one vast difference: AI, which wasn’t there for the first iteration. The new smart glasses are built around AI, not just using it as a complement: from voice commands to image interpretation and scanning, everything is done with AI. So it’s the other way around: this time, it’s AI that made smart glasses possible.

The Jony Ive device

We don’t know exactly which devices the “io” company will bring, but they say it will be a “family of devices,” not just one.

Further, we can be sure they intend to go for the mass market (as opposed to the Apple Vision Pro, which is oriented to the ultra-high-end segment) because they plan to sell a whopping 100 million of them “faster than any company has ever shipped 100 million of something new before,” as Sam Altman said in a meeting at OpenAI.

Now, the most important thing they said about their new devices is that they will be “capable of being fully aware of a user’s surroundings and life.”

To me, this means “continuous-use device,” in the style of smart glasses and Android XR, except that Ive said they are “not going to make smart glasses.”

We may wonder: why is OpenAI interested in building hardware?

I view it like this: users reach AI through devices, and if the AI company doesn’t have a device of its own, that’s a limitation. For instance, with Apple Intelligence, the iPhone can call on ChatGPT (whenever Apple finally gets its AI ready), but ChatGPT doesn’t own the operating system, so OpenAI doesn’t have full access to the camera or microphone. This is an obstacle for a continuous-use device that “sees what you see and hears what you hear.” Though in theory Apple could enter a deep collaboration to develop a new continuous-use product, that won’t happen because it’s not in Apple’s DNA. They are control freaks who want to own every aspect of the user experience, and perhaps they are right.

OpenAI could greatly benefit from having a foothold in the hardware industry. If they actually sell 100 million devices to the public, that would become a massive distribution channel for their AI software products.

A brave new world

Despite the repeated use we make of our phones, we are not used to truly continuous device use. We don’t even know what it means to continuously use a smart device with a camera, a mic, sound output, and hands-free operation.

Now, the new continuous-use devices, with camera, mic, sound, and hands-free operation, will raise convenience to a whole new level, because the AI fueling them will give them seemingly limitless superpowers.

We have to understand that the only way to avoid typing the entire context of a request to an AI is to share with it, in real time, what is happening in your life. As Google puts it, “see what you see and hear what you hear.”

This is uncharted territory. Continuous-use devices will initially include very few applications, far from the accumulation of functions that has made smartphones the true Swiss Army knives of the digital age.

Google will definitely provide developers with a simple path to incorporate their apps into the new AR glasses and the like, but it will be a long road to a dominance similar to that of smartphones.

Now, with continuous-use devices, some previously unthinkable scenarios become possible.

Consider this scenario, hinted at by Sam Altman in the video mentioned before, when he talks about “what it means to use a computer.”

Imagine you are repairing a TV set. Instead of you crafting a long and detailed prompt to describe the situation you are facing, the AI has already gathered the context from your interactions, the YouTube videos you have watched, the questions you asked aloud of the people around you, and from looking at the actual TV set at home through the tiny camera.

You just ask, “Where should I connect this cable?” while showing the cable, and the AI tells you the answer: verbally in the case of audio-only smart glasses, or with a bright arrow pointing to the right socket in the case of AR glasses.

You don’t have to describe what you want to fix or the exact model, because the AI already picked up that information from what you looked at or said to somebody. Remember, the AI “sees what you see and hears what you hear.”

No more prompt-crafting skills will be needed. That will be a thing of the past.
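
In code terms, the difference might look something like the sketch below. It is only an illustration under the same assumptions as the earlier sketch: the function names are hypothetical, and recent_events (standing for the rolling window a live-scanning device would keep) is filled with made-up entries for the TV-repair scenario.

```python
# Hypothetical contrast between today's "cold" prompt and a context-aware request.
# `recent_events` stands for the rolling window a live-scanning device would keep.

recent_events = [
    "Watched a YouTube video about replacing a TV mainboard",             # placeholder
    "Looked at the back panel of the TV; one ribbon cable is unplugged",  # placeholder
    "Asked aloud: 'Does anyone know which connector this goes to?'",      # placeholder
]

def cold_prompt(question: str) -> str:
    # Today: the user has to type all the context by hand before the question.
    return question

def contextual_prompt(question: str, events: list[str]) -> str:
    # Continuous-use device: recent context is prepended automatically.
    context = "\n".join(f"- {e}" for e in events)
    return f"Recent context:\n{context}\n\nUser question: {question}"

print(cold_prompt("Where should I connect this cable?"))
print(contextual_prompt("Where should I connect this cable?", recent_events))
```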

Dangers looming on the horizon

Keep reading with a 7-day free trial

Subscribe to The Skeptic AI Enthusiast to keep reading this post and get 7 days of free access to the full post archives.
