Computer vision is the ability of AI to understand and interpret images and video — to "see" the world the way humans do, but often faster and at a larger scale. It's used in your phone's face recognition, your game console's motion sensing, medical diagnosis, self-driving cars, and wildlife conservation cameras that automatically identify endangered species.

Seeing sounds simple. You open your eyes and you see things. But for a computer, "seeing" means processing an image — which is just a grid of coloured pixels — and extracting meaningful information from it. That's a surprisingly hard problem, and solving it required the breakthroughs in neural networks we talked about in an earlier lesson.

How a Computer "Sees" a Photo

When you take a photo on your phone, the image is stored as millions of tiny dots (pixels). Each pixel has a colour value — three numbers representing how much red, green, and blue light it contains. A computer sees a photo as a massive grid of numbers.
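
To make the "grid of numbers" idea concrete, here is a minimal sketch of a made-up 2 x 2 photo written out by hand. The names `image`, `image_size`, and `brightness` are invented for this illustration and are not part of any real camera software.

```python
# A tiny 2x2 "photo": each pixel is (red, green, blue), each value from 0 to 255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

def image_size(img):
    """Return (width, height) of the pixel grid."""
    return len(img[0]), len(img)

def brightness(pixel):
    """Average of the three colour values: 0 is black, 255 is white."""
    r, g, b = pixel
    return (r + g + b) / 3

width, height = image_size(image)
print(f"{width} x {height} pixels, {width * height * 3} numbers in total")
print("Brightness of the white pixel:", brightness(image[1][1]))
```

The same counting applies to a real photo: a 12-megapixel image has 12 million pixels, so the computer is looking at about 36 million numbers.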

To a human, that grid of numbers is immediately recognisable as "a photo of a cat." To an AI that hasn't been trained, it's just numbers. Computer vision AI learns to translate those numbers into meaningful descriptions — object categories, positions, relationships, emotions, age, and more.

Early computer vision used hand-crafted rules: detect edges here, find shapes there. These rules were brittle and tended to break when the lighting, angle, or background changed. Modern computer vision uses convolutional neural networks (CNNs), a type of neural network designed for images. A CNN slides small filters across the image, checking each small patch for simple patterns such as edges and textures, and later layers combine those patterns into larger shapes and whole objects. By training on millions of labelled images, CNNs become remarkably accurate.
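
The core operation inside a CNN, checking small patches of an image against a filter, can be sketched in plain Python. This is a simplified illustration, not how a real library implements it: the image is a made-up 4 x 4 greyscale grid, the filter values are picked by hand (a real CNN learns them during training), and `convolve` is a name chosen for this example.

```python
# A 4x4 greyscale image: dark (0) on the left half, bright (9) on the right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A hand-picked 2x2 filter that responds where brightness rises from left to right.
# In a real CNN these numbers are learned during training, not written by hand.
vertical_edge_filter = [
    [-1, 1],
    [-1, 1],
]

def convolve(img, kernel):
    """Slide a square kernel over every position where it fits; record the weighted sum."""
    k = len(kernel)
    out_h = len(img) - k + 1
    out_w = len(img[0]) - k + 1
    result = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0
            for dy in range(k):
                for dx in range(k):
                    total += img[top + dy][left + dx] * kernel[dy][dx]
            row.append(total)
        result.append(row)
    return result

feature_map = convolve(image, vertical_edge_filter)
for row in feature_map:
    print(row)  # each row is [0, 18, 0]: the filter "lights up" only at the edge
```

The output is large only in the middle column, exactly where the dark half meets the bright half. A real CNN applies many such filters at once and stacks the results in layers.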

Computer Vision You Use Every Day

Face unlock on your phone: Your face is mapped in 3D using hundreds of reference points. A model trained on millions of faces compares that geometry with the face you registered during setup. Because it relies on depth, it is designed to fail if someone holds up a photo of you.

Portrait mode on cameras: The AI separates the subject (you) from the background in real time, blurring the background while keeping the subject sharp. It uses a segmentation model that, pixel by pixel, decides "foreground" or "background."

Google Photos search: When you search Google Photos for "beach" or "dog", matching photos appear even though you never tagged them, because computer vision has already analysed and categorised every photo automatically.

Medical imaging: Radiologists use AI tools that scan medical images (X-rays, MRI scans) and highlight regions that might be abnormal. The AI can flag subtle patterns that a person reviewing many scans at a normal pace might miss. It doesn't replace the radiologist; it helps them focus attention on the areas that need closest examination.

Wildlife cameras: Conservation organisations put cameras in remote locations to track wildlife. Computer vision automatically identifies species in millions of photos without researchers having to examine each one manually. This has dramatically accelerated wildlife population research.
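
The portrait mode example above, where every pixel is marked "foreground" or "background", can be sketched in a few lines. This is a toy illustration under stated assumptions: the greyscale image and the mask are typed by hand (a real camera gets the mask from a trained segmentation model), the blur is a simple neighbourhood average, and the names `blurred_value` and `portrait_mode` are invented for this example.

```python
# Greyscale image (values 0-9) and a matching mask: 1 = foreground, 0 = background.
# A real camera gets the mask from a segmentation model; here it is typed by hand.
image = [
    [1, 9, 1, 9],
    [9, 5, 5, 1],
    [1, 5, 5, 9],
    [9, 1, 9, 1],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def blurred_value(img, y, x):
    """Average a pixel with its immediate neighbours (a simple box blur)."""
    values = []
    for ny in range(max(0, y - 1), min(len(img), y + 2)):
        for nx in range(max(0, x - 1), min(len(img[0]), x + 2)):
            values.append(img[ny][nx])
    return round(sum(values) / len(values), 1)

def portrait_mode(img, fg_mask):
    """Keep foreground pixels sharp; replace background pixels with blurred values."""
    return [
        [img[y][x] if fg_mask[y][x] == 1 else blurred_value(img, y, x)
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]

result = portrait_mode(image, mask)
for row in result:
    print(row)
```

The four centre pixels (the "subject") come out unchanged, while the noisy background values are smoothed towards their neighbours.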

What I Find Most Interesting About Computer Vision

In 2015, Google Photos was found to be labelling photos of Black people as "gorillas", a deeply offensive error widely attributed to bias in the training data. Google responded by blocking the label, and reporting years later found that related search terms were still blocked rather than the underlying model having been fixed. That's a computer vision story that isn't about impressive technology. It's about the consequences of building systems without enough care for who they affect.

I think understanding these failures is as important as understanding the successes. AI isn't neutral. It reflects the biases in its training data. Computer vision that works well for some groups and poorly for others doesn't stay a technical problem — it becomes a fairness problem that affects real people's lives.

Frequently Asked Questions

What is computer vision in simple terms?

Computer vision is AI that can understand images and video — recognising objects, faces, movements, and patterns in visual data.

How does face recognition work?

A neural network maps hundreds of reference points on a face (distances between eyes, nose shape, jawline geometry) and compares that pattern with the face stored when the person was registered. If the two are close enough, it counts as a match. 3D face recognition is resistant to photos but requires a clear view of the face.
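
One common way to do the comparison step is to turn each face into a list of numbers and measure how far apart two lists are. The sketch below assumes that approach. The numbers are made up, the threshold is an arbitrary value chosen for the example, and `distance` and `is_match` are hypothetical helper names, not functions from any real face recognition product.

```python
import math

# Hypothetical face measurements ("feature vectors"). Real systems use much longer
# vectors produced by a neural network; these short made-up lists are illustrative.
enrolled_face = [0.42, 0.77, 0.31, 0.58]  # stored when the owner registered
same_person = [0.40, 0.79, 0.33, 0.57]    # the owner on a different day
other_person = [0.10, 0.35, 0.90, 0.22]   # someone else

MATCH_THRESHOLD = 0.1  # assumed value; real systems tune this carefully

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(candidate, enrolled, threshold=MATCH_THRESHOLD):
    """Accept the face only if it is close enough to the enrolled one."""
    return distance(candidate, enrolled) < threshold

print("Owner unlocks:", is_match(same_person, enrolled_face))
print("Stranger unlocks:", is_match(other_person, enrolled_face))
```

Choosing the threshold is a trade-off: too strict and the owner gets locked out, too loose and a stranger might get in.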

Is computer vision used in medicine?

Yes. AI tools help radiologists identify potential abnormalities in X-rays and MRI scans, and can flag subtle patterns that a person reviewing many scans at a normal pace might miss.

What is a convolutional neural network (CNN)?

A type of neural network designed specifically for processing images. It slides small filters across an image, checks each small patch for patterns, and combines those patterns layer by layer into larger shapes. CNNs are the foundation of most modern computer vision AI.

Explore the Full AI Course

Computer vision, neural networks, prompts, and more — free at KidsFunLearnClub, taught by Parikshet.
