Vision Kit (V1)

Build an intelligent camera that can see and recognize objects using TensorFlow

Project Overview

This project lets you build an image recognition device that can see and identify objects, powered by TensorFlow’s machine learning models. All you need is a Raspberry Pi Zero W, a Raspberry Pi Camera 2, and a blank SD card. A free Android app is coming soon to help you easily control your device.

Assembling the kit should take about an hour.

Get the kit

The Vision Kit is out of stock. Join the waitlist and we'll let you know as soon as they are back.

List of Materials

Open the box and verify you have all of the necessary components in your kit. You’ll also need a couple of tools for assembly.


In your kit

  1. VisionBonnet accessory board (×1)
  2. 11mm plastic standoffs (×2)
  3. 24mm RGB arcade button and nut (×1)
  4. Privacy LED (×1)
  5. LED bezel (×1)
  6. Tripod mounting nut (×1)
  7. Lens, lens washer (1 spare), and lens magnet (×1)
  8. Button harness (×1)
  9. Pi0 camera flat flex cable (×1)
  10. MIPI flat flex cable (×1)
  11. Piezo buzzer (×1)
  12. External cardboard box (×1)
  13. Internal cardboard frame (×1)

Not included

  1. Raspberry Pi Zero W (with headers) (×1)
  2. Raspberry Pi camera 2 (×1)
  3. Blank SD card (at least 4 GB) (×1)
  4. Micro-USB power supply (×1)
  5. Optional: Micro-USB to USB cable (or whatever connects to your own computer) (×1)
Note: headers

The Raspberry Pi Zero W does not typically ship with headers installed. You will need to supply and solder your own 2x20 pin header or purchase a unit with them pre-soldered.

Assembly Guide

This guide shows you how to assemble the AIY Projects Vision Kit.

The kit contains a cardboard form, the VisionBonnet accessory board for image recognition, and some other connecting components. You’ll need a Raspberry Pi Zero W board, the Raspberry Pi Camera 2, and a blank SD card with at least 4 GB of space to build the kit.

By the end of this guide your Vision Kit will be assembled, configured, and ready to run!


Get the Vision Kit SD Image

You’ll need to download the Vision Kit SD image and write it to the SD card. Downloading the image can take a few minutes, so while that’s in progress, get started on assembling the kit.

  1. Get the Vision Kit SD image

  2. After it’s downloaded, write the image to your SD card using a card-writing utility


Assemble the hardware

Assembly issues?

If you have any issues while building the kit, contact us at

Take your Raspberry Pi Zero W board and insert the two plastic standoffs into the two yellow holes opposite the 40-pin box header.

WARNING: First make sure your Raspberry Pi is disconnected from any power source and other components. Failure to do so may result in electric shock, serious injury, death, fire or damage to your board or other components and equipment.

Find the VisionBonnet cable connector next to the pin header. Pull up the black release lever.

Find the short MIPI flex cable and orient it with the serial number facing towards you.

On the right side, gently slide the MIPI flex cable into the VisionBonnet board until the cable hits the back of the connector.

Secure the cable with the release lever.

IMPORTANT: Ignore the white sticker that says “PI” or “VisionBonnet” on the MIPI flex cable. These may be incorrectly applied.

Now find your Raspberry Pi board and pull back the black release lever.

Slide the other side of the MIPI flex cable into the Raspberry Pi. The Raspberry Pi header should be facing toward you.

Be gentle and don’t push too hard; otherwise, you may damage the connectors.

WARNING: Failure to securely seat the ribbon in the connector may cause electric shock, short, or start a fire, and lead to serious injury, death, or damage to property.

Secure the release lever once it’s in place.

Carefully and gently bend the MIPI cable so that the board pin headers are facing each other (see the next step).

Connect the two boards via the 40-pin header. Gently push the MIPI cable so that it folds into the space between the two boards.

Press down to snap the standoffs opposite the header into place.

WARNING: Push gently: forcing the standoffs into place may cause them to break.

Find the button harness and plug it into the button connector on the VisionBonnet board.

Set your connected boards aside for now.

WARNING: Failure to securely seat the connector may cause electric shock, short, or start a fire, and lead to serious injury, death, or damage to property.

Step complete

Well done! Set aside your hardware for now.


Assemble the inner frame

Find your Pi camera 2 board and the Pi0 camera flat flex cable.

Open the cable release lever and remove the white cable that may be attached to the board. Connect the gold Pi0 cable to the Pi camera 2 board. Close the release lever once the cable is secure.

WARNING: Failure to securely seat the connector may cause electric shock, short, or start a fire, and lead to serious injury, death, or damage to property.

Find the smaller cardboard piece. This piece holds the hardware components and fits inside the larger cardboard box.

In the middle of the cardboard piece is an upside-down U-shaped cutout. Push it out, then fold the tab upward (see the image).

Place the Pi camera 2 board onto the frame. The camera’s aperture will fit into a rectangular slot in the middle of the cardboard.

Remove the protective film from the camera lens.

Turn the cardboard over.

Fold the window-shaped cardboard tab over the camera board.

Fold the tabs on the left and the right of the board toward you. There will be two small cardboard notches on each side that will secure each tab.

Turn the cardboard frame over.

Find your Piezo buzzer and insert the wire into the cardboard hole below the camera aperture.

Remove the adhesive cover from the Piezo buzzer.

Stick the adhesive side of the buzzer onto the upside-down U-shaped tab you folded out earlier.

Take the other end of your Pi0 camera flat flex cable and plug it into the VisionBonnet board connector. The Pi Zero board should be facing away from you.

WARNING: Failure to securely seat the connector may cause electric shock, short, or start a fire, and lead to serious injury, death, or damage to property.

Place the two boards onto the bottom tab of the cardboard frame. Fold the rest of the cardboard frame upward.

The cardboard also has two feet under the board. Fold those downward.

Gently fold the flex cable toward the board (otherwise it won’t fit into the cardboard box later).

Loop the flex camera cable through the cardboard indent.


Put it all together

Find the other cardboard piece and fold it into a box. Keep the bottom and top open.

Take your internal cardboard frame and slide it into the bottom of the cardboard box (the bottom side has the smaller hole).


Make sure the buzzer wire and button harness didn’t get stuck.

Also check that the board connectors are aligned with the box cutouts.

WARNING: Forcing connectors into misaligned ports may result in loose or detached connectors. Loose wires in the box can cause electric shock, shorts, or start a fire, which can lead to serious injury, death or damage to property.

Take the tripod mounting nut and place it between the flaps on the bottom of the cardboard box.

Fold the large flap over to close the bottom of the box.

Find your privacy LED and LED bezel.

Push the LED bezel into the hole above the camera aperture.

From the inside of the box, take the privacy LED and insert it into the LED bezel. It should snap into place.

Make sure the LED is peeking out on the other side.

Find your arcade button and unscrew the plastic nut. Insert the arcade button into the hole on the top flap of the cardboard box. Secure it with the plastic nut on the other side.

Using too much force may result in bending or ripping the box.

Check the arcade button’s board for the words Bonnet, LED, and Piezo.

Find your Piezo buzzer cable and plug it into the slot marked Piezo.

Same for the LED.

And plug the button harness into the slot marked Bonnet.

WARNING: Failure to securely seat the connector may cause electric shock, short, or start a fire, and lead to serious injury, death, or damage to property.

Carefully close the top of the box.

Find the camera lens mount washer and remove the adhesive strip from the back.

Attach the camera lens mount washer to the cardboard.

Attach the lens assembly to the lens washer. Look through the lens to ensure lens and camera are concentrically aligned.

WARNING: The lens contains magnets. Keep it out of reach of children; the magnets may interfere with medical implants and pacemakers.

Step complete

Your Vision Kit is fully assembled. A couple more steps and you’re ready to turn it on.


Plug in your peripherals

Now that you’ve assembled your kit, plug in everything you need. Your Vision Kit can be used with or without a monitor, keyboard, and mouse. To begin, connect cables to both micro-USB ports: one for power and one for data.

Insert your SD card with the Vision Kit SD image.

Plug in your micro-USB to USB cable. Connect the USB end to another computer.

Plug in the USB power supply, if you have one.

Note: Your Vision Kit will run with power from just the data cable connection, but we recommend using an external power source via the PiZero’s power MicroUSB port for reliability.

Your Vision Kit can run without a monitor, keyboard, or mouse: the Joy Detector demo app will start automatically when the device boots up. But if you want to use the GUI and do some more tinkering, plugging in peripherals is required.


Turn it on and use the Joy Detector demo

After you plug in your power supply cable (or micro-USB connector cable) the Vision Kit will turn on. The first boot will take around 4 minutes on a Pi Zero W while the image expands, features install, and settings are configured. Subsequent boots will only take a minute.

After your Vision Kit boots up, the Joy Detector demo app will be ready to use. The Joy Detector demo runs inference using Google’s facial detection model (running on the Movidius co-processor) on frames from the PiCamera.

Try directing the PiCamera toward someone’s face: each inference run detects the number of faces and a joy score for each face. The color of the arcade button’s LED reflects the combined joy score across all detected faces: sad faces turn it blue, joyful faces turn it yellow. And if the joy score exceeds 85%, an 8-bit sound will play. Cool!
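
The scoring and feedback behavior can be sketched in plain Python (a simplified illustration with hypothetical helper names, not the demo’s actual code):

```python
def average_joy(joy_scores):
    """Mean joy score across detected faces (no faces -> 0.0)."""
    if not joy_scores:
        return 0.0
    return sum(joy_scores) / len(joy_scores)

def led_color(joy_score):
    """Blend the button LED from blue (sad, 0.0) toward yellow (joyful, 1.0)."""
    blue, yellow = (0, 0, 255), (255, 255, 0)
    return tuple(int(b + (y - b) * joy_score) for b, y in zip(blue, yellow))

def should_play_sound(joy_score, threshold=0.85):
    """An 8-bit sound plays when the joy score exceeds 85%."""
    return joy_score > threshold
```

For example, two faces with joy scores 0.9 and 0.7 average to 0.8, which blends the LED most of the way toward yellow and triggers the sound.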


Now What?

This is just the start. The Joy Detector demo is just one of many demos: there’s so much else you can do with the kit!

Check out our Maker’s Guide for information on:

  • Trying our other demos and models
  • SSHing / connecting to the Vision Kit through your computer
  • Starting your own Vision Kit project
  • Learning more about the TensorFlow model compiler

Happy making!



Power Supply

The AIY Vision Kit is designed to work with a 5V 2A DC power supply with a micro-USB connector. We recommend the official Raspberry Pi power supply. A power supply rated below 2A will cause the system to behave improperly.

Raspberry Pi Board

The AIY Vision Kit is designed to work with the Raspberry Pi Zero W (version 1.1) and the Raspberry Pi Zero (version 1.3). Please note the AIY Vision Kit requires the Raspberry Pi board to have a camera connector.

To find out which version of Raspberry Pi Zero you have, log into your system and type the following command:

cat /proc/cpuinfo | grep Revision
Revision  Board
900092    Pi Zero without camera connector, no wireless
900093    Pi Zero with camera connector, no wireless
9000c1    Pi Zero W with camera connector + Wi-Fi/Bluetooth
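
If you’d like to script this lookup, here’s a minimal sketch (hypothetical helper, using the revision table above):

```python
# Map the Revision field from /proc/cpuinfo to a board description.
REVISIONS = {
    '900092': 'Pi Zero without camera connector, no wireless',
    '900093': 'Pi Zero with camera connector, no wireless',
    '9000c1': 'Pi Zero W with camera connector + Wi-Fi/Bluetooth',
}

def board_from_cpuinfo(cpuinfo_text):
    """Scan /proc/cpuinfo text for the Revision line and describe the board."""
    for line in cpuinfo_text.splitlines():
        if line.startswith('Revision'):
            revision = line.split(':')[1].strip()
            return REVISIONS.get(revision, 'unknown revision: ' + revision)
    return 'no Revision field found'
```

On the Pi itself you would pass in the contents of /proc/cpuinfo, for example `board_from_cpuinfo(open('/proc/cpuinfo').read())`.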

Raspberry Pi Camera

AIY Vision Kit is designed to work with Raspberry Pi Camera Module Version 2. Please note that Camera Module Version 1 is not supported and will not work. The Raspberry Pi Camera Module Version 2 NoIR will technically work (it uses the same sensor), however our ISP is not optimized for this unit.

Raspberry Pi Camera based applications perform best when run one at a time. The AIY Vision Kit enables the Joy Demo application by default. To experiment with other Raspberry Pi Camera based applications, you must disable or stop the Joy Demo by using the following commands:

sudo systemctl stop joy_detection_demo.service
sudo systemctl disable joy_detection_demo.service

Flex Cable Connections

Be sure to check both flex cable connections. The device will not boot if the cables are connected incorrectly.

IMPORTANT: Ignore the white sticker that says “PI” or “VisionBonnet” on the MIPI flex cable. These may be incorrectly applied.

Find the silkscreen serial number on the MIPI flex cable and orient it so the numbers are facing you. With this orientation, the right side of the flex cable connects to the VisionBonnet and the left side to the Raspberry Pi board.

SD Card Image

The Vision Kit SD image is designed to work with both the Vision and the Voice Kit. It is normal to see references to the Voice Kit in the image.

If you intend to use the Vision Kit image on your Voice Kit, you need to edit boot/config.txt and uncomment this line at the bottom of the file:

dtoverlay=googlevoicehat-soundcard

Boot Verification

You can run this command to check that the Vision Bonnet is connected and has booted successfully:

dmesg | grep googlevisionbonnet

If it has, you should see output similar to this:

[   18.545995] googlevisionbonnet spi0.0: Initializing
[   18.712917] googlevisionbonnet spi0.0: Resetting myriad on probe
[   18.712940] googlevisionbonnet spi0.0: Resetting myriad
[   22.843448] googlevisionbonnet spi0.0: Writing myriad firmware
[   31.573648] googlevisionbonnet spi0.0: Myriad booting
[   31.818673] googlevisionbonnet spi0.0: Myriad ready
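
If you want to script this verification (for example, as part of a boot-time health check), a minimal sketch that scans the dmesg output for the driver’s "Myriad ready" message:

```python
def bonnet_booted(dmesg_text):
    """Return True if the VisionBonnet driver reported a successful boot."""
    lines = [l for l in dmesg_text.splitlines() if 'googlevisionbonnet' in l]
    return any('Myriad ready' in l for l in lines)
```

On the Pi you would feed it the output of dmesg, e.g. `bonnet_booted(subprocess.check_output(['dmesg'], text=True))`.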

Makers Guide

Now that you’ve explored the Joy Detector demo, you can begin to make the project your own! We’ve provided more models and demos to explore before you train and load your own. This guide runs through the other models and demos, and shows how to train and compile your own TensorFlow models.


Connecting to your device

You can connect to your Vision Kit by plugging in peripherals and accessing the GUI, or by connecting through another computer using your Micro-USB data cable.

Use the GUI

You can plug in a monitor, keyboard, and mouse directly to your Vision Kit to access the desktop and open a terminal.

Connect via another computer

You can use the micro-USB connector cable to connect to your Vision Kit via SSH or a console program. In your computer’s terminal, enter:

ssh pi@

The password is "raspberry". Make sure your cable supports data. Depending on the cable you are using, you may need to toggle a switch on the cable to ensure it can transfer data.

Note: The Pi’s IP address is statically set. If your host is unable to acquire an IP address via DHCP, you may need to run dhclient on your USB Ethernet connection. Run ifconfig to find the correct interface, then run dhclient usb0 (or equivalent) to acquire an IP address.

Or use your console command of choice. For example:

screen /dev/ttyACM0 115200

Note: The ACM port will vary based on what is already plugged into your computer. You can verify the correct port by running dmesg.

Vision Kit options

Disable Joy Detector auto-run

The Joy Detector automatically runs when your Vision Kit is turned on. You can disable it from automatically running by entering:

sudo systemctl stop joy_detection_demo
sudo systemctl disable joy_detection_demo

Joy Detector code

On creation of the CameraInference object, the face detection model is sent over to the co-processor.

def _run_detector(self):
  with CameraInference(face_detection.model()) as inference:

inference._camera.start_preview() begins streaming from the picamera module (default settings of 1640x1232@10fps).

Iterating over yields a proto response for each frame of inference data. From each result, face_detection.get_faces can be used to obtain the detected faces and their attributes (including joy).

for i, result in enumerate(
    faces = face_detection.get_faces(result)
    # Calculate joy score as an average for all detected faces.
    joy_score = 0.0
    if faces:
        joy_score = sum(face.joy_score for face in faces) / len(faces)

    # Append the new joy score to the window and calculate the mean value.
    self._joy_score_window.append(joy_score)
    self.joy_score = sum(self._joy_score_window) / len(self._joy_score_window)
    if self._num_frames == i or not self._run_event.is_set():
        break

Try our other demos

To get you started, we have five additional demo models, but many more are coming soon! Check the sections below for instructions on how to run them.

The model files are included on your SD card image at ~/models.

And the demos are located at ~/AIY-projects-python/src/examples/vision.

Model                                                  Description

face_detection.binaryproto                             Detects faces in an image with bounding boxes; also gives a joy score for each detected face
mobilenet_ssd_256res_0.125_person_cat_dog.binaryproto  MobileNet-based detection of a person, cat, or dog in an image; provides a bounding box
mobilenet_v1_160res_0.5_imagenet.binaryproto           MobileNet-based; recognizes objects within an image (no bounding box provided)
squeezenet_160res_5x5_0.75.binaryproto                 Based on SqueezeNet; recognizes objects within an image (no bounding box provided)
mobilenet_v1_192res_1.0_seefood.binaryproto            MobileNet-based; recognizes 2023 types of classic dishes from 14 different cuisines

Note: If you are using a version of the SD image older than 2018-02-21, you will need to activate our virtualenv before beginning any demo. If you are using the most recent SD image, this is not necessary.

source ~/AIY-projects-python/env/bin/activate

3.1 Run the Face Detector

There are two variants of the face detector: one that uses camera frames, and another that takes a saved image and places bounding boxes on it.

To run the camera frames variant, enter the following in your terminal and include the number of frames you want the camera to process:

~/AIY-projects-python/src/examples/vision/ --num_frames=<number of frames>

The camera variant will process camera frames and print the number of faces detected to your console. Note that if you do not pass the --num_frames parameter, the demo will keep running forever!

To run face detection on a single image, find a saved image on your computer or on the SD card image and enter the following (you’ll also need to specify a place to save the output image):

~/AIY-projects-python/src/examples/vision/ --input=<input image> --output=<output image>

Face detection on a single image will overlay bounding boxes on the faces in the output image, as well as print the number of faces detected to the console.

3.2 Run the Object Detector

The object detector looks at an image and detects whether it contains a person, cat, or dog. It prints its answer to the console and saves an output image with bounding boxes around the detected objects.

To run it, enter:

~/AIY-projects-python/src/examples/vision/ --input=<input image> --output=<output image>

Remember that you’ll need to specify a path to save the output image.

3.3 Run the Image Classifier

The image classifier will look at an image and identify the objects it contains. To run it, enter:

~/AIY-projects-python/src/examples/vision/ --input=<input image>

Image classification will print the resulting label as well as the probability score for the match.
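
The printed output amounts to ranking labels by probability. A minimal sketch of that formatting step (hypothetical helper, not the demo’s actual code):

```python
def format_predictions(predictions, top_k=3):
    """Sort (label, probability) pairs by probability and format the top k."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)[:top_k]
    return ['%s (%.2f)' % (label, prob) for label, prob in ranked]
```

For example, `format_predictions([('tabby cat', 0.82), ('tiger cat', 0.09)], top_k=1)` yields `['tabby cat (0.82)']`.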

We provide an alternative model for the image classification task, based on SqueezeNet. To try it, replace this line in the demo script:

model_type = image_classification.MOBILENET

with

model_type = image_classification.SQUEEZENET

Then run

~/AIY-projects-python/src/examples/vision/ --input=<input image>

3.4 Run the Dish Classifier

The dish classifier will look at an image and tell you what dishes it contains. It will print the resulting label as well as the probability score for the match. To run it, enter:

~/AIY-projects-python/src/examples/vision/ --input=<input image>

Going beyond

Now that you’ve got a taste of what the Vision Bonnet can enable, we’d love to see what else you can do! We’ve included hardware, APIs, and tools to help you get your own smart vision project up and running.

4.1 TensorFlow Model Compiler

The Vision Kit allows you to run a customized model on device. First, you must specify and train your model in TensorFlow.

To get started on deploying your model on Vision Kit, export the model as a frozen graph. You then compile the frozen graph using bonnet_model_compiler (licensed under Apache 2.0) into a binary file that can be loaded and run on Vision Bonnet.

NOTE: The compiler works only on an x86-64 CPU running Linux. It was tested with Ubuntu 14.04. You should NOT run it on the VisionBonnet.

To extract the archive, run tar -zxvf bonnet_model_compiler_yyyy_mm_dd.tgz. This should give you bonnet_model_compiler.par (you might need to run chmod u+x bonnet_model_compiler.par after downloading).

Due to limited hardware resource on Vision Bonnet, there are constraints on what type of models can run on device, detailed in the Constraints section below. The bonnet_model_compiler.par program will perform checks to make sure your customized model can run on device.

Note: You can run this tool as soon as you get a frozen graph, even before the training has converged at all, to make sure your model can run on VisionBonnet. You can use the checkpoint generated at training step 0 or export a dummy model with random weights after defining your model in TensorFlow. In addition, it's highly recommended that you run the compiled binary on Bonnet as well to make sure it returns a result.
Constraints

  1. The model takes a square RGB image, and the input image size must be a multiple of 8.

    Note: The Vision Bonnet handles down-scaling; therefore, when doing inference, you can upload an image that is larger than the model's input image size. The inference image's size also does not need to be a multiple of 8.

  2. Model's first operator must be tf.nn.conv2d.

  3. Model should be trained in NHWC order.

  4. Model's structure should be acyclic.

  5. When running inference, batch size is always 1.
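
The pre-flight checks implied by the constraints above can be sketched as follows (hypothetical helper names; not bonnet_model_compiler's actual interface):

```python
def check_model_constraints(height, width, channels, first_op, data_format):
    """Return a list of constraint violations for a candidate model."""
    errors = []
    if height != width:
        errors.append('input image must be square')
    if height % 8 != 0:
        errors.append('input size must be a multiple of 8')
    if channels != 3:
        errors.append('model must take an RGB image')
    if first_op != 'Conv2D':
        errors.append("model's first operator must be tf.nn.conv2d")
    if data_format != 'NHWC':
        errors.append('model must be trained in NHWC order')
    return errors
```

For example, a 160x160 RGB model starting with a conv layer in NHWC order passes with no errors, while a 150x150 NCHW model fails two checks.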

Supported operators and configurations

The following subset of tensorflow operators can be processed by the model compiler and run on device. There are additional constraints on the inputs and parameters of some of these ops, imposed by the need for these ops to run efficiently on the Vision Bonnet processor.

TF operator             Supported on-device configuration

tf.nn.conv2d            Input tensor depth must be divisible by 8 unless it is the first operator of the model.
                        filter: [k, k, in_channels, out_channels], k = 1, 2, 3, 4, 5;
                        strides: [1, s, s, 1], s = 1, 2;
                        padding: VALID or SAME;
                        data_format: NHWC

tf.nn.depthwise_conv2d  filter: [k, k, in_channels, channel_multiplier], k = 3, 5; channel_multiplier = 1;
                        strides: [1, s, s, 1], s = 1, 2;
                        padding: VALID or SAME;
                        data_format: NHWC

tf.nn.max_pool          Input tensor depth must be divisible by 8.
                        ksize: [1, k, k, 1], k = 2, 3, 4, 5, 6, 7;
                        strides: [1, s, s, 1], s <= k;
                        padding: VALID or SAME;
                        data_format: NHWC

tf.nn.avg_pool          ksize: [1, k, k, 1], k = 2, 3, 4, 5, 6, 7;
                        strides: [1, s, s, 1], s <= k;
                        padding: VALID or SAME;
                        data_format: NHWC

tf.matmul               Suppose a is an MxK matrix and b is a KxN matrix; K must be a multiple of 8.
                        a: rank-1 or rank-2 tensor;
                        b: rank-1 or rank-2 tensor;
                        transpose_a: False; transpose_b: False;
                        adjoint_a: False; adjoint_b: False;
                        a_is_sparse: False; b_is_sparse: False

tf.concat               axis: 1, 2, or 3
tf.add                  Supported
tf.multiply             Supported
tf.nn.softmax           dim: -1
tf.sigmoid              x: tensor's shape must be [1, 1, 1, k]
tf.nn.l2_normalize      Input tensor depth must be a multiple of 8; dim: -1
tf.nn.relu              Supported
tf.nn.relu6             Supported
tf.tanh                 Supported
tf.reshape              The first dimension cannot be reshaped; that is, output shape[0] = input shape[0]
Supported Common Graphs

Name        Configuration
MobileNet   input size: 160x160, depth multiplier = 0.5
SqueezeNet  input size: 160x160, depth multiplier = 0.75
How to run the compiler

To run the compiler:

./bonnet_model_compiler.par -- \
    --frozen_graph_path=<frozen_graph_path> \
    --output_graph_path=<output_graph_path> \
    --input_tensor_name=<input_tensor_name> \
    --output_tensor_names=<output_tensor_names>

Take mobilenet_v1_160res_0.5_imagenet.pb (available after download) as an example. Put mobilenet_v1_160res_0.5_imagenet.pb in the same folder as bonnet_model_compiler.par.

./bonnet_model_compiler.par -- \
    --frozen_graph_path=./mobilenet_v1_160res_0.5_imagenet.pb \
    --output_graph_path=./mobilenet_v1_160res_0.5_imagenet.binaryproto \
    --input_tensor_name="input" \
    --output_tensor_names="MobilenetV1/Predictions/Softmax"

If the compilation succeeds, you’ll see the message: VisionBonnet model binary ./mobilenet_v1_160res_0.5_imagenet.binaryproto generated.

2 GPIO Diagrams

GPIO Diagram Front

GPIO Diagram Back

3 AIY Microcontroller

The Vision Bonnet is the first AIY project to include a dedicated AIY microcontroller. The MCU adds features that the Raspberry Pi alone doesn’t provide. These GPIOs are accessible via 1mm pitch pins on the top of the Vision Bonnet (connector P2). The MCU enables:

  • PWM support for servo/motor control without taxing the PiZero CPU.
  • Control of the two LEDs on the bonnet.
  • More accurate analog channels than the PiZero.
  • Freed-up Pi GPIOs for other uses.

Control of the GPIOs is available in user-space code via sysfs nodes, or from Python code with the included adaptation of the popular gpiozero library. The package provides gpiozero-compatible pin specifications for the additional I/O functionality on the bonnet. These definitions can be used to construct standard gpiozero devices like LEDs, Servos, and Buttons. Examples can be found in ~/AIY-projects-python/src/examples/vision/gpiozero/.

Included UX Elements

The Vision Kit runs all of its intelligence on device, and we want to make sure your project can work without needing to interact with a console. To help, we’ve included the following UX elements:

  • Arcade-style Push Button
  • RGB LED (integrated into the push button)
  • Front Privacy LED
  • Piezo Buzzer

Project complete!

You did it! Whether this was your first hackable project or you’re a seasoned maker, we hope this project has sparked new ideas for you. Keep tinkering, there’s more to come.