Maker Kit

The ultimate DIY AI machine. Use machine learning to make projects with accelerated vision and audio intelligence.

Introduction

The AIY Maker Kit provides a quickstart guide for embedded artificial intelligence (AI) projects.

Using a Raspberry Pi and a few accessories, our tutorials show you how to build a portable device that can identify objects, people, and body poses with a camera, and recognize voice commands or other sounds with a microphone.

You'll use machine learning (ML) to create new TensorFlow Lite models that recognize your own objects and respond to your own speech commands. And you'll learn to write code that reacts to these inputs to create a variety of smart applications.

You don't need to know anything about machine learning and you don't need to be a skilled programmer, but you should know your way around a computer. We'll provide step-by-step instructions to run some code examples and make a few code changes, and then you can decide how much more code you want to write to build your own projects. (We provide some help and ideas toward that as well.)

So let's get started!

  • Locate objects
  • Identify body parts
  • Recognize speech

List of materials

The Maker Kit hardware is based on a Raspberry Pi computer, and to execute advanced ML models at high speeds, we use the Coral USB Accelerator. This USB accessory includes a Coral Edge TPU, which is a computer chip that's specially designed to run machine learning (ML) models really fast—it's like an ML turbocharger for your Raspberry Pi. Then you just need the eyes and ears—a Pi Camera and USB microphone—to build your smart projects.

Required hardware

  1. Raspberry Pi Camera (×1)
  2. Raspberry Pi 4 (or Pi 3) (×1)
  3. Coral USB Accelerator (×1)
  4. USB-C cable (included w/ Coral) (×1)
  5. microSD card (8 GB or larger) (×1)
  6. 5V/3A power adapter (USB-C for Pi 4, or Micro-USB for Pi 3) (×1)
  7. USB microphone (×1)

Get the hardware

All the parts listed above are available from a variety of electronics distributors. If you need all the parts or just some of them, you can find them at the online shops below.

You also need a personal computer that's connected to Wi-Fi. You can use a Windows, Mac, or Linux computer. Our instructions include steps for Windows and Mac only, but if you're a Linux user, you should be able to follow along just fine.
Hardware exceptions: You can use a USB camera instead of the Pi Camera (it just won't look as neat in the cases below). You can also use a Raspberry Pi Zero 2, instead of a Pi 3 or Pi 4, but you'll need an extra USB adapter for the accelerator and microphone.

Optional materials

The AIY Maker Kit isn't just about making AI, it's also about being a maker! A maker is a person who builds stuff. Often, a maker will creatively combine technology with hardware to solve problems, provide entertainment, create art, or just to explore technology. So it's up to you to create a case for your kit.

We offer two DIY options, but neither is required, and you can create a case that's completely different. If you build your own case, just make sure the Raspberry Pi and USB Accelerator have room to breathe because they can get hot.

3D-printed case

  1. Case top (×1)
  2. Case base (×1)

Cardboard case

  1. Glue stick (×1)
  2. Craft knife (×1)
  3. Scissors (×1)
  4. Cardstock or cereal box cardboard (10" x 6") (×1)
  5. Box from the Coral USB Accelerator (×1)

Create a case

Here are the design files for our DIY cases.

3D-printed case

This 3D-printed case snaps together with two parts. The top includes a mount for the Pi Camera, and you can re-orient the camera by printing a separate camera enclosure (not pictured). The design includes variants for Raspberry Pi 4 and Raspberry Pi 3.

If you don't own a 3D printer, you can still get this case:

  • Search the internet for a local makerspace, which might have a 3D printer you can use.
  • Find an online 3D printing service that will print the parts and mail them to you.

The following ZIP file includes all the STL files for printing, plus STEP files so you can remix the design. For more information, see the included README files.

Note: Begin printing the case or place your order for one and then continue with the setup guide. You can put the hardware into your case later on.

Cardboard case

This cardboard case holds everything together and can be assembled in less than an hour. The case is built using a small sheet of cardstock (or other thin cardboard such as from a cereal box) and the box in which your Coral USB Accelerator was packaged.

The following PDF files include all the assembly instructions and the templates you need to print for construction. Be sure you download the appropriate PDF format for your printer paper.

Note: Don't begin assembling the cardboard case until you begin flashing the SD card.
Beware: Unfortunately, if your Coral USB Accelerator included a black USB cable (instead of a white USB cable), it will not fit into this cardboard box design.

Setup guide

The following instructions show how to flash your microSD card with our AIY Maker Kit system image and assemble your case. The Maker Kit system image includes Raspberry Pi OS with all software and settings that enable the ML projects that follow.

Before you start, double check that you have all the required materials. It's okay if you don't have your case yet; you can start programming now and put that together later.

Note: If you already have a Raspberry Pi system, we suggest you follow these instructions to re-flash your microSD card (or flash a different SD card), because it will avoid many possible conflicts with the software dependencies. However, if you insist on using an existing system, we do offer a guide to upgrade an existing Raspberry Pi system (for experienced Raspberry Pi users only).

1. Flash the SD card

Get the SD card image

Download the Maker Kit system image (based on Raspberry Pi OS):

Download aiy-maker-kit-2022-05-18.img.xz

Install the flashing software

  1. Download the Raspberry Pi Imager from here.

    The web page detects your operating system, so just click the blue Download button.

  2. Install the Raspberry Pi Imager app.

    • On Windows, launch the .exe file and follow the setup wizard.
    • On Mac, launch the .dmg file and drag the Raspberry Pi Imager into the Applications folder.

Note: You must have Raspberry Pi Imager v1.7 or higher.

Connect the SD card to your computer

Use an SD card adapter (if necessary) to connect your microSD card to your Windows, Mac, or Linux computer.

Select the SD card and system image

  1. Launch Raspberry Pi Imager.
  2. Click Choose OS, scroll to the bottom, and select Use custom.
  3. Select the Maker Kit system image (the .img.xz file you downloaded above).
  4. Click Choose storage and select your microSD card (it might be named something like "Generic MassStorageClass Media").
  5. Do not click Write yet.

Configure advanced settings

In order to log into the Raspberry Pi wirelessly, you must specify the Wi-Fi details for the system image:

  1. Click the gear icon in Raspberry Pi Imager to open the Advanced options.
  2. Scroll down a little, select Configure wireless LAN, and then enter your Wi-Fi name (SSID) and password. Beware that the SSID name is case sensitive (capitalization matters).
  3. Be sure that Wireless LAN country is set to your current location.
  4. Triple-check that your Wi-Fi name and password are correct, and then click Save.
  5. Now click Write to begin flashing the SD card.

This can take several minutes. So while that’s working, proceed to assemble your case.

Need more help? Watch this video about how to use the Raspberry Pi Imager.

2. Assemble your case

The following instructions are for the 3D-printed case. If you don't have your case yet, that's okay, just connect the hardware as described below (connect the camera and USB Accelerator) and you can put it in the case later.

If you're building the cardboard case, instead follow the instructions in the PDF download and when you're done, skip to turn it on.

 

Open the camera latch on the Raspberry Pi

Between the HDMI ports and the headphone jack, you'll find the camera cable latch.

Pull up on the black latch to open it.

Insert the flex cable and close the latch

Grab the end of the camera flex cable (it should already be attached to the Pi Camera) and insert it so the blue strip is facing toward the black latch.

Be sure the cable is inserted fully, and then push down on the latch to lock the cable in place.

Insert the Raspberry Pi

Start by inserting the side of the board with the headphone jack so those plugs rest in the corresponding holes.

Then press down on the opposite side of the board (with the 40-pin header) so it snaps into place. You can gently pull outward on the case wall so it snaps in easier.

Make sure it's inserted fully: The Ethernet port should rest flat on the case opening, and the SD card slot on the opposite side should be accessible.

Note: The SD card should not be in the board yet—it can make inserting the board more difficult.

Attach the Camera

First set the camera's top mounting holes onto the matching pegs, and hold them firmly in place.

Be sure the holes on the camera board are aligned with the posts, and then press firmly on the cable connector to snap the camera into place. You might need to press hard.

Ensure that both side clips are now holding the board and the camera lens is set in the hole.

Fold the flex cable

Press the cable against the roof of the case and then fold it backward where it meets the wall, creasing the cable.

Then fold the cable again at a 90-degree angle after it passes over the camera, creasing it so the cable turns toward the outer USB cutout, as shown in the photo.

Don't worry, you will not damage the cable by creasing it.

Set the cable inside

As you close the lid, the flex cable should turn backward one more time, right behind the USB ports. You do not need to crease the cable here.

Snap the lid shut

First set the clips on the side near the board's GPIO header.

Then set the opposite side by pressing gently on the side with the plug openings, while pressing down on the lid.

Attach the USB Accelerator

Start by setting the top corners of the USB Accelerator into the case, so its USB port is on the same side as the microSD slot on the Raspberry Pi.

Then press down to snap it in.

Connect the USB cable

Connect the USB cable to one of the blue ports on the Raspberry Pi, and press the cable into either of the two cable guides.

The blue ports support USB 3.0, which provides much faster data-transfer speeds than USB 2.0.

Leave the USB Accelerator unplugged so you can access the microSD slot.

Attach the USB microphone

If you plan on using audio classification (speech recognition), then attach your USB microphone to any of the available USB ports.

Note: The headphone jack on the Raspberry Pi does not support input, so you must use a USB microphone (unless you're also using an audio HAT with a mic input).

3. Turn it on

Insert the SD card

  1. Wait until your SD card is done flashing. Raspberry Pi Imager will show a dialog that says Write Successful and it automatically "ejects" the card.
  2. Remove the SD card from your computer.
  3. Locate the SD card slot on the bottom of the Raspberry Pi and insert the microSD card (the card label faces away from the board).
  4. If you're using the 3D-printed case, you can now plug the USB cable into the USB Accelerator.

Power on the Raspberry Pi

Plug in your USB power adapter to the board, and connect it to your wall power supply.

Note: You do not need a monitor, keyboard, or mouse attached to the Raspberry Pi. You will connect to the board over Wi-Fi.

Let it boot up

Make sure the red LED next to the SD card slot is illuminated. The green LED also flickers during boot-up.

Allow about two minutes for the system to boot up for the first time. When it's ready, the green LED mostly stops blinking, while the red LED remains solid. (Subsequent boots are much faster.)

Warning! The Coral USB Accelerator can become very hot after extended use. Always avoid touching the metal surface during operation.

Caution: The Raspberry Pi OS is running directly from the microSD card, so you should never remove the card while the board is powered on. Doing so can corrupt the card data.

4. Log into the Pi with VNC

Install VNC viewer

  1. Download the VNC Viewer from here.

    The web page detects your operating system, so just click the blue Download VNC Viewer button.

  2. Install the VNC Viewer app.

    • On Windows, launch the .exe file and follow the setup wizard (we recommend you leave any install settings as their defaults).
    • On Mac, launch the .dmg file and drag the VNC Viewer into the Applications folder.

Launch VNC Viewer

On Windows:
  1. Press the Windows-logo key to open Windows search and type "vnc".
  2. Click VNC Viewer to launch the app.
On Mac:
  1. Press Command + Spacebar to open Spotlight search and type "vnc".
  2. Click VNC Viewer to launch the app.

At the startup screen, click Got it to continue.

Find the Raspberry Pi's IP address

Your Raspberry Pi should already be booted up and connected to your Wi-Fi network (because we specified the Wi-Fi login during the flash procedure). So we now need to get its IP address.

On Windows:

Open a Command Prompt (a Windows app that allows you to issue command-line instructions: press the Windows-logo key to open search, type "command prompt", and then press Enter to open the app) and run this command:

ping raspberrypi -4 -n 1

The first line in the response should include the IP address in brackets. Keep this IP address visible for the next step.

On Mac:

Open a Terminal (an app that allows you to issue command-line instructions: press Command + Spacebar to open search, type "terminal", and then click Terminal.app) and run this command:

ping raspberrypi -c 1

The first line in the response should include the IP address in parentheses. Keep this IP address visible for the next step.

Help! If this does not print the IP address, try the following:

  • Wait a couple minutes in case the Pi is not fully booted up. If the ping command still fails, unplug the Raspberry Pi and plug it in again, and then wait a minute to try again.
  • Connect an HDMI monitor to your Raspberry Pi. On the Raspberry Pi desktop, you should see a file with your IP address as its name—write it down. If the filename is "Network not found," the Raspberry Pi could not connect to your Wi-Fi, so reflash the SD card and verify the Wi-Fi name and password in the advanced options.
  • If you're still unable to connect over Wi-Fi, connect an Ethernet cable between your Raspberry Pi and PC and run the ping command again. You'll need to keep the cable connected while using VNC.

Enter the IP address

Type the IP address in the text box at the top of the VNC Viewer window.

VNC Viewer will save this device address. So next time you want to connect, you can just double-click on the device that appears in VNC Viewer.

Note: After you shut down your Raspberry Pi for a while, it's possible the Raspberry Pi's IP address will change when you turn it back on. So if VNC is unable to connect to your saved device, repeat the previous step to find the IP address. Then right-click on the device in VNC Viewer, select Properties, and update the IP address on that screen.

Log in using the default password

You should then see a login dialog where you can use the following login information:

Username: pi
Password: raspberry

Click OK, and in a moment you'll see the Raspberry Pi desktop appear.

Note: You might need to resize the VNC Viewer so you can see the whole Raspberry Pi desktop. (Because we did not connect a monitor to the Raspberry Pi, it uses a fixed window size. You can customize the resolution later by running sudo raspi-config.)

Complete the Pi OS setup

The desktop might show two dialogs on top of each other. If the dialog on top is a warning to change your password, click OK to dismiss it (you'll change the password next).

The other dialog says "Welcome to Raspberry Pi" and helps you configure some settings:

  1. Click Next and follow the wizard to set your country, language, and timezone.
  2. You'll then see a page where you can change the password, so enter a new password and click Next.
  3. Ignore the page to set up your screen and click Next.
  4. Also ignore the page about WiFi because you're already online, so click Skip.
  5. When it offers to update your software, click Skip because it's not necessary and can take some time.

Now you're ready to start developing with ML!

Note: If you want to change your password again later, open the Terminal and run the passwd command.

How to shut down

You just got started, so don't shut down yet, but when you decide you're done for the day, it's important that you properly shut down the system before unplugging the power.

You can shut down the system in two ways:

  • When viewing the Raspberry Pi OS desktop, open the applications menu (the Raspberry Pi icon in the top-left corner of the desktop) and click Shutdown.
  • From the Raspberry Pi terminal, run the following command:

      sudo shutdown

Wait for the green LED to stop blinking (the red LED stays on even when the operating system is shut down), and then unplug the power.

Caution: Failure to properly shut down the system before unplugging the power could corrupt the SD card data.

Project tutorials

Now the real fun begins! Let's build some Python projects with machine learning models!

To make these projects easier, we created some helpful Python APIs in the aiymakerkit module. These APIs perform the complicated (and boring) tasks that are required to execute an ML model. Our APIs also perform common tasks such as capturing images from a camera, drawing labels and boxes on the images, and capturing audio from a microphone. This way, you need to write a lot less code and can focus on the features that make your projects unique.
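
To give you a feel for the pattern, here's a minimal sketch of the kind of loop these APIs enable. The model path below is a placeholder, and the exact file names and thresholds vary by example:

from aiymakerkit import vision

# Placeholder path; each tutorial defines the real model file it uses.
detector = vision.Detector('model_edgetpu.tflite')

for frame in vision.get_frames():
    # Run the detection model on each camera frame and draw the results.
    objects = detector.get_objects(frame, threshold=0.4)
    vision.draw_objects(frame, objects)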

So let's get started!

Note: Basic programming experience is required to complete these tutorials. Each tutorial begins with a fully functional Python script that you can run from the terminal. To expand on these scripts, the tutorials provide some new code you can copy-paste into the files. That means you need to know just the basics about Python code syntax, such as line indentations, so you know how to insert the code.

Verify your setup

All the software you need is already installed on the system image you flashed to your SD card. So let's run a quick test to make sure your hardware is connected and working as expected.

  1. Open the Terminal on your Raspberry Pi. (The Terminal is an app that allows you to issue command-line instructions. On the Raspberry Pi desktop, the Terminal icon appears in the task bar at the top; it looks like a black rectangle. Click it to open a terminal window.) By default, the terminal opens to your Home directory and the prompt looks like this:

    pi@raspberrypi:~ $
    

    pi is the username and raspberrypi is the device hostname. These are the default names, but perhaps yours are different if you changed them in the Raspberry Pi Imager options.

    The squiggly line (called a tilde) represents the Home directory, which is currently where the command prompt is located. (You can run pwd to see the full path name for your current directory.)

  2. Change your directory to the aiy-maker-kit folder by typing this command and then pressing Enter:

    cd aiy-maker-kit
    

  3. Then run our test script with this command:

    python3 run_tests.py
    

    This runs some tests to be sure the required hardware is detected. If all the tests succeeded, the terminal says "Everything looks good." If something went wrong, you should see a message with a suggestion to solve the problem, so take any necessary steps and try again.

Now you're ready to build some projects with ML!

But before you continue, change your directory again to the examples folder where we'll get started:

cd examples

Build a face-detecting smart camera

Objective:

  • Create an app that takes a photo when all visible faces are centered in the frame.

What you'll learn:

  • How to continuously capture frames from the camera.
  • How to feed each frame to an object detection model.
  • How to check if a detected object is inside a rectangular region.

This project builds upon the detect_faces.py example, so let's start by running that code:

  1. Your terminal should already be in the examples directory, so just run this command:

    python3 detect_faces.py
    

    Make sure the camera is pointed toward you and your face is well lit (point the camera away from any windows/lights).

    When the video appears on the screen, you should see a box drawn around your face. Also try it with multiple faces. If there's nobody else around, you can hold up a photo instead.

    Note: If you see a warning that says, "Locale not supported by C library," you can ignore it. It's harmless and related to your locale change during setup, so it will go away after a reboot.

  2. Press Q to quit the demo. (If it does not close, click on the camera window so it has focus and then press Q again.)

  3. To start building the smart camera, make a copy of detect_faces.py with a new name that will be your project file. Here's a quick way to make a copy from the terminal:

    cp detect_faces.py smart_camera.py
    

  4. Now open the new file for editing in Thonny, an integrated development environment (IDE) for Python programming—basically a fancy text editor that specializes in Python code (learn more about Thonny):

    thonny smart_camera.py &>/dev/null &
    

Next, we'll start writing some code to build the smart camera.

Note: If you think the above thonny command looks weird, you're right, but those extra instructions are helpful. When you launch an app like Thonny from the command line, by default, it runs in the "foreground" of the terminal, which means you cannot issue new commands until that app quits (try running just thonny smart_camera.py to see for yourself). So, by adding &>/dev/null & at the end, we specify that any terminal output should be discarded into the "null" file (it's ignored) and the app process should run in the "background" so we can get the command prompt back. Alternatively, you can run xdg-open . (include the period) to open the current directory in the File Manager, and then you can double-click the smart_camera.py file to open it in Thonny.
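
As a preview of where this project is headed, here's a rough sketch (not the finished tutorial code) of how you might check whether every detected face sits inside a centered region before saving a photo. It assumes pixel-coordinate BBox fields (xmin, ymin, xmax, ymax) as in PyCoral, and the model path and region values are placeholders:

from aiymakerkit import vision

detector = vision.Detector('models/face_detection_edgetpu.tflite')  # placeholder path

def inside(bbox, region):
    # region is an (xmin, ymin, xmax, ymax) tuple in pixels.
    return (bbox.xmin >= region[0] and bbox.ymin >= region[1] and
            bbox.xmax <= region[2] and bbox.ymax <= region[3])

CENTER_REGION = (160, 80, 480, 400)  # a box centered inside a 640x480 frame

for frame in vision.get_frames(size=(640, 480)):
    faces = detector.get_objects(frame, threshold=0.4)
    vision.draw_objects(frame, faces)
    if faces and all(inside(face.bbox, CENTER_REGION) for face in faces):
        vision.save_frame('photo.jpg', frame)
        print('Photo saved!')
        break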

Build a person-detecting security camera

Objective:

  • Create an app that detects when a person enters a forbidden area.

What you'll learn:

  • How to read a labels file and filter results so you respond to only certain objects.
  • How to detect a specific amount of overlap between bounding boxes.

This project has some similarity with the smart camera because you'll inspect the location of an object. But this project expands on the concepts you already learned because it teaches you how to handle a model that can recognize lots of different things (not just faces) and you'll perform more sophisticated bounding-box comparisons.

First, try the object detection demo that we'll use as our starting point:

python3 detect_objects.py

See if it can correctly label your keyboard, cell phone, or scissors.

Remember that each ML model can recognize only what it was trained to recognize. Whereas the previous face detection model could recognize only faces, this model can recognize about 90 different things, such as a keyboard, an apple, or an airplane. To see what objects it was trained to recognize, open the coco_labels.txt file in aiy-maker-kit/examples/models/.

Play with this for a while by showing it different objects that are listed in the labels file (try searching the web for images on a phone or computer, and hold that up to the camera). You'll probably notice the model is better at recognizing some things than others. This variation in accuracy is due to the quality of the model training. But one thing the model seems to detect consistently well is people.

So let's build a program that responds when a person appears in the image and detects when they only partially enter an area.

To get started, make a copy of detect_objects.py and open it:

cp detect_objects.py security_camera.py
thonny security_camera.py &>/dev/null &
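
The overlap check itself is just geometry. Here's a hedged sketch of one common measure, intersection over union (IoU), written for plain (xmin, ymin, xmax, ymax) tuples; the finished tutorial code may measure overlap differently:

def intersection_over_union(a, b):
    # a and b are (xmin, ymin, xmax, ymax) boxes in pixel coordinates.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    # 0.0 means no overlap; 1.0 means the boxes are identical.
    return intersection / union if union else 0.0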

Build a custom image classifier

Objective:

  • Train an image classification model to recognize new objects (all on the Raspberry Pi).

What you'll learn:

  • How to create your own image dataset and labels file to train an ML model.
  • How to train an image classification model on the Raspberry Pi.

The object detection model used in the previous project was trained to recognize only a specific collection of objects (such as people, bicycles, and cars)—you can see all the objects it's trained to recognize in the coco_labels.txt file (about 90 objects). If you want to recognize something that's not in this file, the model won't work; you would have to train a new version of the model using sample images of the object you want it to recognize.

However, training an object detection model (as used for face and person detection) requires a lot of work to prepare the training data because the model not only identifies what something is, but where it is (that's how we draw a box around each person). So to train this sort of model, you need to provide not only lots of sample images, but you must also annotate every image with the coordinates of the objects you want the model to recognize.

Instead, it's much easier to train an image classification model, which simply detects the existence of certain objects, without providing the coordinates (you cannot draw a bounding box around the object). This type of model is useful for lots of situations because it's not always necessary to know the location of an object. For example, an image classification model is great if you want to build a sorting machine that identifies one item at a time as they pass the camera (perhaps you want to separate marshmallows from cereal).

There are several ways you can train an image classification model that works with the Coral USB Accelerator, and we've linked to some options below. However, our favorite (and the fastest) method is to perform transfer learning on the Raspberry Pi, using images collected with the Pi Camera. (Training an ML model ordinarily takes a long time and requires a huge dataset. Transfer learning, however, allows you to start with a model that's already trained for a related task and then perform further training to teach the model new classifications with a much smaller training dataset.)

So that's what we're going to do in this project: collect some images of the objects you want your model to recognize, train the model right on the Raspberry Pi, and then immediately try out the new model.
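
Once you have a trained model on the Raspberry Pi, running it looks roughly like the sketch below. The model filename is a placeholder for whatever your training step produces, and it assumes the model embeds its labels as metadata:

from aiymakerkit import vision, utils

MODEL = 'my_classifier_edgetpu.tflite'  # placeholder for your trained model

labels = utils.read_labels_from_metadata(MODEL)  # maps class ids to label text
classifier = vision.Classifier(MODEL)

for frame in vision.get_frames():
    classes = classifier.get_classes(frame, top_k=1, threshold=0.5)
    if classes:
        # Each result includes the class id and a confidence score.
        vision.draw_label(frame, labels[classes[0].id])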

Build a pose classifier

Objective:

  • Train a pose classification model to recognize your body poses.

What you'll learn:

  • The difference between pose detection and pose classification.
  • How to collect a large image dataset.
  • How to train a TensorFlow Lite model using Google Colab.

If you finished the project above to train an image classification model then you've taken your first big step into the world of machine learning, because you actually performed the "learning" part of machine learning (usually called "training"). In this next project, we're going to continue with model training, but things are going to get more interesting (and a bit more complicated).

In this project, we're going to train a brand new model that can recognize a variety of different poses that you make with your body. The basis for our pose classification model is the output from a pose detection model (also called a pose estimation model), which is trained to identify 17 body landmarks (also called "keypoints") in an image, such as the wrists, elbows, and shoulders, as illustrated below.

Photo credit: imagenavi / Getty Images

This pose detection model is already trained and we won't change it, but it only identifies the location of specific points on the human body. It doesn't know anything about what sort of pose the person is actually making. So we're going to train a separate model that can actually label various poses (such as the name of yoga poses) based on the keypoints provided by the pose detection model.

To get started, try the pose detection model:

python3 detect_poses.py

Orient your camera and step away so it can see all or most of your body. Once it's running, you should see a sort of stick figure drawn on your body.

To understand this a little better, open the detect_poses.py file and update it so it prints the pose data returned by get_pose():

for frame in vision.get_frames():
    pose = pose_detector.get_pose(frame)
    vision.draw_pose(frame, pose)
    print(pose, '\n')

Now when you run the code, it prints a large array of numbers with each inference, which specifies the estimated location for 17 keypoints on the body. For each keypoint there are 3 numbers: The y-axis and x-axis coordinates for the keypoint, and the prediction score (the model's confidence regarding that keypoint location). The draw_pose() function simply uses that data to draw each point on the body (if the prediction score is considered good enough).

This is very cool on its own, but we want to create a program that actually recognizes the name of each pose we make.

To do that, we'll train a separate model that learns to identify particular patterns in the keypoint data as distinct poses. For example, the model might learn that if both wrist keypoints are above the nose and elbow coordinates, that is a "Y" pose. So, this model will be trained, not with images from the camera, but with the keypoint data that's output from the pose detection model above.
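
To make that idea concrete, here's a rough sketch that checks for a "Y" pose by hand with the get_keypoint_types() helper (described in the API reference), rather than with a trained classifier. The model path is a placeholder, and remember that smaller y values are higher in the image:

from aiymakerkit import vision
from aiymakerkit.vision import KeypointType

pose_detector = vision.PoseDetector('models/movenet_edgetpu.tflite')  # placeholder path

for frame in vision.get_frames():
    pose = pose_detector.get_pose(frame)
    vision.draw_pose(frame, pose)
    # Keep only confidently detected keypoints, as {KeypointType: (x, y)}.
    points = vision.get_keypoint_types(frame, pose, threshold=0.4)
    needed = (KeypointType.NOSE, KeypointType.LEFT_WRIST, KeypointType.RIGHT_WRIST)
    if all(k in points for k in needed):
        nose_y = points[KeypointType.NOSE][1]
        if (points[KeypointType.LEFT_WRIST][1] < nose_y and
                points[KeypointType.RIGHT_WRIST][1] < nose_y):
            vision.draw_label(frame, 'Y pose!')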

So let's start by collecting images of these poses…

Build a voice-activated camera

Objective:

  • Create an app that takes a picture in response to a voice command of your choice.

What you'll learn:

  • How to train a speech recognition model with Teachable Machine.
  • How to take a picture using the picamera API.

In this project, we'll train an audio classification model to recognize new speech commands. There are lots of things you can build that respond to voice commands, but we'll start with a voice-activated camera.

Before we begin, let's quickly discuss what goes into and comes out of the audio model. For comparison, the input for the previous image ML models is simple: they take an image. Even when using an object detection model with video, the model takes just one image at a time and performs object detection, once for every frame of video. But audio is a time-based sequence of sounds, so you might be wondering how we put that kind of data into the model.

First of all, how an audio model receives input and provides results depends on the type of task you want to perform. For example, a model that performs voice transcription (speech to text) is very different from a model that recognizes specific words or phrases (such as "Hey Google"). Our model is the second type; it will recognize only specific words or phrases. In fact, it can recognize only speech commands or sounds that fit within one second of time. That's because the input for our model is a short recording: the program listens to the microphone and continuously sends the model one-second clips of audio.

What later happens inside our speech recognizer model is similar to what happens in one of the image classification models, because the one-second audio clip is first converted into an image. Using a process called the fast Fourier transform, the model creates a spectrogram image, which provides a detailed visual representation of the audio signal's frequencies over time. Though you don't need to understand how this works, it's useful to simply know that the audio model actually learns to recognize different sounds by recognizing differences in the spectrogram image. Then the model outputs a classification (such as "on" or "off") with a confidence score (just like an image classifier or object detector).

So, to create our own speech recognizer, we need to record a collection of one-second samples for each word or phrase we want to teach the model. When the training is done, our program will repeatedly feed one-second sound recordings to the model and the model will do its best job to predict whether we said one of the learned commands.
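
Pulling that together, the listening loop itself is handled for you by classify_audio(); a rough sketch looks like this. The model filename and the "take_photo" label are placeholders for whatever your own training produces, and the picture-taking itself is left as a stub:

from aiymakerkit import audio

MODEL = 'soundclassifier_edgetpu.tflite'  # placeholder for your trained model

def handle_result(label, score):
    # Called with the top label and its confidence score for every inference.
    if label == 'take_photo' and score > 0.8:
        print('Heard the command! Take a picture here.')
    return True  # keep listening; return False to stop

audio.classify_audio(model=MODEL, callback=handle_result)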

Advanced projects

Now that you've learned a bit about what's possible using machine learning, what else can you create?

Discover more APIs

Before you start a new project, it's a good idea to get familiar with the available tools. The project tutorials above introduce several important features from the aiymakerkit API, but there are still more APIs available.

To get started, read through the aiymakerkit API reference (it's not very long). We didn't use all these APIs in the projects above, so you might discover some APIs that will help you with your next project. Plus, some of the functions we already used support more parameters that we didn't use. For example, the draw_objects() function allows you to specify the line color and line thickness for the bounding boxes.
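
For example, a loop like the following sketch (the model path and the color/thickness values are arbitrary placeholders) draws thin green boxes instead of the defaults:

from aiymakerkit import vision

detector = vision.Detector('model_edgetpu.tflite')  # placeholder path

for frame in vision.get_frames():
    objs = detector.get_objects(frame, threshold=0.4)
    # Colors are BGR tuples; this draws green boxes that are 2 pixels thick.
    vision.draw_objects(frame, objs, color=(0, 255, 0), thickness=2)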

Explore the source code

This is optional. There's a lot more you can learn by looking at the aiymakerkit source code, but if you're not in the mood to read lots of code, you can skip this section.

On your Raspberry Pi, you might have noticed there's a lot more in the aiy-maker-kit/ directory than just example code. This directory also includes the source code for the aiymakerkit Python module (which is what we import at the top of all the examples).

So if you're curious how these APIs work (are you curious how draw_pose() actually draws the keypoints on your body?), just open the aiy-maker-kit/aiymakerkit/ directory and view the files inside. You'll notice that the aiymakerkit API depends on a variety of other APIs, such as PyCoral to perform ML inferencing, OpenCV for video capture and drawing, and PyAudio for audio capture.

These other API libraries offer a wealth of capabilities not provided by aiymakerkit, so you might want to explore those libraries as well to find other APIs you can use in your project.

And, if you see something you'd like to change or add to the aiymakerkit API, just edit the code right there. To then use the changes in your project, just reinstall the aiymakerkit package:

python3 -m pip install -e ~/aiy-maker-kit

The -e option makes the installation "editable." That means if you continue to change the aiymakerkit source code, any projects that import these modules automatically receive those changes—you won't need to reinstall aiymakerkit again.

Finally, if you want to explore the aiymakerkit code online, you can see it all on GitHub.

Vision projects

The number of projects you can build using vision intelligence (such as image classification, object detection, and pose detection) is endless. So here are just a few ideas to get you going:

Audio projects

So far, we've created just one project using audio from a microphone, but there are so many things you can create with a simple speech or sound classification model.

Here are some other project ideas:

Run your app at bootup

When you're done developing your project and ready to put your Maker Kit to work, you might want your Python program to automatically start when the Raspberry Pi boots up. That way, if the board loses power or you need to reset it for any reason, all you need to do is power it up and your program starts without requiring you to connect to the desktop or a remote shell.

You can make this happen by creating a systemd service, which is defined in a .service file.

As an example, we included the following detect_faces.service file in ~/aiy-maker-kit/examples. It's designed to run the detect_faces.py example when the device boots up, but it's disabled by default.

[Unit]
Description=Face Detection example

[Service]
Environment=DISPLAY=:0
Type=simple
Restart=on-failure
User=pi
ExecStart=/usr/bin/python3 /home/pi/aiy-maker-kit/examples/detect_faces.py

[Install]
WantedBy=multi-user.target

To enable this service so it runs the face detection example at bootup, run the following commands in the Raspberry Pi terminal:

  1. Create a symbolic link (a "symlink") so the .service file can be found in /lib/systemd/system/ (do not move the original file):

    sudo ln -s ~/aiy-maker-kit/examples/detect_faces.service /lib/systemd/system

  2. Reload the service files so the system knows about this new one:

    sudo systemctl daemon-reload

  3. Now enable this service so it starts on bootup:

    sudo systemctl enable detect_faces.service

All set! You can try manually running the service with this command:

sudo service detect_faces start

Notice that you don't see any terminal output from the Python script, because you didn't actually execute the script from the terminal—it was executed by the service manager. But you can still press Q to quit the example when the camera window is in focus.

Or you can reboot the system to see it automatically start at bootup:

sudo reboot

As soon as the desktop reappears, you should see the camera window and a bounding box around your face.

Customize the service config

To create a service configuration for your own project, just copy the above file using a unique file name (the name must end with .service) and save it somewhere in your Home directory (such as /home/pi/projects/). Then edit the Description and change ExecStart so it executes your program's Python script.

You must then repeat the steps above to create a symlink to your service file, reload systemctl, and enable the service.

The .service file accepts a long list of possible configuration options, which you can read about in the .service config manual.

Note: The above example includes the line Environment=DISPLAY=:0 because the face detection demo requires a display to show the camera images. So if your project doesn't require a display, you can remove this line. However, if you are still fetching camera images from get_frames(), then you must also pass the function argument display=False so it does not attempt to show images in a desktop window.
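
For instance, a headless project's main loop might look like this sketch (the model path is a placeholder); with display=False, no window is created, so the service doesn't need a desktop at all:

from aiymakerkit import vision

detector = vision.Detector('model_edgetpu.tflite')  # placeholder path

# No desktop window is opened, so the Environment=DISPLAY=:0 line isn't needed.
for frame in vision.get_frames(display=False):
    objects = detector.get_objects(frame, threshold=0.5)
    if objects:
        print('Detected', len(objects), 'object(s)')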

Other tips

To manually stop the service once it's running, use this command:

sudo service detect_faces stop

If you want to disable the service so it no longer runs at bootup, use this command:

sudo systemctl disable detect_faces.service

You can also check the status of your service with this command:

sudo service detect_faces status

Models guide

The projects above use some trained models and even train a couple new ones, but there are more models you can try and the opportunities to train your own are endless. This section introduces just a few places where you can start exploring other models.

Trained models

The following models are fully trained and compiled for acceleration on the Coral USB Accelerator.

Each of the following links takes you to a page at www.coral.ai where you'll find the model files. On that page, there's a table that describes each available model and the last column includes the download links. Make sure you download the Edge TPU model.

Image classification

Models that recognize the main subject in an image, and are compatible with the Classifier API.

Most models are trained with the same dataset of 1,000 objects, but there are three models trained with the iNaturalist dataset to recognize different types of birds, insects, or plants.

Object detection

Models that identify the location and label of multiple objects, and are compatible with the Detector API.

These are trained with a dataset of 90 objects (except for the one model trained to recognize faces—the same model used above). Notice that although some models are more accurate (the mAP score is higher), they are also slower (the latency is higher).

Pose detection

Models that identify the location of several points on the human body, but only the MoveNet models are compatible with the PoseDetector API.

The PoseNet models detect the same 17 keypoints but they output a different format, so although they do work with the Maker Kit, you must use different code, as demonstrated in this PoseNet example. One of the few reasons you might want PoseNet instead is to detect the poses of multiple people.

Semantic segmentation

Models that can classify individual pixels as part of different objects, allowing you to know the more precise shape of an object, compared to object detection (which provides just a bounding-box area).

We do not offer an API for this type of model in aiymakerkit because the applications for such models are more complicated. To try it out, you can use this code with the PyCoral API.

Custom models

Nowadays, creating a model for common machine vision tasks such as image classification and object detection does not require that you actually design the neural network yourself. Instead, you can (and should) use one of the many existing network designs that are proven effective, and focus your effort on training (gathering your training dataset and fine-tuning the training script).

Training a neural network model from scratch means you start with a network that has no learned weights—it's just a blank neural network. Depending on the model, training a model from scratch can take a very long time. So another shortcut often used (in addition to using an existing network design) is re-using the weights somebody else created when training the model, and then "fine tuning" the weights with a much shorter version of training.

This simplified training process is often called "transfer learning" or just "retraining," and it's the technique we recommend you use with the following training scripts.

All the training scripts below run on Google Colab, which we introduced in the pose classification project. Each script includes its own training dataset so you can immediately learn how to retrain a model, but you can then replace that dataset with your own. To learn more about Google Colab, watch this video.

Did you know?
Every training dataset includes sample input data (such as images) along with the corresponding "ground truth" results that we want to receive from the model's output (the label for an object in an image). A training program feeds the sample data to the model and the model performs a prediction (also called an "inference") for each input. Then, by calculating how close the prediction is to the ground truth result, the training program updates some numbers in the model so it can provide a better result next time. This repeats over and over and that's how machine learning models actually learn.
Note: If you're an experienced ML engineer, you should instead read TensorFlow models on the Edge TPU.

API reference

The following Python APIs are provided by the aiymakerkit module, which is used in the above tutorials.

We designed these APIs to simplify your code for the most common TensorFlow Lite inferencing tasks. But by no means does this represent everything you can do with ML on the Raspberry Pi and Coral USB Accelerator.

To learn more, check out the source code.

Image classification

class aiymakerkit.vision.Classifier(model)

Performs inferencing with an image classification model.

Parameters

model (str) – Path to a .tflite file (compiled for the Edge TPU).

get_classes(frame, top_k=1, threshold=0.0)

Gets classification results as a list of ordered classes.

Parameters
  • frame – The bitmap image to pass through the model.
  • top_k (int) – The number of top results to return.
  • threshold (float) – The minimum confidence score for returned results.
Returns

A list of Class objects representing the classification results, ordered by scores.

Object detection

class aiymakerkit.vision.Detector(model)

Performs inferencing with an object detection model.

Parameters

model (str) – Path to a .tflite file (compiled for the Edge TPU). Must be an SSD model.

get_objects(frame, threshold=0.01)

Gets a list of objects detected in the given image frame.

Parameters
  • frame – The bitmap image to pass through the model.
  • threshold (float) – The minimum confidence score for returned results.
Returns

A list of Object objects, each of which contains a detected object’s id, score, and bounding box as BBox.

Pose detection

class aiymakerkit.vision.PoseDetector(model)

Performs inferencing with a pose detection model such as MoveNet.

Parameters

model (str) – Path to a .tflite file (compiled for the Edge TPU).

get_pose(frame)

Gets the keypoint pose data for one person.

Parameters

frame – The bitmap image to pass through the model.

Returns

The COCO-style keypoint results, reshaped to [17, 3], in which each keypoint has [y, x, score].

aiymakerkit.vision.get_keypoint_types(frame, keypoints, threshold=0.01)

Converts keypoint data into a dictionary with values scaled for the image size.

Parameters
  • frame – The original image used for pose detection.
  • keypoints – A COCO-style keypoints tensor in shape [17, 3], such as returned by PoseDetector.get_pose().
  • threshold (float) – The minimum confidence score for returned results.
Returns

A dictionary with an item for every body keypoint detected above the given threshold, wherein each key is the KeypointType and the value is a tuple for its (x,y) location.

class aiymakerkit.vision.KeypointType(value)

Pose keypoints in COCO-style format.

LEFT_ANKLE = 15
LEFT_EAR = 3
LEFT_ELBOW = 7
LEFT_EYE = 1
LEFT_HIP = 11
LEFT_KNEE = 13
LEFT_SHOULDER = 5
LEFT_WRIST = 9
NOSE = 0
RIGHT_ANKLE = 16
RIGHT_EAR = 4
RIGHT_ELBOW = 8
RIGHT_EYE = 2
RIGHT_HIP = 12
RIGHT_KNEE = 14
RIGHT_SHOULDER = 6
RIGHT_WRIST = 10

Pose classification

class aiymakerkit.vision.PoseClassifier(model)

Performs pose classification with a model from g.co/coral/train-poses.

Parameters

model (str) – Path to a .tflite file.

get_class(keypoints, threshold=0.01)

Gets the top pose classification result.

Parameters
  • keypoints – The COCO-style pose keypoints, as output from a pose detection model.
  • threshold (float) – The minimum confidence score for the returned classification.
Returns

The class id for the top result.

Camera & drawing

aiymakerkit.vision.get_frames(title='Camera', size=(640, 480), handle_key=None, capture_device_index=0, mirror=True, display=True, return_key=False)

Gets a stream of image frames from the camera.

Parameters
  • title (str) – A title for the display window.
  • size (tuple) – The image resolution for all frames, as an int tuple (x,y).
  • handle_key – A callback function that accepts arguments (key, frame) for a key event and the image frame from the moment the key was pressed. This has no effect if display is False.
  • capture_device_index (int) – The Linux device ID for the camera.
  • mirror (bool) – Whether to flip the images horizontally (set True for a selfie view).
  • display (bool) – Whether to show the camera images in a desktop window (set False if you don’t use a desktop).
  • return_key (bool) – Whether to also return any key presses. If True, the function returns a tuple with (frame, key) instead of just the frame.
Returns

An iterator that yields each image frame from the default camera. Or a tuple if return_key is True.

aiymakerkit.vision.save_frame(filename, frame)

Saves an image to a specified location.

Parameters
  • filename (str) – The path where you’d like to save the image.
  • frame – The bitmap image to save.
aiymakerkit.vision.draw_classes(frame, classes, labels, color=(86, 104, 237))

Draws image classification names on the display image.

Parameters
  • frame – The bitmap image to draw upon.
  • classes – A list of Class objects representing the classified objects.
  • labels (str) – The labels file corresponding to the model used for image classification.
  • color (tuple) – The BGR color (int,int,int) to use for the text.
aiymakerkit.vision.draw_objects(frame, objs, labels=None, color=(86, 104, 237), thickness=5)

Draws bounding boxes for detected objects on the display image.

Parameters
  • frame – The bitmap image to draw upon.
  • objs – A list of Object objects for which you want to draw bounding boxes on the frame.
  • labels (str) – The labels file corresponding to the model used for object detection.
  • color (tuple) – The BGR color (int,int,int) to use for the bounding box.
  • thickness (int) – The bounding box pixel thickness.
aiymakerkit.vision.draw_pose(frame, keypoints, threshold=0.2, color=(86, 104, 237), circle_radius=5, line_thickness=2)

Draws the pose skeleton on the image sent to the display and returns structured keypoints.

Parameters
  • frame – The bitmap image to draw upon.
  • keypoints – A COCO-style pose keypoints tensor in shape [17, 3], such as returned by PoseDetector.get_pose().
  • threshold (float) – The minimum confidence score for returned keypoint data.
  • color (tuple) – The BGR color (int,int,int) to use for the bounding box.
  • circle_radius (int) – The radius size of each keypoint dot.
  • line_thickness (int) – The pixel thickness for lines connecting the keypoint dots.
Returns

A dictionary with an item for every body keypoint that is detected above the given threshold, wherein each key is the KeypointType and the value is a tuple for its (x,y) location. (Exactly the same return as get_keypoint_types().)

aiymakerkit.vision.draw_label(frame, label, color=(86, 104, 237))

Draws a text label on the image sent to the display.

Parameters
  • frame – The bitmap image to draw upon.
  • label (str) – The string to write.
  • color (tuple) – The BGR color (int,int,int) for the text.
aiymakerkit.vision.draw_circle(frame, point, radius, color=(86, 104, 237), thickness=5)

Draws a circle onto the image sent to the display.

Parameters
  • frame – The bitmap image to draw upon.
  • point (tuple) – An (x,y) tuple specifying the circle center.
  • radius (int) – The radius size of the circle.
  • color (tuple) – The BGR color (int,int,int) to use.
  • thickness (int) – The circle’s pixel thickness. Set to -1 to fill the circle.
aiymakerkit.vision.draw_rect(frame, bbox, color=(255, 0, 0), thickness=5)

Draws a rectangle onto the image sent to the display.

Parameters
  • frame – The bitmap image to draw upon.
  • bbox – A BBox object.
  • color (tuple) – The BGR color (int,int,int) to use.
  • thickness (int) – The box pixel thickness. Set to -1 to fill the box.

Audio classification

aiymakerkit.audio.classify_audio(model, callback, labels_file=None, inference_overlap_ratio=0.1, buffer_size_secs=2.0, buffer_write_size_secs=0.1, audio_device_index=None)

Continuously classifies audio samples from the microphone, yielding results to your own callback function.

Your callback function receives the top classification result for every inference performed. Although the audio sample size is fixed based on the model input size, you can adjust the rate of inference with inference_overlap_ratio. A larger overlap means the model runs inference more frequently but with larger amounts of sample data shared between inferences, which can result in duplicate results.

Parameters
  • model (str) – Path to a .tflite file.
  • callback – A function that takes two arguments (in order): a string for the classification label, and a float for the prediction score. The function must return a boolean: True to continue running inference, or False to stop.
  • labels_file (str) – Path to a labels file (required only if the model does not include metadata labels). If provided, this overrides the labels file provided in the model metadata.
  • inference_overlap_ratio (float) – The amount of audio that should overlap between each sample used for inference. May be 0.0 up to (but not including) 1.0. For example, if set to 0.5 and the model takes a one-second sample as input, the model will run an inference every half second, or if set to 0, then there is no overlap and it will run once each second.
  • buffer_size_secs (float) – The length of audio to hold in the audio buffer.
  • buffer_write_size_secs (float) – The length of audio to capture into the buffer with each sampling from the microphone.
  • audio_device_index (int) – The audio input device index to use.
class aiymakerkit.audio.AudioClassifier(**kwargs)

Performs classifications with a speech classification model.

This is intended for situations where you want to write a loop in your code that fetches new classification results in each iteration (by calling next()). If you instead want to receive a callback each time a new classification is detected, use classify_audio().

Parameters
  • model (str) – Path to a .tflite file.
  • labels_file (str) – Path to a labels file (required only if the model does not include metadata labels). If provided, this overrides the labels file provided in the model metadata.
  • inference_overlap_ratio (float) – The amount of audio that should overlap between each sample used for inference. May be 0.0 up to (but not including) 1.0. For example, if set to 0.5 and the model takes a one-second sample as input, the model will run an inference every half second, or if set to 0, it will run once each second.
  • buffer_size_secs (float) – The length of audio to hold in the audio buffer.
  • buffer_write_size_secs (float) – The length of audio to capture into the buffer with each sampling from the microphone.
  • audio_device_index (int) – The audio input device index to use.
next(block=True)

Returns a single speech classification.

Each time you call this, it pulls from a queue of recent classifications. So even if there are many classifications in a short period of time, this always returns them in the order received.

Parameters

block (bool) – Whether this function should block until the next classification arrives (if there are no queued classifications). If False, it always returns immediately and returns None if the classification queue is empty.
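
As a rough usage sketch (the model path is a placeholder, and the exact contents of each result are best confirmed in the aiymakerkit source code):

from aiymakerkit import audio

classifier = audio.AudioClassifier(model='soundclassifier_edgetpu.tflite')  # placeholder

while True:
    result = classifier.next()  # blocks until the next classification arrives
    print(result)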

Utilities

aiymakerkit.utils.read_labels_from_metadata(model)

Reads labels from the model file metadata.

Parameters

model (str) – Path to the .tflite file.

Returns

A dictionary of (int, string), mapping label ids to text labels.

More information

Learn more

To learn more about some of the tools and topics covered above, check out these other resources:

  • Linux basics: This is a great resource if you're new to Raspberry Pi OS (which is a Linux operating system). It includes lots of information and tips about using the terminal (the command line) and many other features of a Linux OS.

  • Python basics: This is a good reference for basic Python programming syntax. If you want a comprehensive introduction to Python, there are lots of online tutorials you can find just by searching the internet for "python for beginners."

  • Machine learning basics with TensorFlow: You can find a plethora of educational content online about ML, but this page offers some specific recommendations about learning ML from the TensorFlow team.

Or maybe you're looking for the Maker Kit source code:

Get help

If you're having trouble with any part of the AIY Maker Kit code or documentation, try these resources:

Upgrade an existing Raspberry Pi system

For experienced Raspberry Pi users only!

If you have a fully-functional Raspberry Pi system and you want to upgrade it to accelerate TensorFlow Lite models with the Coral USB Accelerator, then these instructions are for you.

Beware: We cannot guarantee this will work for you because there is an infinite number of system configurations you might have that we have not tested. If you follow these instructions and encounter errors you're unable to resolve, we cannot offer support, and you should probably follow the instructions to flash the Maker Kit system image.

Before you start, be sure you have all the required hardware:

  • Raspberry Pi 3 or 4 running Raspberry Pi OS with desktop (Buster only)
  • Raspberry Pi Camera
  • Coral USB Accelerator
Note: We currently do not support Raspberry Pi OS Bullseye, due to breaking changes in the camera framework. We recommend using the Buster release and it must have a desktop.

Here's how you can upgrade your system with the Maker Kit software:

  1. Before you power the board, connect the Pi Camera Module. Leave the USB Accelerator disconnected until you finish the setup script.

  2. Power the board and connect to your Raspberry Pi terminal. You must have access to the Raspberry Pi OS desktop because our demos depend on the desktop to show the camera video.

  3. Open the Terminal on your Raspberry Pi. Navigate to a path where you want to save the Maker Kit API repository and run this command:

    bash <(curl https://raw.githubusercontent.com/google-coral/aiy-maker-kit-tools/setup.sh)
    

That's it! Now go check out the Maker Kit project tutorials!

Project complete!

You did it! Whether this was your first hackable project or you’re a seasoned maker, we hope this project has sparked new ideas for you. Keep tinkering and remixing this project!