EyesOnIt supports two main detection modes: object detection in individual images, and object detection in videos with monitoring and alerting. Both modes are available through the EyesOnIt configuration user interface (UI) and through the REST API. This article covers the basic usage of object detection in individual images through the configuration UI. This mode lets you test whether EyesOnIt will detect the objects you care about; the other ways of using EyesOnIt are more appropriate for production use.
For reference, the image below shows the object detection UI.
Object detection in individual images through the configuration UI is the most basic way to use EyesOnIt, so this is a good time to introduce four key concepts:
Text descriptions
Multiple object descriptions
Full image comparison
Confidence levels
Let’s look at each of these concepts in detail.
Text Descriptions
The EyesOnIt computer vision model is pre-trained to detect thousands of different types of objects. You use text descriptions to tell EyesOnIt what you want it to detect. These descriptions should generally be short, including only the key words that describe the object. Here are some examples of object descriptions:
General descriptions: person, vehicle, animal, building
More specific descriptions: police officer, firefighter, USPS mail truck, blue sedan, tiger, bank, fallen tree, person blue shirt
Actions: person running, person walking, airplane flying
As these examples show, object descriptions can be general or specific, and they can include action words.
Multiple Object Descriptions
EyesOnIt needs two or more descriptions to achieve the best accuracy. There are two main scenarios for multiple descriptions.
Which object is present?
If you know that an image will include an object and you want to determine which object is present, include one description for each object that could be present.
Is the object present?
If you are trying to detect the presence or absence of one or more objects, include a background description along with the object descriptions. The background description describes the scene without the objects you want to detect, which helps EyesOnIt determine whether one of your objects is present or the image contains only the background.
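To make this concrete, here’s a minimal sketch of what a detection request with a background description might look like. The endpoint path, field names, and response shape are assumptions for illustration only, not EyesOnIt’s documented REST API.

```python
import requests

# Assumed local EyesOnIt server and endpoint -- illustrative only.
EYESONIT_URL = "http://localhost:8000/detect"

# One description per object that could be present, plus "trees" as the
# background description: it describes the scene without the object.
descriptions = ["person", "trees"]

with open("trail_image.jpg", "rb") as f:
    response = requests.post(
        EYESONIT_URL,
        data={"descriptions": descriptions},
        files={"image": f},
    )

# Assumed response shape: a 0-100 confidence value per description.
for description, confidence in response.json()["confidences"].items():
    print(f"{description}: {confidence}")
```

The key point is simply that the background description is passed alongside the object descriptions; EyesOnIt scores every description against the image.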
Full Image Comparison
When EyesOnIt processes an image, it compares your object and background descriptions to your entire image. This works well when the objects you described are large or prominent in the image; when they are small, EyesOnIt may not detect them as easily. To help detect smaller objects, EyesOnIt supports tiling and masking. Tiling divides the image into multiple rows and columns of tiles. EyesOnIt produces overlapping rows and columns, and provides other options to help you define tiles that contain your entire object. Masking tells EyesOnIt which tiles to look at. When EyesOnIt looks at a tile, it treats that tile as one entire image, so with tiles of the proper size, smaller objects take up a larger percentage of the tile and EyesOnIt is more successful at detecting them.
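In code, the tiling idea looks something like the sketch below. This is a generic illustration of overlapping tiles, not EyesOnIt’s internal implementation; the overlap fraction is an assumed parameter.

```python
def make_tiles(width, height, rows, cols, overlap=0.2):
    """Divide a width x height image into an overlapping rows x cols grid.

    `overlap` is the fraction of a tile shared with its neighbor.
    Returns (left, top, right, bottom) pixel boxes, one per tile.
    A generic sketch of the tiling concept, not EyesOnIt's own code.
    """
    # Size tiles so that `cols` tiles, each overlapping its neighbor
    # by `overlap`, exactly span the image width (likewise for height).
    tile_w = width / (1 + (cols - 1) * (1 - overlap))
    tile_h = height / (1 + (rows - 1) * (1 - overlap))

    tiles = []
    for r in range(rows):
        for c in range(cols):
            left = c * tile_w * (1 - overlap)
            top = r * tile_h * (1 - overlap)
            tiles.append((round(left), round(top),
                          round(left + tile_w), round(top + tile_h)))
    return tiles

# Example: split a 1920x1080 frame into 2 rows and 4 columns of tiles.
for box in make_tiles(1920, 1080, rows=2, cols=4):
    print(box)
```

Each box could then be cropped out and processed as if it were a complete image.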
Confidence Levels
When you ask EyesOnIt to process your image, it responds with a confidence level for each of your object descriptions. The confidence level is a value from 0 to 100 that indicates how confident EyesOnIt is that the described object is present. Confidence levels are especially important when you use EyesOnIt’s video capabilities for monitoring and alerting, or when you use the REST API to process a batch of images. In video mode, you can configure EyesOnIt to automatically send alerts based on a high-confidence object detection. Through the REST API, you can sort a batch of images into different folders based on the objects in those images. In either case, a high confidence level lets you act on EyesOnIt’s detections accurately.
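Acting on a confidence level usually comes down to a simple threshold check, sketched below. The cutoff of 85 is an arbitrary example that you would tune for your own scene and descriptions.

```python
ALERT_THRESHOLD = 85  # arbitrary example cutoff, tuned per deployment

def should_alert(confidences, object_description):
    """Return True when the confidence for the given object description
    is high enough to act on (send an alert, move a file, etc.)."""
    return confidences.get(object_description, 0) >= ALERT_THRESHOLD

# Example: confidences like those returned for the image in Example 1 below.
confidences = {"person": 99, "trees": 1}
if should_alert(confidences, "person"):
    print("High-confidence person detection -- trigger an alert")
```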
Examples
Now that we’ve covered the key concepts, let’s look at some examples to see how these concepts apply.
Example 1
We’ll start with a simple case. Here’s the same image that we showed above:
In this image, a person is hiding in the trees, and we want to determine whether EyesOnIt can accurately detect them. The object description “person” is the obvious choice here. Since we want EyesOnIt to determine whether the person is present or absent, we add a background description; “trees” is again an obvious choice. The person is large enough in this image that we don’t need to use tiling. With these settings, EyesOnIt is 99% confident that the image contains a person.
Example 2
Let’s look at a similar but more difficult case. In this case, we’ll use a larger version of the same image as shown below:
The person is a much smaller part of this image, and EyesOnIt is only 43% confident that the image is more like “person” than “trees.” This is where tiling and masking are valuable. In a real situation, we wouldn’t know where the person would be in the image. For this example, we’ll assume that we are looking for people close to the ground. We can use the scroll bars to create tiles in rows and columns. While we could create a single row to capture the entire bottom of the image, we’re creating multiple columns as well, because it’s good to keep tiles close to square proportions.
We then create a mask using the tiles close to the ground, like this:
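Conceptually, the mask is just a subset of the tile grid. Reusing the hypothetical make_tiles helper from the tiling sketch above, selecting tiles close to the ground might look like this:

```python
def ground_mask(tiles, image_height, ground_fraction=0.4):
    """Keep only the tiles that reach into the lower portion of the
    image, where we expect to find a person close to the ground."""
    cutoff = image_height * (1 - ground_fraction)
    return [box for box in tiles if box[3] >= cutoff]

tiles = make_tiles(1920, 1080, rows=3, cols=5)
masked = ground_mask(tiles, image_height=1080)
print(f"Scanning {len(masked)} of {len(tiles)} tiles")
```

In the UI you select the mask tiles directly; the sketch just shows the underlying idea.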
With these settings, EyesOnIt is again 99% confident that it found a person in the trees.
Example 3
Let’s switch gears and try something different. Let’s say you have a collection of images you want to categorize based on which object is in each image. With the EyesOnIt REST API, you could easily loop through your images, ask EyesOnIt to identify the object type, and then act on that detection by moving each image to a different folder. We’ll cover this use of the EyesOnIt REST API in a future blog post.
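Until that post, here’s a rough sketch of what such a loop could look like, reusing the assumed endpoint and response shape from the earlier request sketch; none of these names come from EyesOnIt’s actual API.

```python
import shutil
from pathlib import Path

import requests

EYESONIT_URL = "http://localhost:8000/detect"  # assumed endpoint, as before
DESCRIPTIONS = ["blue sedan", "green sedan", "blue truck"]

def classify(image_path):
    """Ask EyesOnIt which description best matches the image and
    return the highest-confidence description."""
    with open(image_path, "rb") as f:
        response = requests.post(
            EYESONIT_URL,
            data={"descriptions": DESCRIPTIONS},
            files={"image": f},
        )
    confidences = response.json()["confidences"]  # assumed response shape
    return max(confidences, key=confidences.get)

# Move each image into a folder named after its best-matching description.
for image_path in Path("incoming").glob("*.jpg"):
    label = classify(image_path)
    destination = Path("sorted") / label
    destination.mkdir(parents=True, exist_ok=True)
    shutil.move(str(image_path), str(destination / image_path.name))
```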
For now, we’ll show how EyesOnIt performs accurate detection through the configuration UI. For our objects, we’ll use vehicles with overlapping attributes of body type and color, and we’ll use the same set of object descriptions for each image to see whether EyesOnIt identifies the correct object.
For the first image, you can see that EyesOnIt accurately identified the vehicle as a blue sedan.
The next image is accurately identified as a green sedan:
The final image is correctly identified as a blue truck:
Hopefully this walkthrough of the individual image configuration UI has helped you understand the key EyesOnIt concepts. These same concepts apply to the other ways of using EyesOnIt, and by applying them you can enhance safety, security, and efficiency with video monitoring and batch image processing. Future blog posts will cover those other uses of EyesOnIt.