
Video Search
- Wireframing and Interaction Design
This is a multi-year (and as far as I know, still ongoing) project on developing a video search engine that can automatically classify videos and find specific events within them. For IP reasons, I have left out the project name, and the names of the techniques used to classify the videos are obscured in the images below.
This project is a little unusual in that the interfaces I designed weren't intended as final, end-user-facing UIs, but rather as tools for the researchers developing the technology behind the project. They needed to let the researchers run their scripts, see how accurate the results were, and demonstrate the system to the people providing the funding in a way that someone without a background in the technology could understand.
In this project I was responsible for all of the UI design, which involved working with the developers, testers, and external stakeholders to determine how the system currently worked and what each group needed to see in the interface, then iterating on wireframes with those groups to make sure I got it right. Two of the interfaces I designed for the project are shown below.
I was also responsible for building the interface in Bootstrap with a simple NodeJS backend that communicated with the analysis scripts; it isn't shown in the images below because it looks like every Bootstrap site ever.
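That backend really was only a thin layer over the analysis scripts. Below is a minimal sketch of that kind of setup, assuming Express and analysis programs run as command-line scripts; the endpoints and field names here are illustrative, not the project's actual API.

    // Minimal sketch of a backend like the one described above: Express in front
    // of analysis programs run as command-line scripts. All names are illustrative.
    import express from "express";
    import { spawn } from "child_process";
    import { randomUUID } from "crypto";

    const app = express();
    app.use(express.json());

    // Track running analyses in memory; a real system would need to persist these.
    const jobs = new Map<string, { status: string; output: string }>();

    // Start an analysis program on a video and return a job id to poll.
    app.post("/analyses", (req, res) => {
      const { program, videoPath } = req.body; // hypothetical request fields
      const id = randomUUID();
      jobs.set(id, { status: "running", output: "" });

      const child = spawn(program, [videoPath]);
      child.stdout.on("data", (chunk) => {
        jobs.get(id)!.output += chunk.toString();
      });
      child.on("close", (code) => {
        jobs.get(id)!.status = code === 0 ? "done" : "failed";
      });

      res.status(202).json({ id });
    });

    // Poll for status and collected output.
    app.get("/analyses/:id", (req, res) => {
      const job = jobs.get(req.params.id);
      if (!job) return res.sendStatus(404);
      res.json(job);
    });

    app.listen(3000);

Because only the research team used the tool, launching scripts straight from a request like this was acceptable; it isn't something a public-facing service could do.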
Initial analysis tools
The first part of the project I designed was an interface to run one or several of the analysis programs on a single video or a group of videos, wait for the analysis to complete, and then inspect the results for each video. The results consisted of tags associated with a particular start and end time in the video, and the inspection interface let the researchers jump to that part of the video to check whether the tagged object or event was actually present. The wireframes for running an analysis on a single video are shown in the animation below.

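Under the hood, what the researchers were inspecting was essentially a list of time-bounded tags per video that the interface could seek the player to. A rough sketch of that shape is below; the field names are assumptions, not the project's actual schema.

    // Illustrative shape of one analysis program's output for one video.
    // Field names are assumptions, not the project's actual schema.
    interface TaggedSegment {
      tag: string;          // e.g. "bike trick"
      startSeconds: number; // where in the video the tagged event begins
      endSeconds: number;   // and where it ends
    }

    interface VideoAnalysis {
      videoId: string;
      program: string; // which analysis program produced these tags
      segments: TaggedSegment[];
    }

    // The inspection view mostly needs to seek the player to a segment's start.
    function seekToSegment(player: { currentTime: number }, segment: TaggedSegment): void {
      player.currentTime = segment.startSeconds;
    }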
This interface was highly successful: immediately after it was implemented, the team could see that one of the analysis programs was producing very low-quality results, something that hadn't been clear from the text-file output they had been using previously, and that insight helped them improve the results. It also provided a concrete way to demonstrate improvements in the analysis programs over the course of the project.
Testing and training the machine learning algorithm
The animation above shows the types of things the analysis programs recognized in a video, but the system also needed to be able to learn to recognize new things. For the analyses that used text or speech, no further intervention was necessary, but those that recognized objects or actions needed example videos from which to generalize features so they could find similar objects in other videos. One could imagine a high-traffic consumer search engine learning this automatically: someone searches for a term the text or speech analysis algorithms already understand, and the videos they actually play are probably good results to count as positive examples for improving the next search. For this research tool, however, we needed something more explicit and manual.
The animation below shows someone entering some seed terms to look for (there are multiple fields here because, at this stage, the terms needed to be restricted to a known set, but this would not always be the case), then viewing the examples the system selects for those terms and telling the system which are correct and which are incorrect. They can then iterate to find better examples based on the ones they just selected and keep correcting until they get consistently good results.

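The loop that interface drives looks roughly like the sketch below, where askUser stands in for the wireframed screens and CandidateService for whatever the analysis side exposes; all of these names are assumptions for illustration, not the real API.

    // Sketch of the label-and-iterate loop behind the training interface.
    // The CandidateService and its suggest() method are assumptions for illustration.
    interface Candidate {
      videoId: string;
      startSeconds: number;
      endSeconds: number;
    }

    interface LabeledCandidate {
      candidate: Candidate;
      correct: boolean; // marked correct or incorrect by the researcher
    }

    interface CandidateService {
      // Ask the system for example clips matching the seed terms,
      // taking the labels gathered so far into account.
      suggest(terms: string[], labeled: LabeledCandidate[]): Promise<Candidate[]>;
    }

    // askUser stands in for the wireframed UI where the researcher marks
    // each suggested example as correct or incorrect.
    async function refineTag(
      service: CandidateService,
      terms: string[],
      askUser: (c: Candidate) => Promise<boolean>,
      rounds: number
    ): Promise<LabeledCandidate[]> {
      const labeled: LabeledCandidate[] = [];
      for (let round = 0; round < rounds; round++) {
        const candidates = await service.suggest(terms, labeled);
        for (const candidate of candidates) {
          labeled.push({ candidate, correct: await askUser(candidate) });
        }
      }
      return labeled;
    }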
The results could then be saved to use against a test dataset so the researchers could see how well the system generalized from the examples to videos it had never seen before.
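As a simplified, hypothetical illustration of what "how well it generalized" means here, a saved tag could be scored against a labelled test set with ordinary precision and recall; the project's actual evaluation isn't shown in these wireframes.

    // Hypothetical scoring of a saved tag against a labelled test set:
    // "expected" is the set of test videos the tag should appear in,
    // "predicted" is the set the system actually tagged.
    function precisionAndRecall(expected: Set<string>, predicted: Set<string>) {
      let truePositives = 0;
      for (const id of predicted) {
        if (expected.has(id)) truePositives++;
      }
      const precision = predicted.size === 0 ? 0 : truePositives / predicted.size;
      const recall = expected.size === 0 ? 0 : truePositives / expected.size;
      return { precision, recall };
    }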
Though it wasn't completed while I was on the project, this process could create new tags that would be visible when inspecting what an analysis program thinks is in a video, as in the first animation, or that could be used as terms for building even more complex tags (e.g. the second animation shows creating a "bike trick" tag; a new "bike rally" tag might include bike tricks, crowds in grandstands, bike races, etc.).
As with the first interface, this one was also instrumental in showing the team which parts of the system were not producing good results so they could improve them, and it gave stakeholders an idea of how well the system might ultimately be able to perform.