Introduction
Many cities in the US and Europe are now cautiously reopening. People have been instructed to follow social distancing rules as they venture out, but do they? It is important for cities to assess compliance and act accordingly. If most people follow the rules, more places can be opened safely; if there are many violations, it may be safer to close them again. This is exactly what happened at a Miami Beach park. The park opened at the end of April but was closed within the week because too many people were flouting the rules on wearing masks and social distancing. The city detected this by having officers monitor the park and issue warnings. But human monitoring may not be a practical solution.
How can we use AI and machine learning to detect whether people are following social distancing rules? Most cities already have cameras installed in public places that can be used for this. In this blog, I show how we can use people tracking algorithms to monitor violations. I have also open sourced the code on my GitHub. See the model in action below.
At Deep Learning Analytics, we are very passionate about using data science and machine learning to solve problems. Please reach out to us if you are looking for data science help in fighting this crisis. Original full story published on our website here.
People tracking
Data
The first thing we need is video data to build and test our model on. I have used the publicly available MOT data set, the canonical data set for people tracking in computer vision. Many state-of-the-art algorithms are trained and tested on it. The data set contains many open-sourced clips showing people moving under different camera angles. I have chosen a clip from a stationary camera mounted at a height, showing a town center in Germany. You can download this clip from here. One of the frames from this clip is shown below.
Person Tracking using Deep Sort
In computer vision, person tracking is the task of giving a person an ID, detecting them in every frame in which they appear, and carrying their ID forward. Once a person has left the frame, we do not reuse their ID. If a new person enters, they are initialized with a new ID.
Tracking tends to be a difficult task, since people can look similar, causing the model to switch IDs. People can also be occluded behind another person or object and be assigned a new ID when they re-emerge. Deep learning techniques have significantly improved performance on multi-object tracking benchmarks in the last few years. The current state of the art of multi-object tracking is an accuracy of 62.0
You can read more about deep learning based people tracking in my blog here.
Why do we need to track people for social distancing detection? Because we want to count the unique number of people who violate social distancing rules. Without a tracker, two people walking close together would be counted as a violation in every frame; with a tracker, we can count this as a single incident.
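To see the difference in numbers, here is a minimal sketch. The per-frame pairs are made-up data, and the variable names are only for illustration; the idea is simply that stable track IDs let us deduplicate the same pair across frames.

```python
# Hypothetical "too close" pairs found in three consecutive frames,
# written as (track ID, track ID). The data is invented for illustration.
close_pairs_per_frame = [
    [(1, 2)],          # frame 1
    [(1, 2)],          # frame 2
    [(1, 2), (3, 4)],  # frame 3
]

# Without stable IDs, every frame contributes a fresh violation.
per_frame_count = sum(len(pairs) for pairs in close_pairs_per_frame)

# With stable IDs, the same pair of people is counted only once.
unique_pairs = {
    frozenset(pair)
    for pairs in close_pairs_per_frame
    for pair in pairs
}
unique_count = len(unique_pairs)

print(per_frame_count, unique_count)
```

Here the pair (1, 2) shows up in all three frames but contributes only one entry to the set of unique pairs.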
For this blog, I have used the deep sort model for tracking. The code for this model has been made publicly available by the authors on their GitHub. Deep sort uses both the position of a person and their appearance to track them. The position information is captured by a Kalman filter that predicts the next likely position of the box, while the appearance information comes from a deep learning model that generates embeddings.
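To make the two cues more concrete, here is a minimal sketch, not deep sort's actual implementation. It assumes a simplified constant-velocity state [x, y, vx, vy] (deep sort's own state also tracks the box's aspect ratio and height) and compares embeddings with cosine distance. The function names and the noise value are illustrative assumptions.

```python
import numpy as np


def predict(mean, cov, dt=1.0, process_noise=1e-2):
    """One Kalman predict step for a constant-velocity state [x, y, vx, vy]."""
    # State transition: position advances by velocity * dt.
    F = np.array([
        [1.0, 0.0, dt, 0.0],
        [0.0, 1.0, 0.0, dt],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    Q = process_noise * np.eye(4)  # assumed isotropic process noise
    return F @ mean, F @ cov @ F.T + Q


def cosine_distance(a, b):
    """Appearance distance between two embedding vectors (0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))


# A box centre at (100, 200) moving 5 px right and 2 px up per frame.
mean = np.array([100.0, 200.0, 5.0, -2.0])
cov = np.eye(4)
next_mean, next_cov = predict(mean, cov)
print(next_mean[:2])  # predicted box centre one frame ahead
```

A tracker can then prefer the detection that is both near the predicted position and close in appearance to the track's earlier embeddings.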
To run the code on this video, you need to pass the tracker the raw images and the detection file, which holds the positions of all the bounding boxes. The tracker then uses this information to assign an ID to every person in every frame. The deep sort README explains this in detail. The results of tracking on this clip are shown below. As you can see, every person is assigned an ID, and this ID is successfully carried forward into the next frame. The tracker also outputs a CSV with the details of the tracks. I have shared this file on my GitHub, and we will use it for the next part of the code.
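As a rough sketch of how such a track file could be loaded, the snippet below reads a few inline rows with pandas and adds box centres. The column layout is an assumption (a MOTChallenge-style row of frame, ID, box, and placeholder fields); the actual CSV on GitHub may use different columns, so adjust the names accordingly.

```python
import io

import pandas as pd

# Assumed column layout; check it against the real track file before use.
COLUMNS = ["frame", "track_id", "left", "top", "width", "height",
           "conf", "x", "y", "z"]

# Three made-up rows standing in for the tracker output file.
sample = io.StringIO(
    "1,1,100.0,200.0,40.0,90.0,1,-1,-1,-1\n"
    "1,2,150.0,205.0,38.0,88.0,1,-1,-1,-1\n"
    "2,1,103.0,201.0,40.0,90.0,1,-1,-1,-1\n"
)

tracks = pd.read_csv(sample, header=None, names=COLUMNS)

# Box centres are a convenient reference point for distance checks.
tracks["cx"] = tracks["left"] + tracks["width"] / 2
tracks["cy"] = tracks["top"] + tracks["height"] / 2

boxes_per_frame = tracks.groupby("frame").size()
print(tracks[["frame", "track_id", "cx", "cy"]])
```

To read a real file, replace `sample` with the path to the CSV.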
Detecting Social Distancing Violations
To detect social distancing violations, we take each track in the frame and measure its distance to every other track in the frame. Each track is basically a bounding box with an ID, so one bounding box can be compared to another using the Euclidean distance between them. Now we start our modeling. The code for this step is the same as the code shared on my GitHub.
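The distance check itself can be sketched in a few lines. This version assumes boxes are given as (left, top, width, height) and measures between box centres; the post does not say which reference point the original code uses, so treat both choices, and the function names, as assumptions.

```python
import math


def box_centre(box):
    """Centre of a box given as (left, top, width, height)."""
    left, top, width, height = box
    return left + width / 2, top + height / 2


def pixel_distance(box_a, box_b):
    """Euclidean distance in pixels between the centres of two boxes."""
    ax, ay = box_centre(box_a)
    bx, by = box_centre(box_b)
    return math.hypot(ax - bx, ay - by)


# Two made-up boxes whose centres are 30 px apart in x and 40 px in y.
distance = pixel_distance((0, 0, 10, 10), (30, 40, 10, 10))
print(distance)
```

Using the bottom-centre of each box (roughly where the feet are) instead of the centre may be a better proxy for ground position, at the cost of being more sensitive to partially hidden boxes.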
The main steps that are run for every frame are:
- Compare the pixel distance between each track and every other track
- If distance < proximity threshold, then the two people are too close to each other, so we set safe = 1 in the data frame for both bounding boxes (note that safe = 1 marks a violation). The variable “safe” is later used for visualization
- We also want to count the total violations for each ID, defined as the number of other IDs it has come too close to. So whenever distance < proximity, we record the tracks that have come too close together in the dictionary track_violations
The code runs quite slowly, since it needs to compare every track to every other track and do this over 600 frames. Many of these computations would be repeated, since a naive loop separately measures the distance between track 1 and track 2 and then between track 2 and track 1. To save time, I store the results of both comparisons in a single pass: when track 1 and track 2 are compared, the results are written to both of their rows in the data frame. This cuts the run time in half. That’s it!
I found that a pixel distance of 70 was quite reasonable for detecting people who “seemed to be” walking too close to each other. The visualization module of the code highlights boxes in red when they come too close and also displays the count of violations for each box. A sample frame with results is shown below.
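The per-frame steps and the single-pass trick can be sketched as follows. This is an illustration rather than the code from GitHub: it uses plain dictionaries instead of a data frame, made-up track rows, and sets inside track_violations so that a repeated pair is not counted twice. The names safe and track_violations and the 70-pixel threshold follow the post.

```python
import math
from collections import defaultdict
from itertools import combinations

PROXIMITY = 70  # pixel threshold used in the post

# Made-up tracker rows: (frame, track ID, centre x, centre y).
rows = [
    (1, 1, 100.0, 200.0),
    (1, 2, 140.0, 210.0),
    (1, 3, 400.0, 220.0),
    (2, 1, 105.0, 200.0),
    (2, 2, 142.0, 210.0),
    (2, 3, 180.0, 215.0),
]

by_frame = defaultdict(list)
for frame, track_id, cx, cy in rows:
    by_frame[frame].append((track_id, cx, cy))

safe = {}                            # (frame, track ID) -> 1 if too close, else 0
track_violations = defaultdict(set)  # track ID -> IDs it came too close to

for frame, boxes in by_frame.items():
    for track_id, _, _ in boxes:
        safe[(frame, track_id)] = 0
    # combinations() visits each unordered pair once, so the distance
    # between A and B is computed a single time and written for both.
    for (id_a, xa, ya), (id_b, xb, yb) in combinations(boxes, 2):
        if math.hypot(xa - xb, ya - yb) < PROXIMITY:
            safe[(frame, id_a)] = safe[(frame, id_b)] = 1
            track_violations[id_a].add(id_b)
            track_violations[id_b].add(id_a)

violation_counts = {tid: len(others) for tid, others in track_violations.items()}
print(violation_counts)
```

In this toy data, track 3 is far from everyone in frame 1 but walks close to track 2 in frame 2, so it is flagged only in the second frame.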
Deploying this practically
A few things need to be considered if you are looking to deploy this in production.
- The camera needs to be registered (calibrated) so that we can correctly map pixel distance to distance in the real world
- If there is a continuous array of cameras, then we may need to add person re-identification capability to help the tracker carry the ID and violation count forward between cameras. Person re-identification is an area that has seen a lot of research in the last few years
- The code here is quite lightweight and could run on an embedded device like a Jetson TX2 that is tied to a camera.
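On the first point, one common way to map pixels to real-world distance is a ground-plane homography. The sketch below is only illustrative: the 3x3 matrix H is made up, whereas in practice it would be estimated from at least four reference points with known ground positions (for example with OpenCV's findHomography). The function names are assumptions as well.

```python
import math

# Hypothetical homography from image pixels to ground-plane metres.
# The non-zero entry in the last row models perspective: the scale
# changes with the vertical position in the image.
H = [
    [0.02, 0.0, 0.0],
    [0.0, 0.02, 0.0],
    [0.0, 0.001, 1.0],
]


def to_ground_plane(H, x, y):
    """Apply a 3x3 homography to a pixel coordinate (x, y)."""
    gx = H[0][0] * x + H[0][1] * y + H[0][2]
    gy = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return gx / w, gy / w


def ground_distance(H, p, q):
    """Distance in metres between two pixel points after mapping."""
    px, py = to_ground_plane(H, *p)
    qx, qy = to_ground_plane(H, *q)
    return math.hypot(px - qx, py - qy)


# The same 70-pixel gap at two different heights in the image.
gap_upper = ground_distance(H, (100, 200), (170, 200))
gap_lower = ground_distance(H, (100, 600), (170, 600))
print(round(gap_upper, 3), round(gap_lower, 3))
```

With this made-up matrix, the same 70-pixel gap corresponds to different real-world distances depending on where it appears in the frame, which is why a single pixel threshold is only an approximation.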
Conclusion
Tracking is an important problem in computer vision with many applications. One such application is detecting social distancing violations. This can help cities assess public health risk and reopen safely.
I hope you give the code a try and experiment with what happens as you change the proximity criterion.
At Deep Learning Analytics, we are extremely passionate about using Machine Learning to solve real-world problems. We have helped many businesses deploy innovative AI-based solutions. Contact us through our website here or email us at info@deeplearninganalytics.org if you see an opportunity to collaborate.