I enjoy working out and trying different kinds of training, such as CrossFit, but when a session is too intense or too long I often miscount my repetitions. This might be due either to a lack of concentration on counting during the workout or to subconsciously overestimating the number of movements performed. As a third-year Computer Science BSc student, I decided to tackle this problem in my coursework and built a web app that counts the movements performed during a workout. In this article I would like to share my approach. You can find the full code of the app in the GitHub repository.
To count movements, you have to know whether the body is moving up or down on each frame. Usually, for this kind of task I would reach for some RNN architecture, because you obviously can’t detect the direction of movement from a single frame. Is he moving up or down in this photo?
But I didn’t have enough training data to build a robust RNN model, as I had to prepare and label the data myself. So I looked at PoseNet models instead, to get the coordinates of each body part on each frame.
This approach didn’t work out either, for several reasons:
- The model performed well when I tried it in the same environment it was trained in (the same room, same video angle, same person), but to make a robust model for even one exercise I would still need a lot of training data.
- The FPS of PoseNet without using a GPU card was really low.
- On some frames the quality of the detected body parts was low.
All in all, I played a lot with different models, and they all gave poor results in some way. That changed when I learned about the Optical Flow algorithm, and in particular its Dense Optical Flow variant (the right part of the image below). In a nutshell, this algorithm tracks the movement of pixels across a number of consecutive frames.
Optical flow can either be estimated with mathematical models, which are implemented, for example, in the OpenCV library, or predicted directly with deep learning, which gives far better results on complex video scenes. In my implementation I decided to stick with the Dense Optical Flow algorithm as implemented in OpenCV's Python package.
Here is how one push-up can be color coded with Dense Optical Flow.
As you can see, Dense Optical Flow encodes downward movement in green and upward movement in purple. Knowing the color-coded representation of each frame, I could easily build a simple CNN to perform multiclass classification of the frames. I just stacked some Conv + Pooling layers in PyTorch, which resulted in a simple architecture.
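The exact layers are not reproduced in this text, so the following is only a sketch of what a small Conv + Pooling stack for three classes might look like in PyTorch. The class name `FlowFrameClassifier`, the number of blocks, the channel widths and the adaptive pooling head are all assumptions made for illustration, not the architecture used in the app.

```python
import torch
from torch import nn


class FlowFrameClassifier(nn.Module):
    """Hypothetical 3-class CNN for color-coded optical flow frames."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Three Conv + ReLU + MaxPool blocks; widths are illustrative guesses.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Global average pooling makes the head independent of input size.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.pool(x).flatten(1)  # (batch, 64)
        return self.classifier(x)    # raw logits, one per class
```

The model returns raw logits, which is what `nn.CrossEntropyLoss` expects during training; a softmax is only needed when class probabilities are wanted at inference time.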
To train this model, I downloaded a few YouTube videos and labeled them frame by frame; I also recorded some push-up videos myself. In the end I had a training set of color-coded images consisting of 252 moving-down frames, 202 non-push-up frames and 206 moving-up frames (660 in total). I also prepared a small validation set of 140 frames with different movements. After running the training loop for 10 epochs I got a pretty impressive LogLoss graph for my model.
Obviously, these 3 classes weren’t too hard for the model to predict, because the same can easily be done by eye just by looking at the color-coded images.
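For readers less familiar with the metric, multiclass LogLoss is the mean negative log of the probability the model assigns to the true class. The sketch below computes it in plain Python; the function name, the clipping constant `eps` and the class index order (0 = moving down, 1 = non-push-up, 2 = moving up) are assumptions chosen for the example.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: list of integer class indices.
    y_prob: list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when the model is confidently wrong.
        p = min(max(probs[label], eps), 1.0)
        total -= math.log(p)
    return total / len(y_true)


# A model that always predicts a uniform distribution over 3 classes
# scores ln(3), which is a handy baseline when reading a loss curve.
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
baseline = log_loss([0, 1, 2], uniform)
```

Anything well below that uniform baseline of about 1.099 means the model is doing better than guessing among the three classes.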
More importantly, the trained model was able to classify frames not only for push-ups but for burpees, squats and pull-ups as well. In general, I guess this exact model can classify any high-amplitude movement that involves moving up and down.
However, to classify exercises like sit-ups or some low-amplitude dumbbell moves, it is better to collect a new training set and retrain the current model.
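Per-frame classes still have to be turned into a repetition count, and the text does not spell that step out. One plausible way to do it, sketched here under assumptions, is a small state machine that counts one repetition each time a confirmed "down" phase is followed by a confirmed "up" phase. The label names, the function `count_reps` and the `min_run` debouncing threshold are hypothetical and not taken from the app.

```python
from typing import Iterable, Optional

# Hypothetical label names for the three frame classes.
DOWN, IDLE, UP = "down", "idle", "up"


def count_reps(frame_labels: Iterable[str], min_run: int = 3) -> int:
    """Count down -> up cycles in a sequence of per-frame class labels.

    A movement phase is only confirmed after `min_run` consecutive frames
    with the same label, which filters out isolated misclassifications.
    """
    reps = 0
    phase: Optional[str] = None  # last confirmed movement phase
    run_label: Optional[str] = None
    run_len = 0

    for label in frame_labels:
        # Track the length of the current run of identical labels.
        if label == run_label:
            run_len += 1
        else:
            run_label, run_len = label, 1

        # Act only once per run, at the moment it reaches min_run frames,
        # and ignore idle runs so a pause does not reset the phase.
        if run_len != min_run or label == IDLE:
            continue

        if label == UP and phase == DOWN:
            reps += 1
        phase = label

    return reps


labels = [IDLE] * 5 + ([DOWN] * 6 + [UP] * 6) * 3 + [IDLE] * 4
print(count_reps(labels))  # 3
```

Requiring a few consecutive frames before switching phase trades a little latency for robustness; exercises with more than one up-down movement per repetition would need extra logic on top of this.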
To apply my model in real life I created a small web app using Django, where I could create a new workout and try my model in a “battle” environment. Here is what it looks like.
In general, during my workouts, I noticed an error of around 2.5% for push-ups, squats and pull-ups. For burpees, the error was around 5%, because the exercise involves more than one up-down movement. Here is how the model counts push-ups during a workout.
To conclude, this work was a great experience for me, as I had to do a lot of research and test different hypotheses about counting movements during a workout. My time tracker shows that I have spent around 75 hours developing this app so far, but who knows, maybe I will spend even more if I decide to continue the project and turn it into something bigger. Thank you for reading!