Pose and object detection for visual media – applications in E-commerce and Surveillance


The exciting part of using Artificial Intelligence is its potential to detect even slight changes in behavior through machine learning. Machine learning algorithms for open pose and object detection help monitor and analyze visuals to detect the movement of vehicles, humans and a wide array of other objects. In e-commerce, open pose and object detection can be used for image classification, augmented reality and content filtering to enhance the user’s experience. Artificial Intelligence is empowering not only the e-commerce industry but also others, including the surveillance and security markets. By using machine learning, open pose and object detection techniques, some threats can potentially be flagged before they materialize.

Applications of open pose and object detection 

Poses convey intent and are an inherent part of the visual communication process. There can be mechanical intent, for example, in an exercise video, or an emotional intent as in a model standing with a handbag conveying confidence. Classification of poses and their implied messaging is important in a variety of applications covering Theatre, E-commerce, Fashion, Surveillance, and Gaming.

Online retail plays an increasingly dominant role in the fashion and fashion accessories industry. Visual images are the key asset that drives sales on online marketplaces. A deep understanding of the visual messages emanating from images and videos of products is key to the effective marketing of products on these platforms.

Security and surveillance are important aspects of any public space and help keep us safe from multiple threats. Capturing and analyzing visual feeds is a non-invasive method of keeping threats at bay. Identifying the intent of subjects and detecting objects play a key role in automating the analysis of the large volumes of data created by cameras that cover public spaces.

Thus, poses play a vital role in visual media content. The growing adoption of automation through Artificial Intelligence and the increasing use of visuals by businesses make open pose and object detection techniques even more significant. Open pose techniques help detect individual actions and gestures, which are important input data for training machines to recognize a particular behavior and ultimately embed it into an organization’s Artificial Intelligence system.

Pose and object detection

What is a body pose?

A body pose is a set of gestures displayed by body parts such as the hands, neck and legs. Each set of gestures can be unique, depending on the body type and the effort put in by the person. Together, these gestures send out human-understandable signals and messages that are collectively called body language.

A gesture is the way various parts of the human body are oriented or moved. The angle at which the legs are oriented and the way the hands move are both gestures. The set of these orientations and movements, and even their speed, sends out various types of signals to the viewer or observer.
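
One way to make "the angle at which a limb is oriented" concrete is to compute the angle at a joint from three keypoints. The following is a minimal sketch, assuming a pose estimator has already produced 2D keypoints in pixel coordinates; the function name and the sample coordinates are illustrative assumptions, not the output of any specific library.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("coincident keypoints; angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints (x, y) for one arm: upper arm vertical, forearm horizontal.
shoulder, elbow, wrist = (100, 100), (100, 200), (200, 200)
elbow_angle = joint_angle(shoulder, elbow, wrist)
print(f"elbow angle: {elbow_angle:.1f} degrees")
```

A set of such angles (elbows, knees, hips, neck) gives a compact, scale-independent description of a gesture that can be compared across people of different sizes.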

Whenever a pose is made, it is a unique combination of various gestures. Poses can be used for many purposes:

  • Highlight a particular aspect of one’s body
  • Highlight a product or an object that one is holding
  • Create drama in a scene: fighting, assault, sexual advance, dance or any other action that helps create a context around an environment or a product or an actor
  • Demonstrate a particular familiar movement or operate equipment
  • Define body movements in combat or workout training videos

What are the types of body poses?

Body poses can be classified into Static and Dynamic poses. A time series of Static poses combines to form a Dynamic pose. For example, a dance step can involve multiple movements or gestures that need to be executed individually and in the correct sequence to form the Dynamic pose. As another example, performing an exercise involves a set of movements that target a particular muscle group. Such a set of movements can only be represented correctly when captured dynamically, as in a video or an animation. The resulting messages can be used to train someone, to entertain someone or to send positive or negative signals to the viewer.
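
The idea of a Dynamic pose as a time series of Static poses can be sketched in a few lines. The example below assumes each frame is a dictionary mapping a keypoint name to an (x, y) pixel position, and labels a sequence as dynamic when keypoints move more than a threshold between consecutive frames. The data layout, function names and the 5-pixel threshold are assumptions chosen for illustration.

```python
import math


def mean_displacement(frame_a, frame_b):
    """Average distance moved by the keypoints present in both frames."""
    shared = frame_a.keys() & frame_b.keys()
    if not shared:
        return 0.0
    total = sum(
        math.hypot(frame_b[k][0] - frame_a[k][0], frame_b[k][1] - frame_a[k][1])
        for k in shared
    )
    return total / len(shared)


def classify_sequence(frames, threshold=5.0):
    """Return 'dynamic' if any consecutive pair of frames moves past the threshold."""
    motion = [mean_displacement(a, b) for a, b in zip(frames, frames[1:])]
    return "dynamic" if motion and max(motion) > threshold else "static"


# Hypothetical sequences: a held pose and a waving hand.
held = [{"wrist": (50, 80), "elbow": (40, 120)}] * 3
wave = [{"wrist": (50, 80)}, {"wrist": (70, 80)}, {"wrist": (50, 80)}]
print(classify_sequence(held), classify_sequence(wave))
```

In practice the per-frame motion values themselves, not just the final label, would be kept, because the order and speed of the movements are what distinguish one dance step or exercise from another.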

Static poses, on the other hand, have no time dependency and are an aggregation of simple or complex gestures that send a particular message to the viewer. A person holding an object in a particular way or pointing to an object conveys a message. For example, a person doing Namaskar or holding a particular yoga pose stays static and yet conveys signals and messages.

To understand more about Pose and object detection and how it is done, join us for a free webinar on 27 November 2019.

What is Object detection and recognition?

A scene generally comprises objects placed at different positions and at varying angles. The role of object detection is to identify the location and the angle of placement of objects using Artificial Intelligence methods. Once an object is detected, a pre-trained classifier is used to recognize it (for example, as a blue bottle, a glass vase, a notebook, a car or a fan) and, if possible, its parameters are estimated. The parameters of detected objects, whether in a static image or moving in a video, can be used to derive a number of inferences about the actions of subjects in the image or video.
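
To illustrate what the output of detection and recognition can look like, here is a minimal sketch. It assumes each detection is stored as a label, a classifier confidence score and an axis-aligned bounding box, and it includes intersection over union (IoU), a standard measure of how well two boxes overlap. The `Detection` record, the helper names and the sample values are hypothetical and not tied to any particular detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # class name from the classifier, e.g. "bottle"
    score: float  # classifier confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_confident(detections, min_score=0.5):
    """Drop detections whose classifier score is below min_score."""
    return [d for d in detections if d.score >= min_score]


# Hypothetical detector output for one image.
detections = [
    Detection("bottle", 0.91, (40, 60, 90, 200)),
    Detection("vase", 0.32, (300, 80, 380, 220)),
]
confident = keep_confident(detections)
print([d.label for d in confident], iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

The box gives the location of each object, the label gives its identity, and further parameters such as orientation would be extra fields on the same record.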


How can body pose be combined with object detection and recognition?

For proper inference of subject actions in an image or a scene, it is important to understand the interaction of the subject with objects. Combined information about the pose, the objects interacted with and the manner of interaction conveys a lot about the conscious and subconscious intent of the subject. The dynamics of any visual representation can be captured by the subjects and their poses, the objects present and the interactions between subjects and objects. To derive proper inference from a scene, all these dynamics must be carefully (mathematically) fused and passed to an Artificial Intelligence model, which produces the inferences and makes this information useful for further classification.
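
As a rough illustration of fusing pose and object information, the sketch below checks whether a wrist keypoint falls inside (or near) an object's bounding box and then builds a single feature vector from the keypoint coordinates plus one held/not-held flag per object label. The keypoint names, the 10-pixel margin and the feature layout are assumptions made for this example; a production system would likely use a learned fusion rather than this hand-written rule.

```python
def point_in_box(point, box, margin=0):
    """True if (x, y) lies inside box = (x_min, y_min, x_max, y_max), padded by margin."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return (x_min - margin <= x <= x_max + margin
            and y_min - margin <= y <= y_max + margin)


def held_objects(keypoints, objects, margin=10):
    """Labels of objects whose padded box contains a detected wrist."""
    wrists = [keypoints[name] for name in ("left_wrist", "right_wrist")
              if name in keypoints]
    return [label for label, box in objects
            if any(point_in_box(w, box, margin) for w in wrists)]


def fuse_features(keypoints, objects, vocabulary):
    """Concatenate keypoint coordinates with one held/not-held flag per label."""
    pose_part = [float(c) for name in sorted(keypoints) for c in keypoints[name]]
    held = set(held_objects(keypoints, objects))
    object_part = [1.0 if label in held else 0.0 for label in vocabulary]
    return pose_part + object_part


# Hypothetical pose and detections: the left wrist overlaps the handbag.
keypoints = {"left_wrist": (120, 210), "right_wrist": (300, 400)}
objects = [("handbag", (100, 200, 160, 280)), ("bottle", (500, 100, 540, 200))]
features = fuse_features(keypoints, objects, vocabulary=["bottle", "handbag"])
print(held_objects(keypoints, objects), features)
```

The resulting vector is the kind of input that could be passed to a downstream classifier to infer what the subject is doing with the object.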

How can technology help in this combination? 

With the advent of Deep Neural Networks, and specifically Convolutional Neural Networks (CNN), which learn features directly from image data, it is now possible to perform tasks that were not previously feasible with pure image and video processing algorithms. Many complex visual problems can now be addressed using Artificial Intelligence methods. These improvements have led to the adoption of open pose detection techniques and to deriving human intent from the resulting classification. Object detection is also a well-studied problem, and Artificial Intelligence technology has made it widely available.

Artificial Intelligence for Pose and object detection


What is Dresma doing in this area?

Dresma’s visual content platform uses open pose and object detection technologies to classify images, making the post-processing pipeline more efficient, and to tag images, making Artificial Intelligence-based Digital Asset Management systems more relevant.

In wedding albums, where a photographer can take upwards of two to three thousand images, open pose and object detection techniques help quickly identify and tag images based on subject identification, subject action and automated scene detection. The resulting tags also provide data for the machine learning that builds the Artificial Intelligence system.
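
Once images carry tags, finding the right ones among thousands becomes a lookup. The following is a minimal sketch of a tag index, assuming the tags have already been produced by pose and object detection. It is an illustration of the general idea only and not a description of Dresma's platform; the image ids and tags are made up.

```python
from collections import defaultdict


def build_tag_index(image_tags):
    """Map each lower-cased tag to the set of image ids that carry it."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags:
            index[tag.lower()].add(image_id)
    return index


def find_images(index, *tags):
    """Sorted ids of images that carry every requested tag."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical tags for three images from one album.
image_tags = {
    "img_001": ["bride", "dancing", "stage"],
    "img_002": ["bride", "portrait"],
    "img_003": ["groom", "Dancing"],
}
index = build_tag_index(image_tags)
print(find_images(index, "dancing"), find_images(index, "bride", "dancing"))
```

Combining a subject tag with an action tag, as in the second query, is what lets an editor jump straight to the handful of relevant frames.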

Pose detection technologies also enable efficient automation of post-processing, where model poses can be detected and appropriate image enhancement techniques applied. Each processed image also adds to the data that the business’s machine learning models learn from. The algorithms also enable more efficient allocation of resources in cases where manual editing is required.

A further area of investment for Dresma is to enable our online marketplace customers to quickly understand the effectiveness of visual content on their marketplace, based on automatic classification of images and their impact on customer purchasing intent.

To know more on how Dresma is planning to add to technology growth…Read: Evolution of image processing: leading in the Age of Disruption

What is the role of poses in security and law enforcement?

With the widespread adoption of Artificial Intelligence, open pose and object detection technologies play a significant role in interpreting the actions and intents of subjects in live video feeds. A number of buildings and even some cities have installed CCTV cameras for real-time surveillance to assist security and law enforcement agencies. It is neither practical nor economically feasible to manually monitor all the video feeds being generated. Human pose detection together with object detection technologies can help capture the actions of a person trying to shoplift, or detect an elderly person falling on a pavement, and alert the relevant agencies to take immediate action.
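
As a simple illustration of how a fall might be flagged from pose data, the sketch below combines two cues: how far the torso is tilted away from vertical, and whether the person's bounding box is wider than it is tall. It assumes image coordinates where y grows downward and that the shoulder and hip midpoints are already available from a pose estimator. The function names and the 60-degree threshold are assumptions for this example; a real system would need tuning and temporal checks to avoid false alarms.

```python
import math


def torso_tilt_degrees(shoulder_mid, hip_mid):
    """Angle between the hip->shoulder line and the upward vertical (0 = upright)."""
    dx = shoulder_mid[0] - hip_mid[0]
    dy = shoulder_mid[1] - hip_mid[1]
    if dx == 0 and dy == 0:
        raise ValueError("shoulder and hip coincide; tilt is undefined")
    # Image y grows downward, so an upright torso has a negative dy.
    return math.degrees(math.atan2(abs(dx), -dy))


def looks_fallen(shoulder_mid, hip_mid, box, tilt_threshold=60.0):
    """Heuristic: torso far from vertical and bounding box wider than tall."""
    x_min, y_min, x_max, y_max = box
    wider_than_tall = (x_max - x_min) > (y_max - y_min)
    return torso_tilt_degrees(shoulder_mid, hip_mid) > tilt_threshold and wider_than_tall


# Hypothetical frames: one person standing, one lying on the ground.
standing = looks_fallen((100, 100), (100, 200), (70, 50, 130, 350))
lying = looks_fallen((100, 300), (200, 310), (50, 280, 350, 340))
print(standing, lying)
```

A check like this would run on every frame of a feed, with an alert raised only when the fallen state persists for several seconds.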

The growing need for protection and monitoring has led to rapid developments in the surveillance and security market. New and innovative technology is being launched to cater to growing needs at all levels. The adoption of machine learning in the surveillance and security market is one of the more interesting applications of Artificial Intelligence and has proved to be transformative.

An interesting part of using Artificial Intelligence is its potential to flag threats before they happen. Machine learning algorithms can identify even slight changes in behavior, which can help avert potential danger.

Artificial Intelligence uses machine learning processes and algorithms to monitor and analyze visuals such as images and videos, enabling the whole security and surveillance system to detect the movement of vehicles, humans and other objects.

Many Artificial Intelligence-based applications under development use machine learning for activity detection, motion capture and augmented reality, robot training and motion tracking for consoles. These applications learn to estimate and identify the activities, poses and actions of a person using open pose and object detection techniques. Such applications can go a long way toward bolstering the surveillance and security markets. They would not only help in maintaining and enforcing the law but would also raise the standard of security for our loved ones at home, at school or wherever they go.


Advances in Artificial Intelligence techniques have made open pose and object detection an accessible technology. Real-world applications of these span multiple industries. In the areas of Fashion retailing and Surveillance, they help to better understand large volumes of complex visual data and enable solutions that bring real economic value to the industry.

To understand more about Pose and object detection and how it is done, join us for a free webinar on January 8, 2020.

