Featured SRI PhD Project

Deep Reinforcement Learning-based Industrial Robotic Manipulations

The central research question of this project is how to develop intelligent, self-learning algorithms for industrial robotic manipulations such as pick-and-place that deliver effective, improved results. The aim of this study is to build an efficient self-learning framework for industrial robotic manipulation that maximizes both the quality and the quantity of production across a range of industries.

Main Objectives

Based on an extensive literature review, the main objectives of this project are as follows:


Development and dissemination of results of a reinforcement learning-based algorithm to learn tasks such as pick-and-place in a non-visual environment.


Development and dissemination of results of a deep reinforcement learning-based algorithm to perform and learn manipulation tasks in a vision-based environment.


Development and dissemination of a deep reinforcement learning-based algorithm to learn multiple different manipulations jointly in a vision-based environment.

The training and testing beds are designed in the V-REP simulator developed by Coppelia Robotics. V-REP provides physics engines such as Bullet and ODE for realistic real-time simulation. For motion planning, the Open Motion Planning Library (OMPL) is used to obtain real-time inverse-kinematics solutions, enabling dynamic motion planning of the robotic arms. In the non-visual approach, agents based on the off-policy and on-policy temporal-difference algorithms Q-learning and SARSA were designed to teach a Jaco robotic arm (6 degrees of freedom) to pick and place objects of different shapes at three belt alignments (left, center, right) and three belt speeds (slow, medium, fast), with the help of proximity sensors.
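The two temporal-difference updates named above differ only in how they bootstrap: Q-learning (off-policy) backs up the greedy value of the next state, while SARSA (on-policy) backs up the value of the action actually taken. The following is a minimal sketch under assumed discretizations; the state encoding (alignment, belt speed), the six candidate grasp actions, and the hyperparameter values are illustrative, not the project's actual settings.

```python
import random
from collections import defaultdict

# Assumed discrete state/action spaces for illustration:
# a state is (object_alignment, belt_speed); actions are candidate grasp poses.
ALIGNMENTS = ["left", "center", "right"]
SPEEDS = ["slow", "medium", "fast"]
ACTIONS = list(range(6))  # e.g. six discretized grasp poses (hypothetical)

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def epsilon_greedy(state):
    """Behavior policy used by both agents: explore with prob. EPSILON."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy TD update: bootstrap from the action actually taken."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])
```

In a training loop, each episode would step the simulated belt, observe the proximity sensors to form the state, and apply one of the two updates after every grasp attempt's reward.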

For the vision-based approach, deep Q-networks (DQNs) were used to teach a UR5 robotic arm (6 degrees of freedom) the pick-and-place task for regular and irregular 3D objects, with the help of orthographic and perspective vision sensors. A pixelwise parameterization technique was used to generate action-value maps: RGB-D heightmaps, constructed from the RGB-D data streamed by the vision sensors, were fed to the DQN as states. Prehensile and non-prehensile manipulations were learned together by extending the testbed and increasing the number of networks in the DQN.
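With a pixelwise parameterization, the network outputs one Q-value per heightmap pixel per manipulation primitive, and action selection reduces to an argmax over the stacked maps. The sketch below shows only that selection step, under assumed shapes; the map dimensions and the two-primitive (e.g. push/grasp) layout are illustrative rather than the project's actual configuration.

```python
import numpy as np

def select_action(q_maps):
    """Pick the best primitive and pixel from pixelwise action-value maps.

    q_maps: array of shape (num_primitives, H, W), where q_maps[p, y, x] is
    the predicted return of executing primitive p (e.g. grasp) at the
    workspace location that pixel (y, x) of the heightmap corresponds to.
    Returns (primitive_index, row, col).
    """
    idx = np.unravel_index(np.argmax(q_maps), q_maps.shape)
    return tuple(int(i) for i in idx)

# Toy example: two primitives over a 4x4 heightmap-aligned Q map.
q = np.zeros((2, 4, 4))
q[1, 2, 3] = 5.0  # highest value: primitive 1 at pixel (2, 3)
```

The chosen pixel is then mapped back through the heightmap's orthographic projection to a 3D workspace position, which the motion planner converts into an arm trajectory.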