Quantum News

# Improved Reinforcement Learning with Quantum Computation for Robot Arm Control

Reinforcement learning has emerged as one of the most interesting methods for controlling robot arm behavior in unstructured environments. However, the challenge of learning diverse control functions has not yet been solved well enough for real-world application. This is primarily due to two issues inherent in the learning paradigm: the exploration strategy and slow learning speed. Applying quantum computation to reinforcement learning is expected to help solve these problems.

Table of Contents

Robot Autonomy through Reinforcement Learning

Reinforcement Learning Challenges for Application to Robot Control

Quantum Computation for Machine Learning

Improved Reinforcement Learning with Quantum Computation

Robot Arm Control by Quantum Reinforcement Learning

Conclusion

**Robot Autonomy through Reinforcement Learning**
Autonomous robots are designed to interact with their environment on their own, without human intervention. They are intelligent machines that make decisions based on information they perceive from their environment, and they perform complex movements and manipulation tasks within that environment. To enable robots to operate autonomously, academia and industry have spent the past few years searching for more software-based control solutions that work with low-cost sensors. The emphasis is on robust algorithms and software with low requirements for calibration and for the operating environment. Deep reinforcement learning (DRL), a combination of deep learning (DL) and reinforcement learning (RL), has emerged as a promising approach. It allows robots to autonomously acquire complex behaviors while observing their environment through low-level sensors, and it is expected to enable physical robots to learn complex behaviors in largely unstructured environments. Conventional control adjusts the robot's behavior according to a predetermined set of commands, as with programmable logic controllers that rely on inverse kinematics. A learning-based controller can instead acquire control strategies through learning and subsequently update them case by case. This makes learning-based robot control attractive: it allows robots to move into less structured environments, handle unknown objects, and learn state representations suitable for multiple tasks. For example, in warehouse automation for industries such as textiles, apparel, and food production, robots can replace human pickers in selecting items of various sizes and shapes.
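The conventional, predetermined approach mentioned above can be illustrated with the textbook analytical inverse-kinematics solution for a planar two-link arm. This is a minimal Python sketch for contrast, not code from the study. The two-link geometry and the link lengths `l1` and `l2` are hypothetical values chosen for illustration.

```python
import math

def inverse_kinematics_2link(x, y, l1, l2, flip_elbow=False):
    """Return joint angles (theta1, theta2) that put the tip of a planar 2-link arm at (x, y)."""
    # Law of cosines: the elbow angle follows from the squared distance to the target.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target is outside the reachable workspace")
    theta2 = math.acos(c2)
    if flip_elbow:
        theta2 = -theta2  # the mirror-image configuration reaches the same point
    # Shoulder angle: direction to the target minus the offset caused by the bent elbow.
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2

def forward_kinematics_2link(theta1, theta2, l1, l2):
    """Tip position of the same arm, used to check the inverse solution."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Hypothetical arm with two unit-length links.
theta1, theta2 = inverse_kinematics_2link(1.2, 0.8, l1=1.0, l2=1.0)
print(forward_kinematics_2link(theta1, theta2, l1=1.0, l2=1.0))  # close to (1.2, 0.8)
```

A reachable target generally has two mirror-image solutions, and `flip_elbow` selects the other one. The closed-form formulas depend entirely on a fixed, known arm model, which is the kind of predetermined behavior that learning-based control aims to move beyond.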

**Reinforcement Learning Challenges for Application to Robot Control**
Compared with conventional reinforcement learning, deep reinforcement learning can address the critical problems of data dimensionality and scalability in tasks with sparse reward signals, such as robot manipulation and control. However, despite recent improvements, deep reinforcement learning still cannot teach robots robust manipulation skills well enough for real-world use. The main causes are well-known problems of deep reinforcement learning: sample efficiency, generalization, and the computational resources needed to train learning algorithms on complex problems. By "sample efficiency," we mean the amount of data needed to construct the optimal policy and accomplish the designed task. Several characteristics of robotics make high sample efficiency hard to achieve. For example, (1) agents do not receive a fixed training set from the environment; the information they receive is determined by both their own behavior and the dynamics of the environment. (2) Agents aim to maximize long-term rewards but can only observe immediate rewards. (3) There is no clear boundary between training and testing. This is due to the so-called "exploration-exploitation trade-off": time an agent spends exploring to improve its policy often comes at the expense of exploiting that policy. Generalization, on the other hand, refers to the ability to leverage knowledge acquired in a source environment to perform well in a target environment, which is what makes flexible, long-term autonomy possible. It is widely considered a necessary step toward artificial intelligence that behaves like a human. 
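The exploration-exploitation trade-off can be seen in the classic ε-greedy strategy on a multi-armed bandit. With probability `epsilon` the agent tries a random action (exploration); otherwise it picks the action with the best current estimate (exploitation). This is a minimal textbook sketch, not the method used in the study: SAC, discussed later, handles exploration through entropy regularization instead. The arm means, noise level, and `epsilon` are hypothetical.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, noise=0.1, seed=0):
    """Run an epsilon-greedy agent on a bandit with Gaussian rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    q = [0.0] * n_arms      # running estimate of each arm's mean reward
    counts = [0] * n_arms   # how often each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                  # explore: random arm
        else:
            arm = max(range(n_arms), key=q.__getitem__)  # exploit: best current estimate
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]        # incremental sample mean
    return q, counts

# Hypothetical arms; the last one is best.
q, counts = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print(counts)  # most pulls should go to the last arm
```

Every exploratory pull spent on a worse arm is reward given up. That cost is the price of discovering the better arm, which is the trade-off the paragraph describes.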
It should also be noted that deep reinforcement learning is computationally intensive, given the large amount of data required to reach optimal results, and requires high-performance computers to train models and run the learning process.
Further progress is needed to overcome these limitations, because both interacting with the environment to gather experience for reinforcement learning and collecting expert demonstrations are expensive. The computational power and generalization capabilities of quantum computation (QC) are expected to far exceed those of classical computers, and these can be leveraged to accelerate and improve the learning process.

**Quantum Computation for Machine Learning**
Quantum computation applies the laws of quantum mechanics to solve problems that are too complex for classical computers. Fault-tolerant quantum devices have not yet been realized. However, noisy intermediate-scale quantum (NISQ) devices, which have a limited number of qubits, limited coherence times, and limited gate fidelity, are already available and are being applied to a wide variety of problems. One promising use is the hybrid training of variational (or parameterized) quantum circuits (VQCs). In this approach, a parameterized quantum circuit acts as a function approximator, much like a classical neural network, and its parameters are optimized with classical optimization techniques. The predominant approach in academia is to formulate the task of interest as a variational optimization problem and find an approximate solution using a hybrid quantum-classical hardware setup.
Figure 1: Schematic of the variational quantum algorithm
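The hybrid loop of a variational quantum algorithm can be sketched with a classically simulated one-qubit circuit. The "quantum" step prepares RX(b)·RY(a)|0⟩ and returns ⟨Z⟩. Gradients come from the parameter-shift rule, and a classical gradient-descent step updates the angles. The circuit, learning rate, and step count are illustrative assumptions. On real NISQ hardware, ⟨Z⟩ would be estimated from repeated measurement shots rather than computed exactly.

```python
import math

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix into a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def expectation_z(params):
    """Quantum step (simulated): prepare RX(b) RY(a)|0> and return <Z>, which equals cos(a)cos(b)."""
    state = [1 + 0j, 0j]
    state = apply_gate(ry(params[0]), state)
    state = apply_gate(rx(params[1]), state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2

def parameter_shift_grad(f, params):
    """Exact gradient for Pauli-rotation gates: [f(p + pi/2) - f(p - pi/2)] / 2 per parameter."""
    grad = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grad.append((f(plus) - f(minus)) / 2)
    return grad

def train(params, lr=0.4, steps=200):
    """Classical step: plain gradient descent that minimizes the circuit output."""
    for _ in range(steps):
        grad = parameter_shift_grad(expectation_z, params)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

theta = train([0.1, 0.2])  # illustrative starting angles
print(round(expectation_z(theta), 6))  # -1.0, the minimum of cos(a)cos(b)
```

Only `expectation_z` would run on quantum hardware. Gradient bookkeeping and the parameter update stay classical, which is the division of labor Figure 1 describes.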
Running some subroutines on classical hardware can significantly reduce quantum resource requirements, in particular the number of qubits, circuit depth, and coherence time. In hybrid algorithms, the NISQ hardware can therefore focus exclusively on the classically intractable part of the problem. Quantum machine learning (QML) typically involves training variational quantum circuits to analyze classical data. QML models may offer advantages over classical models in memory consumption and sample complexity when analyzing classical data. In addition, a comprehensive study was recently published on the generalization performance of QML models trained on a limited number of data points. It showed that adequate generalization is guaranteed with only a small amount of training data. All of this appears promising for overcoming the problems of deep reinforcement learning for robot control mentioned earlier.
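One common way to feed classical data into a VQC is angle encoding, where each feature becomes a rotation angle on its own qubit. This sketch angle-encodes the features with RY gates, applies one trainable RY per qubit and a CNOT ladder, and reads out ⟨Z⟩ per qubit, all on a plain state-vector simulator. The encoding, layer layout, and entangling pattern are common textbook choices assumed here, not the circuit used in the study. Note that the simulated state vector has 2^n entries, so classical simulation cost grows exponentially with the qubit count.

```python
import math

def apply_ry(state, qubit, theta):
    """Apply RY(theta) to one qubit of an n-qubit state vector (qubit k is bit k of the index)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    for i in range(len(state)):
        if not (i >> qubit) & 1:
            j = i | (1 << qubit)
            a0, a1 = state[i], state[j]
            state[i], state[j] = c * a0 - s * a1, s * a0 + c * a1

def apply_cnot(state, control, target):
    """Flip the target bit on every basis state whose control bit is 1."""
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            state[i], state[j] = state[j], state[i]

def z_expectations(state, n_qubits):
    """<Z_k> per qubit: probability weight +1 where bit k is 0 and -1 where it is 1."""
    return [sum(abs(a) ** 2 * (1 - 2 * ((i >> k) & 1)) for i, a in enumerate(state))
            for k in range(n_qubits)]

def encode_and_run(features, weights):
    """Angle-encode classical features, apply trainable rotations, entangle, and measure."""
    n = len(features)
    state = [0.0] * (2 ** n)
    state[0] = 1.0                    # start in |00...0>
    for k, x in enumerate(features):  # data encoding: one feature per qubit
        apply_ry(state, k, x)
    for k, w in enumerate(weights):   # trainable layer (illustrative)
        apply_ry(state, k, w)
    for k in range(n - 1):            # entangling CNOT ladder
        apply_cnot(state, k, k + 1)
    return z_expectations(state, n)

# Feature pi flips qubit 0 to |1>, and the CNOT then flips qubit 1 as well.
print([round(z, 6) for z in encode_and_run([math.pi, 0.0], [0.0, 0.0])])  # [-1.0, -1.0]
```
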

**Improved Reinforcement Learning with Quantum Computation**
Several research papers discuss the quantum advantages that can be gained by using quantum computers for reinforcement learning tasks. One such advantage is faster decision-making by quantum agents learning within a classical environment. Fully quantum methods such as Grover's search algorithm belong to the more distant future, because they require large-scale circuits and fault-tolerant quantum computers that have yet to be developed. The best methods available today use hybrid quantum/classical algorithms whose quantum part is implemented with smaller circuits and variational quantum circuit techniques. In deep reinforcement learning, deep neural networks are employed as powerful function approximators. Approximation is usually applied to the policy (the actor), the value function (the critic), or both; approximating both gives the so-called "actor-critic" approach. Recently, variational quantum circuits have been proposed as function approximators in the reinforcement learning setting, and their role has been analyzed. So far, no quantum advantage has been proven for this approach. However, several research papers and preprints have reported promising experimental results. First, VQC-based models can perform at least as well as neural-network-based function approximators. Second, VQCs can significantly reduce the number of required parameters and the convergence time while improving training stability and the expressive power of reinforcement learning models (for a comprehensive review, see, for example, https://arxiv.org/pdf/2211.03464.pdf).
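The actor-critic idea can be shown with small classical function approximators. The actor holds softmax action preferences, and the critic holds a value estimate. The critic's temporal-difference (TD) error both updates the value and scales the policy-gradient step. In the quantum variant discussed here, the actor and/or critic outputs would instead come from VQCs. This is a minimal sketch on a toy task, with one state and two actions with fixed, hypothetical rewards, where each episode ends after one action. All hyperparameters are illustrative.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic(rewards, steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    """One-step actor-critic on a single-state task whose episodes end after one action."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)  # actor: softmax policy preferences
    value = 0.0                   # critic: estimated value of the state
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        td_error = rewards[action] - value  # terminal step, so no bootstrapped next value
        value += critic_lr * td_error
        # Policy-gradient step; the gradient of log softmax w.r.t. prefs[k] is 1[k == a] - pi(k).
        for k in range(len(prefs)):
            indicator = 1.0 if k == action else 0.0
            prefs[k] += actor_lr * td_error * (indicator - probs[k])
    return softmax(prefs), value

policy, value = actor_critic([0.0, 1.0])  # hypothetical rewards: action 1 is better
print(policy, value)
```

The critic acts as a learned baseline. A better-than-expected action produces a positive TD error and is reinforced, while a worse-than-expected one is discouraged.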

**Robot Arm Control by Quantum Reinforcement Learning**
Building on recent applications of variational quantum circuits to reinforcement learning problems, we investigated the applicability of hybrid quantum/classical algorithms to the control of a robotic arm. Using digital simulations of quantum circuits, we also evaluated the advantages of applying variational quantum circuits to soft actor-critic (SAC), one of the state-of-the-art reinforcement learning methods for continuous control. Robotic arm manipulation requires continuous control, because accurate motion control relies on continuous-valued sensor observations and actions. A robotic arm can be thought of as a series of links, moved by joints containing motors that change the position and orientation of the links. In our experiment, we used a virtual two-dimensional robotic arm with four joints, the first of which is fixed in place. The arm moves its links via the joints in a two-dimensional plane, and each link can rotate independently, clockwise or counterclockwise, up to a specified speed. The last joint is called the end-effector. The environment was created by adapting part of the OpenAI Gym environment Acrobot and using the Box2D physics engine.
Figure 2: Schematic of mechanical components for a four-joint robot arm.
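The description of the arm as links driven by joints corresponds to standard planar forward kinematics. Each joint angle is measured relative to the previous link, and the tip position is the sum of the link vectors. The sketch below also clips commanded joint velocities to a maximum speed in either direction, mirroring the speed limit described above. It is not the Box2D/Acrobot-based environment from the study. The number of links, link lengths, time step, and the simple Euler integration are hypothetical assumptions.

```python
import math

def step_joints(angles, velocity_cmds, max_speed, dt):
    """Advance each joint by its commanded angular velocity, clipped to +/- max_speed."""
    new_angles = []
    for angle, cmd in zip(angles, velocity_cmds):
        omega = max(-max_speed, min(max_speed, cmd))  # clockwise (<0) or counterclockwise (>0)
        new_angles.append(angle + omega * dt)          # simple Euler step (assumption)
    return new_angles

def forward_kinematics(angles, link_lengths):
    """Positions of the base, each joint, and the arm tip; the base is fixed at the origin."""
    x = y = heading = 0.0
    points = [(x, y)]
    for angle, length in zip(angles, link_lengths):
        heading += angle  # each joint angle is relative to the previous link
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

# Hypothetical three-link arm with unit-length links.
angles = step_joints([0.0, 0.0, 0.0], [10.0, -0.5, 0.0], max_speed=1.0, dt=0.1)
print(angles)  # the first command is clipped to 1.0 rad/s before integration
print(forward_kinematics(angles, [1.0, 1.0, 1.0])[-1])
```
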
The quantum soft actor-critic algorithm is very similar to its classical counterpart; the only difference is that some neural network layers are replaced by variational quantum circuits.
Figure 3: Architecture of the hybrid quantum/classical actor component of the quantum soft actor-critic.
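Since Figure 3 is not reproduced here, the sketch below only illustrates the general pattern the paragraph describes: a hidden layer of an SAC-style actor is replaced by a (simulated) VQC. A classical linear encoder maps the observation to one rotation angle per qubit. A minimal entanglement-free VQC returns ⟨Z⟩ per qubit, and classical heads produce a mean and log standard deviation for each action. The standard SAC tanh-squashed Gaussian then samples the action and its log-probability. Layer sizes, the VQC form, the log-std clipping range, and the random parameters are hypothetical. Training, which TensorFlow Quantum would handle through TensorFlow in the study's setup, is omitted.

```python
import math
import random

def vqc_expectations(angles, weights):
    """Toy VQC: on each qubit, RY(angle) then trainable RY(weight), so <Z> = cos(angle + weight).

    There are no entangling gates, so each qubit's expectation has this closed form.
    """
    return [math.cos(a + w) for a, w in zip(angles, weights)]

def linear(matrix, vec):
    """Dense layer without bias: one output per row of `matrix`."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def hybrid_actor(obs, params, rng, log_std_min=-5.0, log_std_max=2.0):
    """Classical encoder -> VQC -> classical heads -> tanh-squashed Gaussian action."""
    angles = linear(params["enc"], obs)          # classical pre-processing
    z = vqc_expectations(angles, params["vqc"])  # quantum layer (simulated)
    mean = linear(params["mu"], z)               # classical read-out heads
    log_std = [min(log_std_max, max(log_std_min, s)) for s in linear(params["log_std"], z)]
    actions, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        std = math.exp(ls)
        u = m + std * rng.gauss(0.0, 1.0)  # reparameterized sample: mean + std * noise
        a = math.tanh(u)                   # squash into the action range (-1, 1)
        # Gaussian log-density of u, minus the tanh change-of-variables correction.
        log_prob += -0.5 * ((u - m) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
        log_prob -= math.log(1.0 - a * a + 1e-6)
        actions.append(a)
    return actions, log_prob

rng = random.Random(0)
n_obs, n_qubits, n_actions = 4, 3, 2  # hypothetical sizes
params = {
    "enc": [[rng.uniform(-1, 1) for _ in range(n_obs)] for _ in range(n_qubits)],
    "vqc": [rng.uniform(-math.pi, math.pi) for _ in range(n_qubits)],
    "mu": [[rng.uniform(-1, 1) for _ in range(n_qubits)] for _ in range(n_actions)],
    "log_std": [[rng.uniform(-1, 1) for _ in range(n_qubits)] for _ in range(n_actions)],
}
actions, log_prob = hybrid_actor([0.1, -0.2, 0.3, 0.0], params, rng)
print(actions, log_prob)
```

The classical layers before and after the circuit adapt the observation and action dimensions to the number of qubits. This is one reason hybrid designs can keep the quantum part small.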
Google's TensorFlow Quantum library was used as the development framework for quantum machine learning. It simulates quantum circuits using Google's Cirq library and distributes the computational load through TensorFlow's multithreading capabilities. All classical components of the hybrid quantum-classical algorithm were implemented with the TensorFlow machine learning library. The results show a clear quantum advantage in the number of trainable parameters. Notably, the classical soft actor-critic with the same number of parameters as the equivalent quantum algorithm did not converge. To solve this environment with performance comparable to the quantum agent, the classical algorithm required 100 times as many parameters as the quantum algorithm.
Figure 4: Learning curves for the classical and hybrid quantum/classical soft actor-critic architectures tested in the robotic arm environment.
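The comparison above is stated in terms of trainable parameters. This sketch shows how such counts are typically tallied. A fully connected network has a weight matrix and a bias vector per layer, while a layered VQC has a fixed number of rotation angles per qubit per layer. The layer sizes, qubit count, depth, and three rotations per qubit are hypothetical. They are not the configurations from the study, and the printed numbers are not its results.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network given as [inputs, hidden..., outputs]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def vqc_param_count(n_qubits, n_layers, rotations_per_qubit=3):
    """Trainable rotation angles of a layered VQC (entangling gates add no parameters)."""
    return n_qubits * n_layers * rotations_per_qubit

print(mlp_param_count([4, 64, 64, 2]))  # 4610
print(vqc_param_count(4, 3))            # 36
```

In a hybrid model, any classical layers around the circuit add their own parameters, so a fair comparison counts both parts.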

**Conclusion**
Through numerical simulations, we have shown that the quantum actor-critic approach outperforms classical models with similar architectures on the benchmark robot control task considered. This method has potential applications in a variety of real-world scenarios. This research shows that quantum reinforcement learning can be leveraged for robot control and can contribute to future advances in autonomous robotics. The results of the study have been submitted to arXiv and can be found at the following link: https://arxiv.org/pdf/2212.11681.pdf.

(Link: https://www.nttdata.com/jp/ja/data-insight/2023/031601/)