To improve the efficiency and accuracy of robot programming and to enable robots to adapt quickly to different tasks and unstructured environments, we propose a PbD robot system based on laser motion capture and AR. As shown in Figs. 1 and 2, the system consists of four parts:

(1) An HTC VIVE laser motion capture system (two base stations); a wireless Handheld Teaching Device (HTD) equipped with a teaching pen, touch buttons, excitation buttons, and mode buttons; and an HTC VIVE Head Mounted Display (HMD), which serves as an intermediary for HTC VIVE device communication, as shown in Fig. 9.

(2) An AR headset, the Microsoft HoloLens.

(3) A virtual robot and a virtual HTD.

(4) An industrial robot system.

Fig. 1. Schematic diagram of PbD for industrial robots using an HTD.

Fig. 2. The algorithm flow of PbD for industrial robots using the HTD.
To accurately track the HTD, two base stations are mounted on either side of the table to scan the workspace, as shown in Fig. 2. The two base stations can scan a 5 × 5 × 5 m³ 3D workspace. The workpiece is placed on the table before the system is used for programming. As the operator manipulates the HTD in the robot’s workspace, the virtual robot adjusts its end-effector in real time to align with the HTD in space. This synchronization occurs because the virtual robot’s movements are driven by the position and orientation data of the HTD captured by the motion capture system. Consequently, operators can use the HTD as a proxy for the robot’s end-effector, facilitating quick and intuitive teaching of the end-effector’s position and orientation relative to the workpiece. In addition, the motion of the virtual robot is displayed on the HoloLens glasses, providing the operator with a visual reference. Once the programming is complete, the physical robot replicates the operator’s demonstrated movements to perform the task.
In this paper, we demonstrate the system through case studies of welding programming and writing experiments. The following section details the principles of the motion capture and AR based system.
Principles
To accurately and efficiently translate demonstrators’ movements into executable programs for robots, HTC VIVE devices are adopted to develop the PbD robot system. The system is fundamentally composed of two environment modules and three functional units. As shown in Fig. 2, the two environment modules are the real physical environment and the Unity virtual environment. The three functional units are a path planning unit, a registration unit, and a virtual robot drive unit. The overall framework and the interrelationships between the modules are depicted in Fig. 2.
Physical environment
In the physical environment, the base stations (lighthouses) emit infrared lasers to scan the robot’s workspace. The HTD, equipped with multiple laser-receiving sensors, is tracked by the base stations in real time, so its position and orientation in three-dimensional space are available continuously. The base station and the HTD communicate wirelessly, allowing the position and orientation data to be transmitted to a computer. The communication flow between these subsystems is shown in Fig. 9.
Unity virtual environment
The virtual environment is developed using the Unity 3D game engine. The HTC VIVE tracking system integrates with Unity via the SteamVR interface plugin. In the virtual environment, the position and orientation of the virtual HTD are continuously updated in real time using the tracking data from the motion capture system in the physical environment. The position and orientation of the virtual HTD are converted into signals that drive the virtual robot via the robot drive unit. The virtual environment is mapped onto the real environment through the HoloLens to assist the operator in programming robots.
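The update cycle described above can be pictured as a simple mirroring loop. The sketch below shows this structure in Python under stated assumptions: `get_tracked_pose` and `set_virtual_pose` are hypothetical callbacks standing in for the tracking bridge and the scene update, not actual SteamVR or Unity API calls.

```python
import time

def mirror_htd(get_tracked_pose, set_virtual_pose, rate_hz=90):
    """Mirror the physical HTD onto its virtual counterpart in real time.

    get_tracked_pose: hypothetical callback returning the latest HTD pose
                      from the motion capture bridge (e.g. the SteamVR
                      plugin side).
    set_virtual_pose: hypothetical callback applying that pose to the
                      virtual HTD in the scene.
    rate_hz:          update rate; 90 Hz matches typical VIVE tracking.
    """
    while True:
        set_virtual_pose(get_tracked_pose())  # one pose update
        time.sleep(1.0 / rate_hz)             # pace the loop
```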
Registration unit
The registration unit is responsible for establishing the interrelationships between the subsystems to ensure seamless operation of the entire system. A detailed description of the registration is provided in Sect. “Multi system registration”. Essentially, the registration unit maps the virtual environment to the physical environment in the spatial dimension. The proposed multi-system registration establishes the relationship between the motion capture system and the robot system, while Vuforia registers the virtual environment to the physical environment, creating a mixed reality display in the HoloLens glasses.
Virtual robot drive unit
As analyzed above, the operator manipulates the virtual HTD by manipulating the physical HTD. The endpoint of the HTD is mapped to the tool centre point (TCP) of the robot’s end effector. In this way, the position and orientation of the robot end effector are obtained in real time. This position and orientation information is converted into the robot’s joint angles using inverse kinematics. Through forward kinematics, these joint angles drive the virtual robot, ensuring that its end effector aligns with the HTD endpoint. The robot’s movement is displayed in real time on the HoloLens glasses.
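The following is a minimal sketch of one cycle of this drive unit, assuming the HTDCP pose arrives as a 4 × 4 homogeneous matrix and that `solve_ik` is a hypothetical placeholder for the robot-specific inverse-kinematics solver (the paper does not specify a particular IK routine).

```python
import numpy as np

def drive_virtual_robot(V_T_P, solve_ik, current_joints):
    """One cycle of the virtual robot drive unit.

    V_T_P:          4x4 homogeneous pose of the HTDCP reported by the
                    motion capture system (assumed input format).
    solve_ik:       placeholder for the robot-specific inverse-kinematics
                    solver; any analytic or numeric IK routine fits here.
    current_joints: seed configuration used to pick the nearest IK branch.
    """
    # The HTD endpoint is mapped one-to-one onto the robot's TCP target.
    tcp_target = np.asarray(V_T_P)
    # Inverse kinematics: Cartesian TCP pose -> joint angles.
    joints = solve_ik(tcp_target, seed=current_joints)
    # Applying these joint angles to the virtual robot model makes its
    # forward kinematics reproduce the commanded TCP pose.
    return joints
```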
Path planning unit
To facilitate efficient path planning, the HTD is designed to function as the end effector of the industrial robot. The detailed design method of the HTD is described in the following section. Operators can conveniently use the HTD to control the virtual robot and plan paths with the assistance of augmented reality visual feedback. The proposed path planning algorithm converts the raw data into robot-executable code, as sketched below.
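As a hedged illustration of this conversion step, the snippet below turns a list of recorded TCP poses into generic move instructions; the `MoveL`-style output syntax is purely illustrative and would be replaced by the instruction set of the target robot controller.

```python
def poses_to_program(tcp_poses, speed_mm_s=20):
    """Convert taught TCP poses into generic move instructions.

    tcp_poses: iterable of (x, y, z, rx, ry, rz) tuples in the robot base
               frame. The MoveL-style syntax below is illustrative only
               and would be replaced by the target controller's language.
    """
    lines = []
    for x, y, z, rx, ry, rz in tcp_poses:
        lines.append(
            f"MoveL x={x:.1f} y={y:.1f} z={z:.1f} "
            f"rx={rx:.3f} ry={ry:.3f} rz={rz:.3f} v={speed_mm_s}"
        )
    return "\n".join(lines)
```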
HTD design
The proposed HTD design method allows for HTDs with different shapes, as shown in Fig. 3. This section analyzes the method using Fig. 3a. Robots are widely used in advanced factories, but complex programming poses a significant challenge in industrial robotics. To address this, a tool mapping method is employed so that the designed HTD can function as a robot end-effector, enabling rapid path teaching on the workpiece. As shown in Fig. 3, the handheld teaching device centre point (HTDCP) maps to the tool centre point (TCP) of the robot end effector. To create the HTD, 3D printing is used to convert an HTC VIVE wireless game controller into a teaching device. The HTD consists of three main components: a wireless game controller (Fig. 3(1)), a 3D-printed connection part (Fig. 3(2)), and a teaching pen (Fig. 3(3)). The objective of using the HTD for path planning is to obtain the position and orientation of the HTDCP. To measure the HTDCP’s position and orientation precisely, two key issues need to be addressed. First, a stable and accurate motion capture system must be selected for tracking the HTD. Second, a calibration method specifically suited to the HTD must be designed.

Fig. 3. Robot end-effector and HTD.
HTD calibration
The HTC VIVE motion capture system offers a high degree of positioning accuracy, reaching the millimeter level and, under good conditions, sub-millimeter accuracy43. However, the reference coordinate system (\(O_H - XYZ\) in Fig. 4) of the HTD is located inside the device itself, which makes it impossible for the HTD to directly locate a point on the surface of the workpiece or to plan a path. To address this issue, we propose a four-point calibration method to calibrate the end position of the designed HTD. The HTC VIVE motion capture system, which uses two base stations to track the HTD in real time, is selected. In terms of tracking principle, this system is an active laser-scanning motion capture technology. Each laser base station scans the robot workspace by sweeping horizontal and vertical infrared lasers through a galvanometer. The HTD is equipped with 24 infrared-receiving sensors placed at different positions on its head. This configuration ensures that the HTD can be scanned and accurately positioned even when the operator holds it in various positions and postures. As shown in Fig. 4, the coordinate systems of the Unity virtual environment, the base station, the HTD, and the HTDCP are denoted \(O_V - XYZ\), \(O_B - XYZ\), \(O_H - XYZ\), and \(O_P - XYZ\), respectively. When the motion capture system is first initialized, the transformation \({}_B^V T\) of the base station relative to the Unity virtual environment coordinate system is calculated by the initialization plug-in. The real-time transformation \({}_H^B T\) of the HTD relative to the laser base station is determined by the base station itself. Ultimately, the transformation \({}_H^V T\) of the HTD relative to the Unity virtual environment coordinate system is provided by the motion capture system using Eq. (1):
$${}_H^V T = {}_B^V T \cdot {}_H^B T.$$
(1)
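In code, Eq. (1) is a single composition of homogeneous transforms. The numpy sketch below shows this; the identity matrices are placeholders for the values actually supplied by the initialization plug-in and the tracking stream.

```python
import numpy as np

# V_T_B: pose of the base station in the virtual environment, delivered
#        by the initialization plug-in (identity used as a placeholder).
# B_T_H: real-time pose of the HTD reported by the base station.
V_T_B = np.eye(4)
B_T_H = np.eye(4)

# Eq. (1): pose of the HTD in the virtual environment.
V_T_H = V_T_B @ B_T_H
```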

Fig. 4. Calibration principle of the HTD.
During the calibration process, the operator holds the HTD so that the HTDCP is located at a fixed point in space, as shown in Fig. 4. This operation is repeated four times with the HTD in different postures. Note that the postures of the HTD are arbitrary, but the HTDCP must coincide with the fixed point each time. The coordinate systems \(O_V - XYZ\), \(O_H - XYZ\), and \(O_P - XYZ\) are related as follows:
$${}_P^V T_i = {}_H^V T_i \cdot {}_P^H T_i, \quad i = 1, \ldots, 4.$$
(2)
Here, \({}_P^V T\) represents the transformation matrix of the HTDCP coordinate system with respect to the Unity virtual environment coordinate system, \({}_P^H T\) is the transformation matrix of the HTDCP coordinate system with respect to the HTD coordinate system, and \({}_H^V T\) is the transformation matrix of the HTD coordinate system with respect to the Unity virtual environment coordinate system.
For each HTD calibration posture, a \({}_H^V T_i\) is obtained from the motion capture system. Since the HTDCP is fixed to the HTD, \({}_P^H T\) remains a constant matrix; that is, \({}_P^H T_1 = {}_P^H T_2 = {}_P^H T_3 = {}_P^H T_4 = {}_P^H T\). To simplify the calibration process, the problem of solving for \({}_P^V T\) can be converted into solving for \({}_P^H T\) using Eq. (2). The transformation matrices in Eq. (2) can also be expressed in the following form:
$${}_H^V T_i = \begin{bmatrix} {}_H^V R_i & {}^V P_{Hi} \\ 0 & 1 \end{bmatrix},$$
(3)
$${}_P^H T_i = \begin{bmatrix} {}_P^H R_i & {}^H P_{Pi} \\ 0 & 1 \end{bmatrix},$$
(4)
$${}_P^V T_i = \begin{bmatrix} {}_P^V R_i & {}^V P_{Pi} \\ 0 & 1 \end{bmatrix},$$
(5)
where \(P\) is a 3 × 1 translation vector and \(R\) is a 3 × 3 rotation matrix. Substituting Eqs. (3)–(5) into Eq. (2) yields the following relationship:
$${}_H^V R_i \, {}^H P_{Pi} + {}^V P_{Hi} = {}^V P_{Pi}.$$
(6)
Since the HTDCP is located at a fixed point in space during the calibration process, \({}^V P_{P1} = {}^V P_{P2} = {}^V P_{P3} = {}^V P_{P4} = {}^V P_P\). For the first two postures:
$${}_H^V R_1 \, {}^H P_P + {}^V P_{H1} = {}^V P_P,$$
(7)
$${}_H^V R_2 \, {}^H P_P + {}^V P_{H2} = {}^V P_P.$$
(8)
Subtracting Eq. (8) from Eq. (7) gives:
$$\left( {}_H^V R_1 - {}_H^V R_2 \right) {}^H P_P = {}^V P_{H2} - {}^V P_{H1}.$$
(9)
Similarly:
$$\left( {}_H^V R_2 - {}_H^V R_3 \right) {}^H P_P = {}^V P_{H3} - {}^V P_{H2},$$
(10)
$$\left( {}_H^V R_3 - {}_H^V R_4 \right) {}^H P_P = {}^V P_{H4} - {}^V P_{H3}.$$
(11)
Combining Eqs. (9)–(11) produces the matrix equation (12), which can be solved for \({}^H P_P\):
$$\begin{bmatrix} {}_H^V R_1 - {}_H^V R_2 \\ {}_H^V R_2 - {}_H^V R_3 \\ {}_H^V R_3 - {}_H^V R_4 \end{bmatrix} {}^H P_P = \begin{bmatrix} {}^V P_{H2} - {}^V P_{H1} \\ {}^V P_{H3} - {}^V P_{H2} \\ {}^V P_{H4} - {}^V P_{H3} \end{bmatrix}.$$
(12)
Equation (12) is an overdetermined (incompatible) system of linear equations, so it is solved in the least-squares sense. Writing Eq. (12) as \(A \, {}^H P_P = b\), where \(A\) is the stacked 9 × 3 coefficient matrix and \(b\) is the stacked 9 × 1 right-hand side, the solution is
$${}^H P_P = \left( A^{\mathsf{T}} A \right)^{-1} A^{\mathsf{T}} b.$$
(13)
Because the HTDCP is rigidly fixed to the HTD, \(O_P - XYZ\) can be assigned the same orientation (an identity rotation matrix) as \(O_H - XYZ\):
$${}_P^H T = \begin{bmatrix} \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix} & {}^H P_P \\ 0 & 1 \end{bmatrix}.$$
(14)
The transformation matrix \({}_P^H T\) is the result of the calibration. By combining Eqs. (14) and (2), the transformation matrix \({}_P^V T\) is obtained, which contains the position and orientation of the coordinate system \(O_P - XYZ\) relative to the coordinate system \(O_V - XYZ\). In other words, when an operator moves the HTD within the base stations’ scanning volume, the motion capture system records the operator’s actions with the HTDCP as the reference point.
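Putting Eqs. (9)–(14) together, the calibration reduces to a small least-squares problem. The following numpy sketch (with hypothetical function names) stacks the four tracked poses and recovers the constant tip offset \({}^H P_P\):

```python
import numpy as np

def calibrate_htdcp(rotations, translations):
    """Four-point HTDCP calibration, following Eqs. (9)-(13).

    rotations:    four 3x3 rotation matrices  V_R_H_i  from the tracker.
    translations: four 3-vectors              V_P_H_i  from the tracker.
    Returns the constant tip offset H_P_P in HTD coordinates.
    """
    R = [np.asarray(r) for r in rotations]
    p = [np.asarray(t) for t in translations]
    # Stack the pairwise differences into the 9x3 system of Eq. (12).
    A = np.vstack([R[i] - R[i + 1] for i in range(3)])
    b = np.hstack([p[i + 1] - p[i] for i in range(3)])
    # The system is incompatible, so solve it in the least-squares
    # sense (Eq. (13)).
    tip, *_ = np.linalg.lstsq(A, b, rcond=None)
    return tip

def htdcp_transform(tip):
    """Eq. (14): identity rotation plus the calibrated tip offset."""
    T = np.eye(4)
    T[:3, 3] = tip
    return T
```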
Multi system registration
The hardware system consists of the HTC VIVE system, the robot system, and the HoloLens system. To ensure the entire platform functions seamlessly, the relationships between the subsystems must be established. First, the workbench coordinate system acts as a bridge to establish the relationship between the HTC VIVE system and the robot system, as shown in Fig. 5.

Fig. 5. Relationships between multiple subsystems.
Three fixed points are marked on the workbench, as shown in Fig. 6. The spatial positions of these three marker points (\({}^V p_A\), \({}^V p_B\), and \({}^V p_C\)) relative to the Unity virtual environment coordinate system are measured using the HTD. The unit vectors of the axes of the workbench coordinate system (\(O_W - XYZ\)) are calculated from these three points by Eqs. (15)–(21). Subsequently, the transformation matrix \({}_W^V T\) of the workbench coordinate system with respect to the Unity virtual environment coordinate system is calculated by Eq. (22).
$${}^V X'_W = {}^V p_B - {}^V p_A,$$
(15)
$${}^V Y''_W = {}^V p_C - {}^V p_A,$$
(16)
$${}^V Z'_W = {}^V X'_W \times {}^V Y''_W,$$
(17)
$${}^V Y'_W = {}^V Z'_W \times {}^V X'_W,$$
(18)
$${}^V X_W = \frac{{}^V X'_W}{\left\| {}^V X'_W \right\|},$$
(19)
$${}^V Y_W = \frac{{}^V Y'_W}{\left\| {}^V Y'_W \right\|},$$
(20)
$${}^V Z_W = \frac{{}^V Z'_W}{\left\| {}^V Z'_W \right\|},$$
(21)
$${}_W^V T = \begin{bmatrix} {}^V X_W^{\mathsf{T}} & {}^V Y_W^{\mathsf{T}} & {}^V Z_W^{\mathsf{T}} & {}^V p_A^{\mathsf{T}} \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
(22)
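The construction of Eqs. (15)–(22) can be written compactly as the sketch below; `frame_from_three_points` is a hypothetical helper name, and the same routine is reused for the robot-side measurements of Eq. (23).

```python
import numpy as np

def frame_from_three_points(pA, pB, pC):
    """Build the workbench frame from three marked points (Eqs. (15)-(22)).

    pA, pB, pC: 3-vectors of the marker positions expressed in the
    measuring system (virtual environment for the HTD, robot base for
    the TCP). Returns the 4x4 transform of the workbench frame.
    """
    x = np.asarray(pB) - np.asarray(pA)     # Eq. (15): provisional X axis
    y_raw = np.asarray(pC) - np.asarray(pA) # Eq. (16): in-plane reference
    z = np.cross(x, y_raw)                  # Eq. (17): Z normal to table
    y = np.cross(z, x)                      # Eq. (18): orthogonalized Y
    x, y, z = (v / np.linalg.norm(v) for v in (x, y, z))  # Eqs. (19)-(21)
    T = np.eye(4)                           # Eq. (22): assemble [x y z | pA]
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, z, pA
    return T
```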

Fig. 6. Principles of registration of virtual environments.
Similarly, the spatial positions of the three marker points (\({}^R p_A\), \({}^R p_B\), and \({}^R p_C\)) relative to the robot coordinate system (\(O_R - XYZ\)) are measured using the TCP. Following the same computational procedure as above, the transformation matrix \({}_W^R T\) of the workbench coordinate system with respect to the robot is obtained:
$${}_W^R T = \begin{bmatrix} {}^R X_W^{\mathsf{T}} & {}^R Y_W^{\mathsf{T}} & {}^R Z_W^{\mathsf{T}} & {}^R p_A^{\mathsf{T}} \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
(23)
$${}_V^R T = {}_W^R T \cdot {}_W^V T^{-1},$$
(24)
$${}_P^R T = {}_V^R T \cdot {}_P^V T.$$
(25)
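A brief sketch of the registration chain of Eqs. (22)–(25) follows, reusing the `frame_from_three_points` helper from the previous sketch; the marker-point values are placeholders standing in for actual measurements.

```python
import numpy as np

# Placeholder marker measurements: in the virtual environment (via the
# HTDCP) and in the robot base frame (via the TCP).
V_pA, V_pB, V_pC = np.zeros(3), np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
R_pA, R_pB, R_pC = np.zeros(3), np.array([1.0, 0, 0]), np.array([0, 1.0, 0])

V_T_W = frame_from_three_points(V_pA, V_pB, V_pC)   # Eq. (22)
R_T_W = frame_from_three_points(R_pA, R_pB, R_pC)   # Eq. (23)

# Eq. (24): pose of the virtual environment in the robot base frame.
R_T_V = R_T_W @ np.linalg.inv(V_T_W)

# Eq. (25): any taught HTDCP pose V_T_P maps into robot coordinates.
V_T_P = np.eye(4)                                   # placeholder pose
R_T_P = R_T_V @ V_T_P
```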
The designed HTD is used for path planning on the surface of the workpiece, with the HTDCP as the reference, as shown in Fig. 7b. The registration result, represented by Eq. (25), allows the physical robot to reproduce the path planned with the HTD, as shown in Fig. 7d. In other words, the position and attitude of the HTDCP can be transformed from the motion capture system to the robot coordinate system using Eqs. (25) and (28). The operator can then use the HTD to drive the virtual robot for programming according to Eq. (2), following the principle shown in Fig. 2. One remaining challenge is that the operator cannot directly observe the virtual robot. To address this, augmented reality via the HoloLens is integrated into the system.

Fig. 7. Principles of path planning using the HTD.
Vuforia is a professional augmented reality application40. It registers virtual environments to the real physical environment using images or point clouds. The Vuforia sticker is attached directly to the robot table, as shown in Fig. 10. When the HoloLens camera captures the Vuforia sticker, it computes the transformation matrix of the sticker relative to the virtual environment, and the virtual robot is thereby registered accurately to the actual robot position.