Abstract: |
With the advancement of VR technology, a number of systems have been developed to train cognitive and decision-making abilities in football from a first-person perspective. However, these systems do not incorporate the instructions that players call out to one another, even though such instruction voices are important in football: they help teammates recognize the situation and make decisions. They have not been incorporated because no system exists to determine which player calls out an instruction voice in which situation. In this study, to make football situational judgment training systems more practical, we construct a machine learning system that predicts the moment when an instruction voice is called out and the player who calls it out, based on match information such as the ball's position and the players' positions and orientations.
We targeted the instruction voice "Look ahead" and used match information from 10 games of the RoboCup Soccer Simulation League. Because this match information contains no instruction voices, we developed a survey system in which football experts identified the situations where the instruction voice would be called out. Each situation is characterized by the frame number, the player who calls out the instruction voice, and the player who receives it. Similar situations within ±5 frames of an identified frame were also treated as situations in which the instruction voice was called out, provided that the direction of the ball's movement did not change, the total movement distance of all players was within a certain range, and the orientation of the receiving player remained constant.
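The ±5-frame inclusion rule can be illustrated with a minimal Python sketch. The thresholds, the frame layout, and the names `expand_labels` and `angle_diff_deg` are assumptions made for illustration only; the abstract does not state the actual values, and "within a certain range" is read here as an upper bound on the summed displacement relative to the expert-labeled frame.

```python
import math

# Hypothetical thresholds; the abstract does not give the actual values.
MAX_BALL_TURN_DEG = 10.0      # ball direction counts as unchanged below this
MAX_TOTAL_MOVE = 5.0          # upper bound on summed player displacement
MAX_RECEIVER_TURN_DEG = 10.0  # receiver orientation counts as constant below this
WINDOW = 5                    # +/- 5 frames, as stated in the text


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def expand_labels(frames, labeled_idx, receiver):
    """Return frame indices near labeled_idx that meet all three criteria.

    frames: list of dicts (assumed layout) with keys
      "ball_dir" -> ball movement direction in degrees
      "players"  -> {player_id: (x, y, body_angle_deg)}
    """
    ref = frames[labeled_idx]
    lo = max(0, labeled_idx - WINDOW)
    hi = min(len(frames) - 1, labeled_idx + WINDOW)
    keep = []
    for i in range(lo, hi + 1):
        cur = frames[i]
        # Criterion 1: ball movement direction did not change.
        ball_ok = angle_diff_deg(cur["ball_dir"], ref["ball_dir"]) <= MAX_BALL_TURN_DEG
        # Criterion 2: total movement of all players stays within the bound.
        total_move = sum(
            math.dist(cur["players"][p][:2], ref["players"][p][:2])
            for p in ref["players"]
        )
        move_ok = total_move <= MAX_TOTAL_MOVE
        # Criterion 3: the receiving player's orientation stayed constant.
        recv_ok = angle_diff_deg(
            cur["players"][receiver][2], ref["players"][receiver][2]
        ) <= MAX_RECEIVER_TURN_DEG
        if ball_ok and move_ok and recv_ok:
            keep.append(i)
    return keep
```

Each returned index would be treated as an additional positive example for the same caller and receiver as the expert-labeled frame.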
Predicting both the moment when the instruction voice is called out and the player who calls it out with a single machine learning model proved difficult. Therefore, the proposed system splits the prediction between two models: one determines the moment when the instruction voice is called out, and the other identifies the player who calls it out. To train these models, we tested the following machine learning algorithms: SVM, Logistic Regression, Random Forest, XGBoost, LightGBM, and AdaBoost.
For the model that determines the moment, the input features were the ball position, the players' positions and orientations, and a weight for each player based on the distance between the ball and that player. The same features from the previous frame were also used. All positions and orientations were converted to a ball-centered coordinate system. LightGBM performed best, with a recall of 0.55 and a precision of 0.71.
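As a rough illustration of this feature construction, the sketch below translates player positions so that the ball is at the origin, attaches a distance-based weight, and appends the previous frame's features. It assumes that the conversion is a pure translation (which leaves orientation angles unchanged) and that the weight is `1 / (1 + distance)`; neither detail is given in the abstract, and all function names are hypothetical.

```python
import math


def to_ball_centered(ball_xy, players):
    """Translate player positions so the ball sits at the origin.

    players: {player_id: (x, y, body_angle_deg)} (assumed layout).
    A pure translation leaves the orientation angle unchanged.
    """
    bx, by = ball_xy
    return {pid: (x - bx, y - by, ang) for pid, (x, y, ang) in players.items()}


def distance_weights(centered):
    """Hypothetical weight 1 / (1 + distance to ball); not the paper's formula."""
    return {
        pid: 1.0 / (1.0 + math.hypot(x, y))
        for pid, (x, y, _) in centered.items()
    }


def frame_features(ball_xy, players, order):
    """Flat feature list [x, y, angle, weight] per player, in a fixed order."""
    centered = to_ball_centered(ball_xy, players)
    weights = distance_weights(centered)
    feats = []
    for pid in order:
        x, y, ang = centered[pid]
        feats.extend([x, y, ang, weights[pid]])
    return feats


def with_previous_frame(prev_feats, cur_feats):
    """Concatenate current-frame and previous-frame features."""
    return list(cur_feats) + list(prev_feats)
```

A fixed player order keeps each feature column tied to the same player across frames, which tree-based learners such as LightGBM rely on.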
For the model that identifies the player, the input features were the ball position and the players' positions and orientations. All positions and orientations were converted to a coordinate system centered on the 10 players of one team (excluding the goalkeeper). XGBoost performed best, with a recall of 0.62 and a precision of 0.55.
These results show that the proposed method can predict the instruction voice with a moderate degree of accuracy (recall and precision between 0.55 and 0.71). |