Abstract: |
The monitoring of sport activities is a topic of increasing interest in several disciplines, such as biology, medicine, statistics, engineering, and mathematics. This interest stems from the need to individualise the design of training activities, so as to maximise improvements and avoid over-training, which may impair health and typically leads to under-performance (Cardinale and Varley, 2017). Nowadays, GPS-enabled tracking devices and heart rate monitors are commonly used in several sporting disciplines, such as running, swimming, and cycling. In this context, data are collected as a sequence of N activities, where each activity is represented by a high-frequency multivariate time series of P variables, such as GPS position, altitude, speed, and heart rate. Over the past years, scientific interest has increasingly focused on the use of training data to monitor sport activities, and solutions to many relevant problems have been proposed. Cardinale and Varley (2017) focus on recent advances in wearable technologies for quantifying and monitoring training load. This is relevant in sport sciences because it allows training programmes to be optimised while avoiding the risks of overtraining and overreaching. They distinguish between data on internal and external load. Internal load refers to physiological aspects, such as the heart rate response to the stimuli imposed by training activities. External load refers to the work completed by the athlete, such as speed and duration, measured independently of the athlete's internal characteristics. They then focus on the validity and reliability of such data, highlighting the importance of analysing them individually for each athlete. Many approaches to predicting athletes' performance are based on the original work of Calvert et al. (1976).
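Performance models descending from Calvert et al. (1976) are often written in an impulse-response form. In this form each training load w_i adds to a slowly decaying fitness term and to a quickly decaying fatigue term. The sketch below assumes one common discrete-time formulation, p_n = p_0 + k_1 Σ_{i<n} w_i e^{-(n-i)/τ_1} − k_2 Σ_{i<n} w_i e^{-(n-i)/τ_2}. The function name, the parameter names, and the numerical values in the usage line are hypothetical illustrations, not estimates taken from the cited works.

```python
import math
from typing import Sequence


def fitness_fatigue(loads: Sequence[float], p0: float, k1: float, k2: float,
                    tau1: float, tau2: float) -> list[float]:
    """Predicted performance p[n] for each day n = 0 .. len(loads) - 1.

    Assumed formulation (only loads from previous days contribute):
        p[n] = p0 + k1 * sum_{i<n} w[i] * exp(-(n - i) / tau1)
                  - k2 * sum_{i<n} w[i] * exp(-(n - i) / tau2)
    """
    perf = []
    for n in range(len(loads)):
        # Fitness: long time constant tau1, so past loads fade slowly.
        fitness = sum(w * math.exp(-(n - i) / tau1) for i, w in enumerate(loads[:n]))
        # Fatigue: short time constant tau2, so past loads fade quickly.
        fatigue = sum(w * math.exp(-(n - i) / tau2) for i, w in enumerate(loads[:n]))
        perf.append(p0 + k1 * fitness - k2 * fatigue)
    return perf


# Hypothetical example: one training bout of load 100 on day 0, then 30 rest days.
perf = fitness_fatigue([100.0] + [0.0] * 30, p0=500.0, k1=1.0, k2=2.0,
                       tau1=42.0, tau2=7.0)
```

With τ_2 < τ_1 and k_2 > k_1, the single bout first lowers predicted performance because fatigue dominates. Once fatigue has decayed, performance rises above p_0.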
Kolossa et al. (2017) propose the so-called fitness-fatigue model for performance estimation. The model takes the training load as input and returns the performance as output, the latter depending on the initial performance, the training load, and two unobserved variables called fitness and fatigue. Although this approach treats the training process as a sequence of activities, it does not exploit the ever-growing amount of data collected by athletes. A valuable contribution in this field was provided by Frick and Kosmidis (2017), who developed an R package aiming to bridge the gap between the routine collection of data from sport devices and their analysis with the R statistical software. Despite the relevance of their contribution, the methods they propose do not account for the real-time use of these data. We believe that feedback on the effects of training is more effective if it is provided while the activity is being performed, so that timely decisions can be made. We propose a Bayesian matrix-variate clustering model for classifying the trajectories of multivariate time series online, which also accounts for the missing values and anomalies that characterise this kind of data. In this field, clustering trajectories makes it possible to identify groups of activities that require similar effort, which is useful for understanding how an athlete is behaving during the activity. |
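The following sketch illustrates the online setting. It stores an activity as a T × P matrix, with time points as rows, variables as columns, and NaN marking missing values, and it updates cluster membership probabilities as rows arrive. It is a deliberately simplified stand-in, not the Bayesian matrix-variate model proposed here, and it rests on several assumptions. The K cluster mean trajectories and the mixture weights are known. Errors are independent Gaussian with per-cluster, per-variable variances, which corresponds to a matrix-variate normal with identity row covariance and diagonal column covariance. Anomalies are not treated. The function name, array shapes, and example values are hypothetical.

```python
import numpy as np


def online_cluster_probs(x_partial, means, sigma2, weights):
    """Posterior cluster probabilities for a partially observed activity.

    x_partial: (t, P) array of the rows observed so far, NaN where missing.
    means:     (K, T, P) assumed cluster mean trajectories, with T >= t.
    sigma2:    (K, P) assumed per-cluster, per-variable error variances.
    weights:   (K,) assumed prior cluster proportions.
    """
    t = x_partial.shape[0]
    obs = ~np.isnan(x_partial)                  # mask of observed entries
    log_post = np.log(np.asarray(weights, dtype=float))
    for k in range(means.shape[0]):
        resid = x_partial - means[k, :t, :]     # compare with the first t rows only
        var = np.broadcast_to(sigma2[k], resid.shape)
        ll = -0.5 * (np.log(2 * np.pi * var) + resid ** 2 / var)
        log_post[k] += ll[obs].sum()            # missing entries are skipped
    log_post -= log_post.max()                  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical example: K = 2 clusters, T = 5 time points, P = 2 variables
# (speed in m/s, heart rate in bpm); only the first 3 rows have arrived.
means = np.empty((2, 5, 2))
means[0] = [3.0, 130.0]                         # cluster 0: lighter effort
means[1] = [4.0, 160.0]                         # cluster 1: harder effort
sigma2 = np.array([[0.25, 25.0], [0.25, 25.0]])
x_partial = np.array([[3.9, 158.0], [np.nan, 162.0], [4.1, np.nan]])
probs = online_cluster_probs(x_partial, means, sigma2, weights=[0.5, 0.5])
```

Missing entries are excluded from the likelihood, which under the independence assumption is equivalent to marginalising them out, so no imputation is needed before each update. A full model would also have to estimate the means, variances, and weights, and to down-weight anomalous observations.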