Equivalence and agreement in validation studies: A practical methodological review
Keywords
Methods, Validation Study, Data Accuracy, Evaluation Study, Data Analysis
Abstract
Accurate statistical analysis is essential in validation studies of instruments that quantify continuous variables against a reference standard. This article describes statistical approaches for evaluating equivalence between measurement instruments, combining graphical methods with statistical tests. Their application is illustrated with a study that assessed the accuracy of a physical activity tracking wristband (Xiaomi Mi Band 4) for counting steps during different activities in patients with chronic respiratory diseases, using a video-based reference method for comparison. Confidence intervals were examined against predefined equivalence zones, TOST (two one-sided tests) procedures were applied, and both group-level and individual-level indicators of agreement were calculated, including the mean error (ME), mean percentage error (MPE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). Common errors are also discussed, such as the inappropriate use of scatter plots or correlation coefficients to assess accuracy. The article concludes that selecting appropriate statistical methods is key to ensuring clinical and methodological validity in equivalence studies that compare instruments quantifying continuous variables with a reference method.
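The group-level and individual-level indicators named in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code. It assumes that errors are defined as device minus reference (so positive values mean overcounting) and that percentage errors are expressed relative to the reference count; the function names `agreement_metrics` and `pearson_r` and all step counts are hypothetical. The example also shows why a correlation coefficient says nothing about accuracy: a device that overcounts every bout by 20 % is perfectly correlated with the reference.

```python
import math
import statistics


def agreement_metrics(device, reference):
    """Error indicators with error = device - reference (positive = overcount)."""
    errors = [d - r for d, r in zip(device, reference)]
    # Percentage error relative to the reference count (assumed convention).
    pct_errors = [100 * e / r for e, r in zip(errors, reference)]
    return {
        "ME": statistics.fmean(errors),                               # mean error (systematic bias)
        "MPE": statistics.fmean(pct_errors),                          # signed %, errors can cancel out
        "MAPE": statistics.fmean(abs(p) for p in pct_errors),         # unsigned %, no cancellation
        "RMSE": math.sqrt(statistics.fmean(e * e for e in errors)),  # weights large errors more heavily
    }


def pearson_r(x, y):
    """Pearson correlation coefficient: measures association, not agreement."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up step counts for illustration only (not study data):
    # the hypothetical device overcounts every bout by exactly 20 %.
    reference = [500, 800, 1000, 1200, 1500]
    device = [600, 960, 1200, 1440, 1800]
    print("r =", pearson_r(reference, device))
    print(agreement_metrics(device, reference))
```

With these made-up counts, r equals 1 while ME is 200 steps and both MPE and MAPE are 20 %, so a high correlation coexists with a clear systematic error. MPE and MAPE diverge when over- and undercounts coexist: for two bouts of 100 reference steps counted as 110 and 90, MPE is 0 % but MAPE is 10 %.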
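The TOST procedure and its link to confidence intervals and equivalence zones can be sketched as below, again as a hedged illustration rather than the article's own procedure. Assumptions: a paired design, an equivalence zone of ±10 % of the reference mean (a hypothetical default passed as `margin_pct` and treated as a fixed constant), α = 0.05, and a Student's t distribution evaluated numerically with the standard library only (Simpson integration and bisection) so that the block has no third-party dependencies. All function names and data are hypothetical. Equivalence is declared when both one-sided tests reject, which is the same as the 90 % (1 − 2α) confidence interval of the mean difference lying entirely inside the zone.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student's t CDF: Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    p = 0.5 + area if x >= 0 else 0.5 - area
    return min(1.0, max(0.0, p))  # clamp tiny numerical overshoot


def t_ppf(p: float, df: int) -> float:
    """Inverse t CDF by bisection (adequate for quantiles with |t| < 50)."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tost_paired(device, reference, margin_pct=10.0, alpha=0.05):
    """Paired TOST on the mean difference (device - reference).

    The equivalence zone is +/- margin_pct % of the reference mean,
    treated as a fixed constant (a simplifying assumption).
    """
    diffs = [d - r for d, r in zip(device, reference)]
    n = len(diffs)
    df = n - 1
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    delta = margin_pct / 100 * statistics.fmean(reference)

    # Test 1. H0: mean_d <= -delta  versus  H1: mean_d > -delta
    p_lower = 1 - t_cdf((mean_d + delta) / se, df)
    # Test 2. H0: mean_d >= +delta  versus  H1: mean_d < +delta
    p_upper = t_cdf((mean_d - delta) / se, df)

    # The (1 - 2*alpha) CI lies inside (-delta, delta) exactly when both tests reject.
    t_crit = t_ppf(1 - alpha, df)
    ci = (mean_d - t_crit * se, mean_d + t_crit * se)
    p_tost = max(p_lower, p_upper)
    return {"mean_diff": mean_d, "delta": delta, "ci": ci,
            "p_tost": p_tost, "equivalent": p_tost < alpha}


if __name__ == "__main__":
    # Made-up step counts for illustration only (not study data).
    reference = [1000, 1020, 980, 1010, 990, 1005, 995, 1015, 985, 1000]
    device = [1005, 1017, 988, 1012, 986, 1011, 995, 1018, 983, 1005]
    print(tost_paired(device, reference))
```

In practice a statistics library (for example, SciPy's t distribution or the statsmodels paired TOST function) would replace the hand-rolled `t_cdf` and `t_ppf`; they are included here only to keep the sketch self-contained. The choice of equivalence zone is a substantive, study-specific decision and should be fixed before the data are analysed.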
