Our senior project, sponsored by Evidation Health, focused on designing a standardized data structure for wearable health data.
I am a fifth-year Manufacturing Engineering major, born in San Luis Obispo and raised in the area. My engineering interests include healthcare and the environment, specifically medical devices and renewable energy. Outside of school I enjoy spending time with friends and family and being outdoors. In the future I hope to use my degree to accomplish meaningful work.
I am a fourth-year Industrial Engineering major. I am from Orange County, California, but have grown to call SLO home. Within engineering I love to focus on cost analysis and making systems more efficient. In the future I hope to become a project manager and work across many different engineering fields.
I am a fourth-year student from the Bay Area, majoring in Industrial Engineering with minors in Computer Science and Statistics. My engineering interests include data science, data analytics, and machine learning. Outside of school I love sports and spending time with my friends, family, and dogs. In the future I hope to complete a master's degree at Cal Poly and use data to solve problems.
We’d like to thank Evidation Health for providing us with the opportunity to work on this project, and our sponsors, Nicole Buechler and Filip Jankovic, for their guidance and assistance throughout this process. We would also like to thank our advisors, Tali Freed and Jill Speece, for their commitment to the senior project class.
Our Project's Videos
Our Project's Digital Poster
One of Evidation's products is an app called Achievement, shown above in Figure 1.
Within the app:
-People connect the health and fitness devices they already use and share their data with Achievement.
-Wearable data is combined with surveys to run studies that contribute to valuable research.
The standardized data structure created in this project will enable Evidation to reach more Achievement members.
-Individual scale: provides members with more valuable insights to improve their health.
-Population scale: more data for research efforts that contribute to improving the health of the overall population.
As medical health data has gone digital, wearable devices have lacked a common standard for managing that data. The goal of this project was to create a standard data model for biometric sensor data. The project focuses on three devices (Fitbit, Apple Watch, and Oura Ring) and two metrics (sleep and steps). A successful standardization would make this data more efficient to analyze, both within a company and in collaboration with other companies, and in turn better aid the understanding of disease and human health (OHDSI, 2020).
Figure 2: Wearable Devices; Apple Watch, Fitbit, & Oura Ring
Figure 3 above shows Evidation’s process before our project.
Figure 4 above shows the process raw data goes through to become usable for algorithm development. This is Evidation’s final data process flow with our standardized data structure included; the highlighted steps are our contribution.
This project is sponsored by Evidation Health
Design of Potential Solutions:
Include all of the data from each device in the final solution, essentially combining the three data models into one with limited adjustments. (Minimal data processing; keeps the data closest to its raw form.)
Include all of the relevant and potentially useful data from each device, eliminating unnecessary data attributes while keeping most of the data relatively unchanged from its raw form. (A balance between keeping the data intact and making it more usable for the data analyst.)
Include only the data categories that all three devices have in common, eliminating any category that is not present in all three devices' data models. (Heavy data processing; useful data is left out.)
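The three candidate designs can be viewed as set operations on each device's column names: a union, a union minus a drop list, and an intersection. The sketch below is illustrative only; every field name and the drop list are hypothetical placeholders, not the real Fitbit, Apple Watch, or Oura API attributes.

```python
# Sketch of the three candidate designs as set operations on column names.
# All field names below are hypothetical placeholders, not real API attributes.
DEVICE_FIELDS = {
    "fitbit": {"date", "steps", "minutes_asleep", "sleep_efficiency", "log_id"},
    "apple_watch": {"date", "steps", "sleep_duration", "source_version"},
    "oura": {"date", "steps", "sleep_duration", "sleep_efficiency", "readiness_score"},
}

# Attributes the team might judge unnecessary (hypothetical choice).
UNNECESSARY = {"log_id", "source_version"}


def option_all(fields_by_device):
    """Option 1: keep every attribute from every device (union)."""
    return set().union(*fields_by_device.values())


def option_relevant(fields_by_device, unnecessary):
    """Option 2: keep the union minus attributes judged unnecessary."""
    return option_all(fields_by_device) - unnecessary


def option_common(fields_by_device):
    """Option 3: keep only attributes shared by every device (intersection)."""
    return set.intersection(*fields_by_device.values())


if __name__ == "__main__":
    for label, columns in [
        ("all", option_all(DEVICE_FIELDS)),
        ("relevant", option_relevant(DEVICE_FIELDS, UNNECESSARY)),
        ("common", option_common(DEVICE_FIELDS)),
    ]:
        print(f"{label:>8}: {len(columns)} columns {sorted(columns)}")
```

With these placeholder fields, the intersection keeps only the date and step columns. Sleep data disappears entirely because the devices name it differently, which shows how the third option leaves useful data out.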
Final Solution Design Analysis:
-Includes all of the relevant and potentially useful data, while eliminating unnecessary data attributes.
-This solution is a balance between leaving the data as intact as possible and processing the data enough to make it more usable for the data analyst.
-Units, formats, column titles, etc. were chosen to make calculations as efficient as possible.
-Embedded (nested) categories were flattened so the structure would be easier to use and analyze.
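Choosing a common unit and column name per metric might look like the following minimal sketch. It renames device-specific fields to one standard column and converts sleep into minutes. The raw field names (`dateOfSleep`, `minutesAsleep`, `summary_date`, `total`) and their units (minutes for Fitbit, seconds for Oura) are assumptions for illustration, as are the standard names `sleep_minutes` and `date`.

```python
from datetime import date

# Hypothetical mapping from each device's raw field names to a standard column
# and a multiplier into the standard unit (minutes for sleep). The raw names
# and units here are assumptions for illustration only.
COLUMN_MAP = {
    "fitbit": {"dateOfSleep": ("date", None), "minutesAsleep": ("sleep_minutes", 1.0)},
    "oura": {"summary_date": ("date", None), "total": ("sleep_minutes", 1 / 60)},
}


def standardize(device, raw):
    """Rename raw fields to standard columns and convert values to standard units."""
    out = {"device": device}
    for raw_name, value in raw.items():
        mapping = COLUMN_MAP[device].get(raw_name)
        if mapping is None:
            continue  # field is not part of the standard structure
        column, factor = mapping
        if factor is None:
            out[column] = date.fromisoformat(value)  # assumes "YYYY-MM-DD" strings
        else:
            out[column] = round(value * factor, 2)
    return out


if __name__ == "__main__":
    print(standardize("fitbit", {"dateOfSleep": "2020-03-01", "minutesAsleep": 420, "logId": 1}))
    print(standardize("oura", {"summary_date": "2020-03-01", "total": 25200}))
```

Keeping one unit per column means analysts can compare or sum values across devices directly, without converting units in every query.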
Verification and Validation:
-We were provided with two weeks of sample data to use for developing our data structure and ETL tool.
-The data was split into two sets: the first week was used to develop our ETL tool and data structure, and the second week was used to test and verify the final data structure.
-Outputs were analyzed to confirm they had the desired accuracy, completeness, and overall format.
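The week-based split and two of the output checks could be sketched as follows. The function names, the `date` and `steps` columns, and the sample values are hypothetical. Accuracy checks against the raw data are not shown, because they depend on each device's source format.

```python
from datetime import date, timedelta


def split_by_week(records, start):
    """Split records into week one (development) and week two (verification)."""
    week_two = start + timedelta(days=7)
    end = start + timedelta(days=14)
    develop = [r for r in records if start <= r["date"] < week_two]
    verify = [r for r in records if week_two <= r["date"] < end]
    return develop, verify


def completeness(records, required):
    """Fraction of records with a non-null value in every required column."""
    if not records:
        return 0.0
    complete = sum(all(r.get(c) is not None for c in required) for r in records)
    return complete / len(records)


def format_errors(records, columns):
    """Indices of records whose columns differ from the expected set."""
    expected = set(columns)
    return [i for i, r in enumerate(records) if set(r) != expected]


if __name__ == "__main__":
    start = date(2020, 3, 1)
    # Made-up sample: 14 days of step counts with one missing value on day 11.
    sample = [
        {"date": start + timedelta(days=i), "steps": None if i == 10 else 8000}
        for i in range(14)
    ]
    develop, verify = split_by_week(sample, start)
    print(len(develop), len(verify))
    print(completeness(verify, ["date", "steps"]))
    print(format_errors(verify, ["date", "steps"]))
```

Splitting by calendar week, rather than sampling rows at random, keeps the verification data unseen during development. The same principle applies to a time-based train/test split.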
To validate the final data structure, we sent the following to Evidation Health:
-The final data structure.
-A data dictionary: a clear spreadsheet listing the column names, their data types, and which devices provide each column, seen in Figure 8 below.
-A set of instructions outlining how the data has been transformed; these are integrated into the code, as seen in Figure 9.
Evidation Health's feedback included:
-Confirmation that this standardized data structure was their desired end product.
-A request to change the column names to be more user friendly.
-Confirmation that the structure would remain useful as new versions of the devices are released.
-Confirmation that the method selected for reading the data into the structure is repeatable.
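A data dictionary like the one described above could be generated directly from the standardized output instead of being maintained by hand. The sketch below assumes the records are plain Python dicts grouped by device. The function names and sample columns are hypothetical, and the CSV output is simply one spreadsheet-friendly choice.

```python
import csv


def build_data_dictionary(records_by_device):
    """Rows of (column, data types, devices) describing standardized records."""
    info = {}
    for device, records in records_by_device.items():
        for record in records:
            for column, value in record.items():
                entry = info.setdefault(column, {"types": set(), "devices": set()})
                if value is not None:  # nulls say nothing about type or coverage
                    entry["types"].add(type(value).__name__)
                    entry["devices"].add(device)
    return [
        (column, ", ".join(sorted(e["types"])), ", ".join(sorted(e["devices"])))
        for column, e in sorted(info.items())
    ]


def write_data_dictionary(rows, path):
    """Save the dictionary as a CSV file that opens in any spreadsheet tool."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["column", "data_type", "devices"])
        writer.writerows(rows)


if __name__ == "__main__":
    sample = {
        "fitbit": [{"date": "2020-03-01", "steps": 8000}],
        "oura": [{"date": "2020-03-01", "steps": 7500, "readiness_score": 80}],
    }
    for row in build_data_dictionary(sample):
        print(row)
```

Generating the dictionary from the data keeps it in sync when columns are renamed, as in the feedback above.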
Evidation's Digital Measure Solution
Figure 10 above shows how our standardized structure fits into Evidation’s Digital Measure solution process.
A standardized data structure could greatly benefit public health. The diagram above shows the areas of healthcare in which wearable devices can play a part. These devices are potentially a cost-effective means of surveillance, prevention, and management of acute and chronic disease.
In the future:
-New sleep and step data sets from the Oura Ring, Apple Watch, and Fitbit can continue to be extracted and transformed into the standardized structure.
-The ETL scripts are designed to keep functioning as the devices' APIs change.
-However, Evidation will need to add columns for new data and update the scripts for any syntax or naming-convention changes.
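One way to keep an ETL step running through API changes, while making new columns and renamed fields a one-line update, is to project every record onto an explicit schema. In this minimal sketch, `SCHEMA`, `ALIASES`, and `conform` are hypothetical names and do not come from the project's scripts.

```python
import logging

log = logging.getLogger("etl")

# Output schema (hypothetical column names). Adding a column means appending it here.
SCHEMA = ["device", "date", "steps", "sleep_minutes"]

# Hypothetical renames for when a device API changes a field name: old -> current.
ALIASES = {"step_count": "steps"}


def conform(record, schema=SCHEMA, aliases=ALIASES):
    """Project a record onto the schema without failing on API changes.

    Renamed fields are mapped through `aliases`, missing columns become None,
    and unrecognized fields are logged and dropped instead of raising an error.
    """
    renamed = {aliases.get(key, key): value for key, value in record.items()}
    unknown = sorted(set(renamed) - set(schema))
    if unknown:
        log.warning("dropping unmapped fields: %s", unknown)
    return {column: renamed.get(column) for column in schema}


if __name__ == "__main__":
    print(conform({"device": "oura", "date": "2020-03-01", "step_count": 7000, "new_field": 1}))
```

Logging unmapped fields instead of failing keeps a run going when an API adds a field, and the warning shows where the schema may need a new column.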
This project focuses only on sleep and step data from the Oura Ring, Apple Watch, and Fitbit. However, as seen in Figure 12 above, these devices have many other features that provide additional data to their wearers, and this data could also be analyzed in the future.
Top vendors in the global wearable technology market include:
-Apple, Fitbit, Oura, Whoop, Withings, Adidas Group, Sony Corporation, Qualcomm Technologies, Inc., and LG Electronics, Inc.
Figure 13: Wearable Devices; Whoop, Withings, & Adidas
Conclusions and Recommendations:
The raw JSON data files could not be read directly into Spark, so a key part of this project was developing Python scripts that parse the raw JSON files and split up, clean, reformat, and re-join the raw data into a ready-to-use format.
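The sketch below shows one possible shape for such a script. It assumes a hypothetical raw layout of `{"user_id": ..., "days": [{"date": ..., "steps": ...}]}` and uses hypothetical function names. It splits the nested list into one record per day, drops entries without a date, trims timestamps to dates, re-attaches the user ID to each record, and writes JSON Lines (one object per line).

```python
import json
from pathlib import Path


def raw_to_records(raw, device):
    """Split a nested raw export into one record per day.

    Assumes a hypothetical layout: {"user_id": ..., "days": [{"date": ..., "steps": ...}]}.
    """
    user_id = raw.get("user_id")
    records = []
    for entry in raw.get("days", []):
        if not entry.get("date"):
            continue  # clean: drop entries with no date
        records.append({
            "user_id": user_id,  # re-join user-level fields onto each daily record
            "device": device,
            "date": entry["date"][:10],  # reformat: keep only the YYYY-MM-DD part
            "steps": entry.get("steps"),
        })
    return records


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def convert_file(src, dst, device):
    """Read one raw JSON file and write a JSON Lines file."""
    raw = json.loads(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(to_json_lines(raw_to_records(raw, device)), encoding="utf-8")
```

JSON Lines is the layout Spark's JSON reader expects by default, so the output could be loaded with `spark.read.json(dst)`. A single pretty-printed document would instead require the `multiLine` option.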
Aggregated Daily vs Activity Based Data Collection
Activity data was collected and stored in two different ways (aggregated daily and activity based), which added to the challenge of standardizing it. To preserve as much data integrity and information as possible, each device's data was transformed into two separate output structures, one for each style.
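Routing records into the two output structures might look like this minimal sketch. It assumes that activity-based records carry `start` and `end` timestamps while daily aggregates carry only a `date`. Both column lists are hypothetical.

```python
# Hypothetical column sets for the two output structures.
DAILY_COLUMNS = ["user_id", "device", "date", "steps"]
ACTIVITY_COLUMNS = ["user_id", "device", "start", "end", "steps"]


def route_records(records):
    """Send each record to the daily or the activity output structure.

    Assumption: activity-based records carry "start" and "end" timestamps,
    while aggregated daily records carry only a "date".
    """
    daily, activity = [], []
    for record in records:
        if "start" in record and "end" in record:
            activity.append({c: record.get(c) for c in ACTIVITY_COLUMNS})
        else:
            daily.append({c: record.get(c) for c in DAILY_COLUMNS})
    return daily, activity


if __name__ == "__main__":
    daily, activity = route_records([
        {"user_id": "u1", "device": "fitbit", "date": "2020-03-01", "steps": 9000},
        {"user_id": "u1", "device": "apple_watch", "start": "2020-03-01T08:00:00",
         "end": "2020-03-01T08:30:00", "steps": 2500},
    ])
    print(daily)
    print(activity)
```

Keeping activity records in their own structure, rather than summing them into daily totals, preserves each session's timing. This matches the goal of retaining as much information as possible.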
To recreate a similar standardized data structure, the following basic steps should be followed:
Step 1: Flatten out raw data structures by removing nested data
Step 2: Referencing each device's API, use a spreadsheet to list the data each device provides and what the devices have in common
Step 3: Identify which categories are necessary, and which can be dropped
Step 4: Combine similar categories
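The following sketch shows how the steps above could be supported in code: flattening nested objects into dotted column names (Step 1), building a field-by-device presence table to fill the spreadsheet (Step 2), and dropping unneeded fields and merging similar ones under one name (Steps 3 and 4). The function names and sample fields are hypothetical. Lists nested inside the raw data are left as-is here; real exports would need them expanded into rows.

```python
def flatten(obj, prefix=""):
    """Step 1: turn nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def presence_table(samples):
    """Step 2: map each flattened field to the set of devices that provide it."""
    table = {}
    for device, sample in samples.items():
        for field in flatten(sample):
            table.setdefault(field, set()).add(device)
    return table


def combine(record, drop, synonyms):
    """Steps 3-4: drop unneeded fields and merge similar ones under one name."""
    out = {}
    for field, value in flatten(record).items():
        if field in drop:
            continue
        out[synonyms.get(field, field)] = value
    return out


if __name__ == "__main__":
    nested = {"summary": {"steps": 8000, "sleep": {"minutes": 420}}, "id": 7}
    print(flatten(nested))
    print(combine(nested, drop={"id"},
                  synonyms={"summary.steps": "steps", "summary.sleep.minutes": "sleep_minutes"}))
```

Step 3 stays a human judgment call; the code only applies the drop list and synonym map that the team decides on.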