++
Learning Objectives
After completing this chapter, the reader will be able to:
Describe how relational databases collate and store data.
Explain how Structured Query Language (SQL) is used to interact with relational databases to select and filter data.
Illustrate data analysis techniques and methodology used to evaluate data sets.
Define the differences between “little data” and “big data.”
Define the “5 V’s” of big data.
Explain the concept of a data warehouse and its relation to reporting and analytics.
Explain the differences between and limitations of predictive analytics and machine learning.
Describe basic concepts of different machine learning methodologies.
Describe examples of machine learning used in health care, as well as potential future directions for machine learning in health care.
++
Key Concepts
A relational database is a type of database that collects and stores data in structured formats called tables that are related to one another.
SQL is a comprehensive, text-based language that allows the user to define and manipulate data in relational databases using four primary operators: projection, filter, join, and aggregate.
Data analysis is the methodical process of taking large amounts of raw data from databases and aggregating it in a way that reveals hidden, meaningful information.
“Little data” stems from patients and end users, whereas “big data” is an aggregation of data from all sources of little data.
Big data is characterized by the “5 V’s”: volume, velocity, variety, veracity, and value.
Analysis of big data typically requires use of data warehouses for reporting and analytical methods.
The ultimate goal of using big data is to take raw data from sources such as electronic medical records and translate it into meaningful decisions, insight, or intelligence that can change the scope of patient care for current and future patients.
Machine learning differs from a predictive analytics model in that a machine learning model changes its decisions dynamically and programmatically, using statistical methodologies such as logistic regression. It is considered a new frontier in how patient care will unfold over the next few decades.
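The four SQL operators named in the Key Concepts above (projection, filter, join, and aggregate) can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module and an in-memory database. The `patients` and `prescriptions` tables, their columns, and every value in them are hypothetical and exist only for illustration; they are not drawn from any real EHR schema.

```python
import sqlite3

# In-memory database; all tables and rows below are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT,
    age        INTEGER
);
CREATE TABLE prescriptions (
    rx_id         INTEGER PRIMARY KEY,
    patient_id    INTEGER REFERENCES patients(patient_id),
    drug          TEXT,
    daily_dose_mg REAL
);
INSERT INTO patients VALUES
    (1, 'Patient A', 67), (2, 'Patient B', 45), (3, 'Patient C', 72);
INSERT INTO prescriptions VALUES
    (10, 1, 'metformin', 1000), (11, 1, 'lisinopril', 20),
    (12, 2, 'metformin', 500),  (13, 3, 'lisinopril', 10);
""")

# Projection: choose which columns to return.
projection = conn.execute(
    "SELECT name, age FROM patients ORDER BY patient_id"
).fetchall()

# Filter: choose which rows to return.
filtered = conn.execute(
    "SELECT name FROM patients WHERE age >= 65 ORDER BY patient_id"
).fetchall()

# Join: combine related tables through a shared key (patient_id).
joined = conn.execute("""
    SELECT p.name, r.drug
    FROM patients AS p
    JOIN prescriptions AS r ON r.patient_id = p.patient_id
    ORDER BY r.rx_id
""").fetchall()

# Aggregate: summarize many rows into one row per group.
aggregated = conn.execute("""
    SELECT drug, COUNT(*), AVG(daily_dose_mg)
    FROM prescriptions
    GROUP BY drug
    ORDER BY drug
""").fetchall()

for label, rows in [("projection", projection), ("filter", filtered),
                    ("join", joined), ("aggregate", aggregated)]:
    print(label, rows)
```

Each query isolates one operator so the results are easy to compare. In practice a single query often combines all four, for example joining two tables, filtering by age, and then counting prescriptions per drug.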
++
The advent and increasing adoption of the electronic health record (EHR) have led to an explosion in the quantity of patient data. With this growth comes an increased need for subject matter experts trained in the use, extraction, and analysis of data generated from an EHR. These experts can assist clinicians by providing the information needed to make data-based decisions. Along with the need for proper data extraction and analysis, the amount of data available to drive change in the health care landscape is unprecedented, yet questions remain about how to use this data reliably to conduct research and make clinical and operational decisions. This chapter will cover various aspects of data reporting, data management, data mining, artificial intelligence, and the ultimate end-use of this vast amount ...