Make sure to read this chapter carefully: The tools presented here are the basic building blocks for all of our future work in this book. We will cover how to describe data, how to begin to compare data, how to look at data over time, and some fundamental principles of the display of data. Each of these steps is a first step toward the more complex data analytics presented in subsequent chapters. Every time we move from a single piece of data to some method of summarizing a collection of data, we make assumptions. If our assumptions are wrong, the truth may be lost in the noise. Even worse, if we make inappropriate assumptions at this level, in our building-block summary statistics and preliminary comparisons, these errors will be compounded when we begin to employ the kind of larger and more complex models described in the later chapters of this part of the book.
The first step in any analysis is to look at the data. This cannot be emphasized enough. You need to plot the data and look at them. Look at the raw data. Examine your data before you do any more sophisticated analysis. Looking at the data will shine a light on outliers, defined as values that are very different from, and more extreme than, the others in the data. Sometimes outliers are real, like a length of stay of 220 days or a systolic blood pressure of 60 mmHg. Sometimes outliers are errors in recording, like a true length of stay of 22 days (but recorded as 220 days) or a systolic blood pressure of 160 mmHg (but recorded as 60 mmHg). Looking at the data themselves and understanding in detail how they were collected, as discussed in Chapter 12, can help us understand what might lead to the data we see. Do we need to correct the data or change our assumptions?
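As a rough illustration of what looking at raw values can surface, here is a minimal Python sketch. The lengths of stay are made up for illustration, and the function name `flag_outliers` and the 1.5 × IQR cutoff (Tukey's rule) are assumptions of this sketch, not a method prescribed in this chapter. The sketch only flags values for review. It cannot tell a real 220-day stay from a 22-day stay that was mistyped.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    This is Tukey's rule, used here only as one simple convention for
    deciding which values deserve a second look.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical lengths of stay, in days (illustrative values only).
length_of_stay = [3, 5, 2, 7, 4, 6, 22, 3, 5, 220, 4, 8]

# Step 1: look at the raw values themselves, sorted so extremes stand out.
print(sorted(length_of_stay))

# Step 2: list the values that merit checking against the source records.
print(flag_outliers(length_of_stay))  # prints [22, 220]
```

A flagged value is a prompt to go back to how the data were collected, not a license to delete or change it. In practice a plot of the same values would usually accompany this kind of check.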
Look at your data, with your own eyes and in their rawest form. It is tempting to skip this step, but don't. You'll catch many problems that get obscured by later analyses.
Perhaps more important is that the findings in our data, the truth that lies in the numbers, are rarely discovered with more advanced analytics alone. Rather, more advanced analytics usually help confirm the findings of simpler analyses. In the previous chapter, we discussed confounding, the concept that a third variable has a relationship with both our exposure and our outcome. When we account for confounding in our analyses, most often our findings become attenuated. For example, if we are interested in the effect of gender on the risk of ischemic heart disease, we must make sure to take age into consideration. But even before we do that, if there really is a difference between men and women, our comparison of rates of ischemic heart disease between the two groups will make this clear. The truth ...