Quote:
Originally Posted by denniswilliams
From 'Causal Inference and Discovery in Python' by Aleksander Molak:
"Imagine you work at a research institute and you’re trying to understand the causes of people drowning. Your organization provides you with a huge database of socioeconomic variables. You decide to run a regression model over a large set of these variables to predict the number of drownings per day in your area of interest. When you check the results, it turns out that the biggest coefficient you obtained is for daily regional ice cream sales. Interesting! Ice cream usually contains large amounts of sugar, so maybe sugar affects people’s attention or physical performance while they are in the water.
This hypothesis might make sense, but before we move forward, let’s ask some questions. How about other variables that we did not include in the model? Did we add enough predictors to the model to describe all relevant aspects of the problem? What if we added too many of them? Could adding just one variable to the model completely change the outcome?"
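I suspect you'd sell more ice cream on the hottest days, and that more people would be at the beach swimming on those same days. Most beach areas have an ice cream stand or two nearby. So yes, I'd say something is probably missing from the model: the weather. Heat pushes up both ice cream sales and the number of people in the water, which can make ice cream look like a cause of drowning when it isn't.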
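A quick simulation makes the hunch concrete. This is a hypothetical sketch, not from the book: it assumes daily temperature drives both ice cream sales and drownings, and that ice cream has no effect on drownings at all. All the numbers (temperature range, sales per degree, noise levels) are made up for illustration. Regressing drownings on ice cream sales alone should give a clearly positive coefficient; adding temperature as a second predictor should push the ice cream coefficient to roughly zero.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance of two equal-length lists
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=5000, seed=0):
    # Hypothetical data: heat drives BOTH variables; ice cream never affects drownings
    rng = random.Random(seed)
    temp, ice, drown = [], [], []
    for _ in range(n):
        t = rng.uniform(15, 35)                   # daily temperature in deg C (made up)
        temp.append(t)
        ice.append(10 * t + rng.gauss(0, 20))     # sales depend only on temperature
        drown.append(0.5 * t + rng.gauss(0, 2))   # drownings depend only on temperature
    return temp, ice, drown


def slope_one(x, y):
    # OLS slope of y on a single predictor x (with intercept)
    return cov(x, y) / cov(x, x)


def slopes_two(x1, x2, y):
    # OLS slopes of y on x1 and x2 (with intercept), via the 2x2 normal equations
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


temp, ice, drown = simulate()
print("ice cream only:", round(slope_one(ice, drown), 3))
b_ice, b_temp = slopes_two(ice, temp, drown)
print("ice cream, controlling for temperature:", round(b_ice, 3))
print("temperature:", round(b_temp, 3))
```

The first regression "finds" an ice cream effect only because ice cream sales stand in for the omitted temperature variable; once temperature is included, the ice cream coefficient collapses toward zero and temperature picks up the effect. That is exactly the "could adding just one variable completely change the outcome?" question from the quote.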