Principles for Life from Data Science
November 2025
Micah Melling
I published this book in 2021 after three years of on-and-off work. Four years later, I am still proud of the result. Most of it has aged well, largely because the principles of effective data science are immutable, in my opinion. That said, I am under no illusion that this book is perfect. As I have learned more, I can see parts that would benefit from some brushing up (if I can ever dedicate the time). I am not certain how much of an impact this resource has had or will have. A few times a year, I hear from someone who has found it useful, which is enough for me.
Earlier this year, I reached 10 years in data science. As I reflect on my time in the field, of which this book is a major component, I am struck by how the principles of data science have impacted my life. In fact, that is the most striking part about my career: how it has influenced how I view the world, for the better.
This entry provides a short discussion on life lessons derived from data science principles. It is meant to be concise and to inspire the reader to research and reflect more. My hope is that the applications to life are self-evident and thought-provoking.
I have no idea how widely this piece will be read, but I am taking the leap of faith to put these ideas out “in the wild.” After all, isn’t that the point of data science and of life for that matter? To influence the world in some way, hopefully positively?
Data Science Principles and Crossovers to Life
At the core, data science is about requiring good evidence for our beliefs and actions. We should have high standards for what we believe.
In data science, we build models that will make errors. In our minds, we also make world models that produce errors. We have to accept the reality of erring, and we must importantly understand what types of errors matter and the impact they might have. Not all predictions are of equal importance and consequence.
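To make this concrete, here is a minimal Python sketch with hypothetical cost figures (the fraud-screening framing and the specific numbers are illustrative assumptions, not a prescription) showing why the type of error can matter more than the error count:

```python
# Hypothetical costs for a fraud-screening setting: a missed fraud case
# (false negative) is assumed to cost far more than a needless review
# (false positive). These numbers are invented for illustration.
COST_FALSE_POSITIVE = 5
COST_FALSE_NEGATIVE = 500

def total_cost(false_positives, false_negatives):
    """Weight errors by their consequences, not just their count."""
    return (false_positives * COST_FALSE_POSITIVE
            + false_negatives * COST_FALSE_NEGATIVE)

# Both models make exactly 100 errors, yet the consequences differ sharply.
cautious_model = total_cost(false_positives=90, false_negatives=10)
reckless_model = total_cost(false_positives=10, false_negatives=90)
print(cautious_model, reckless_model)  # 5450 45050
```

Same number of mistakes, vastly different outcomes; knowing which errors matter is the real work.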
Variance is part of the world and is therefore reflected in the data we collect and analyze. We might be lucky or unlucky due to the roll of the dice. Extreme favor or disfavor can occur due to randomness. Do not be fooled by randomness.
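A small simulation can illustrate the point. In this sketch, 1,000 hypothetical forecasters each make ten coin-flip calls with zero skill; any spread in their records comes from randomness alone:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 hypothetical forecasters each make 10 coin-flip calls with no skill.
correct_calls = rng.binomial(n=10, p=0.5, size=1_000)

# Pure chance still mints apparent "stars" and apparent "failures."
print(int(correct_calls.max()), int(correct_calls.min()))
print(int((correct_calls >= 9).sum()), "forecasters look brilliant by luck alone")
```

Anyone who got nine or ten calls right looks like a genius, yet skill played no part.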
Building predictive models should breed a sense of humility. As Yogi Berra reputedly said, “It’s tough to make predictions, especially about the future.” Uncertainty is a part of our world. Grappling with uncertainty, by recognizing and attempting to quantify it, is a sign of maturity. When uncertainty is high, we are perhaps best served by not making any predictions.
False attribution is a major problem in models, especially regression models that aim to explain how one variable influences another. We should keep our heads on a swivel for omitted variable bias, and likewise for confounding variables and our old friend, randomness. We should always be wary of clear-cut explanations about the world. The world is rarely simple, and one-size-fits-all explanations can often be misleading. Hold space for complexity, nuance, and paradox.
With classification models, the predicted likelihood is more important than the predicted outcome (i.e., are we 51% sure this will happen or 98% certain?). Calibrating our beliefs is challenging but important work. Nuance matters in prediction and decision making.
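A tiny sketch of why the probability matters more than the label. Both predictions below clear a 0.5 threshold and yield the same “yes,” but, assuming a hypothetical cost of 100 units for being wrong, they imply very different expected costs:

```python
# Two predictions that produce the same label but carry very different risk.
threshold = 0.5
cost_of_being_wrong = 100  # hypothetical cost, for illustration only

for p in (0.51, 0.98):
    label = int(p >= threshold)          # both cross the threshold: "yes"
    expected_cost = (1 - p) * cost_of_being_wrong
    print(f"p={p}: label={label}, expected cost={expected_cost:.0f}")
```

The labels are identical; the expected costs (49 versus 2) are not. Acting on the label alone throws away the nuance.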
Models often struggle to extrapolate to previously unseen data. We must recognize when we are in sufficiently uncharted territory, have a suitable fallback plan in the interim, and develop a game plan for succeeding in new waters. We must detect and respond to “drift.”
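As an illustration, here is a deliberately simple drift alarm (a hypothetical helper, not a production monitoring method) that flags when the mean of live data strays too far from a reference sample:

```python
import numpy as np

rng = np.random.default_rng(0)

def drift_alarm(reference, live, z_threshold=3.0):
    """Flag drift when the live mean strays too far from the reference mean.

    A deliberately simple sketch; real systems monitor many statistics.
    """
    se = reference.std() / np.sqrt(len(live))  # standard error of the live mean
    z = abs(live.mean() - reference.mean()) / se
    return z > z_threshold

reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-era data

print(drift_alarm(reference, reference))        # identical world: no alarm
print(drift_alarm(reference, reference + 0.5))  # the mean has moved: alarm
```

When the alarm fires, the honest response is the one described above: acknowledge uncharted territory and fall back to a plan rather than trusting stale predictions.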
Data distributions, such as the normal distribution, are prevalent in data science. In any situation, we would be wise to consider what type of distribution might be governing outcomes. How rare is a rare event? How extreme can events get? What type of “insurance policy” and risk management is needed? Likewise, remember that the outcome we experienced was only one possible draw from the distribution; it was not destined to occur but was one of many possibilities influenced by randomness. If a similar situation repeats in the future, the next outcome will be another sample from the distribution. If we get an outcome similar to the last one, that may be nothing more than randomness. If it was a one-shot situation, consider what might have happened in a parallel universe where another point of the distribution was sampled. Remember the distribution.
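A short simulation shows how much “how extreme can events get?” depends on the governing distribution. In this sketch, a thin-tailed world (normal) and a heavy-tailed world (Cauchy) produce wildly different extremes from the same number of draws:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Two worlds: one thin-tailed (normal), one heavy-tailed (Cauchy).
thin_tailed = rng.normal(size=n)
heavy_tailed = rng.standard_cauchy(size=n)

# The most extreme event each world produced in the same number of draws.
print(round(float(np.max(np.abs(thin_tailed))), 1))   # typically a handful of sigmas
print(round(float(np.max(np.abs(heavy_tailed))), 1))  # routinely orders of magnitude larger
```

An insurance policy sized for the first world is catastrophically inadequate in the second; knowing which distribution governs outcomes is the prerequisite for managing risk.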
Data can be manipulated and subject to inconsistent collection. Visualizations can be misleading. Analyses can be shoehorned into predefined takeaways. Humans bring biases to their work. Consequently, we must be skeptical of analytical work but not so skeptical as to become paralyzed. Often, we should not look for the “right” answer but for helpful and solid evidence to incrementally make us “less wrong than before”.
Just because something can be done doesn’t mean that it should be done or will perform well. There are plenty of bad models out there that looked convincing as demos. A gap exists between “doing” and “doing well.”
Knowing what is being optimized is important. In data science, we often build models that optimize a single metric related to predictive power. Optimizing and balancing dual objectives can be challenging. Constrained optimization can become computationally intractable. How do we move beyond single-minded objectives? If we know what objective is being optimized, what are the resulting unintended consequences and related harms? How can we leave room for balance and exploration? We must wrestle with such questions.
Working in data science changes our perspective to emphasize what a system cannot do in addition to what it can do. Knowing and communicating limitations is critical. A healthy dose of skepticism is important for creating realistic expectations.
We need to build things that can grow, to create optionality, and to make reversible decisions. Data science systems interact with the real world, which can change rapidly and without warning or slowly and almost imperceptibly. We, therefore, should not lock ourselves into singular patterns. We need systems and models that can easily grow over time and be changed in short order when necessary. Rigidity is not a virtue.
Data science blends theory and action. We need contemplation paired with practical solutions. We need to think about what to do, the why behind it, the implications of the work, and if we are OK with those implications. Data science is a contact sport, but we need a plan, one that embodies an underlying philosophy of the game.
Data science requires both versatility and deep knowledge. We must be at least conversational in many areas and possess deep technical skill in some.
People look to data scientists to make important decisions. We should take this level of influence seriously. We must act with integrity, humility, and honesty. We should be technically sound, so that decisions are made on solid grounds. We should know when to say “no”, “I don’t know”, and “we can’t tell”. We should not give in to pressures to arrive at predefined answers or to provide insights when none exist. We must acknowledge bias, limitations, and assumptions. We must seek the truth regardless of the associated PR.
Data science encourages us to always be learning, to embody curiosity, and to think independently. It’s a never-ending process, one where we must strive to be increasingly better in multiple regards. At the same time, we can still look at our work and reap satisfaction. Most importantly, if done well, we can positively impact those around us.