Hume in the Light of Bayes: Towards a Unified Cognitive Science of Human Nature

David Hume hoped future advances would bring his nascent science of human nature nearer to perfection. He conjectured that, since it is probable one operation and principle of the mind depends on another, it might someday be possible to subsume all mental operations under one absolutely general and universal principle. I argue that the day Hume hoped for may now be upon us, as an increasing number of theorists suspect that predictive Bayesian models of cognitive architecture provide the best clue yet as to the shape of a unified theory of mind: a theory in which the manifold operations of perception, imagination, and belief formation may all be explained in terms of the same cognitive processes that implement causal inference. I show that the fundamental mechanism behind such architectures, the generative model, comprises a functional analogue to Hume's faculty of the imagination. I then apply the ideas outlined to construct a theoretical model of speech perception which is able to explain perceptual illusions, such as phonemic restoration and the Ganong effect, as artifacts of Bayesian inference. If the unitary view spelled out in this paper is correct, perception, action, understanding, and the imagination all co-emerge from the same neural and computational resources that implement error correction and causal inference. I thus conclude that predictive Bayesian models comprise a sound methodology with which to continue Hume's project of constructing a genuinely unified cognitive science of human nature.


Introduction
David Hume hoped that future advances would bring his nascent science of human nature nearer to perfection, eventually carrying its researches beyond the mere classification of the operations of the mind to discover, at least in some degree, the secret springs and principles whereby such operations are actuated (EHU 1.15). He conjectured that, since "it is probable that one operation and principle of the mind depends on another", it might one day be possible to subsume all mental operations under one absolutely general and universal principle (EHU 1.15). It is my contention that the day Hume had hoped for may now be upon us, as an increasing number of theorists (e.g., Clark, 2013; Dennett, 2013; Hohwy, 2013) are coming to suspect that newly emergent predictive Bayesian models of cognitive architecture provide the best clue yet as to the shape of a unified theory of mind: a theory in which the manifold operations of perception, imagination, and belief formation may all be explained in terms of the same fundamental mechanisms that implement error correction and causal inference. In addition to comprising an exciting grand unified theory of cognition, I argue that the predictive processing paradigm also provides the computational and neuroscientific tools with which to formalize and further elucidate Hume's early insights. To state my point more colorfully (and controversially): just as it has been said that nothing in biology makes sense except in the light of evolution, I argue that virtually everything in Hume's empirical psychology starts to make clearer sense in the light of predictive processing. To convince the reader of this is the primary goal of this paper. If my analysis is correct, Hume should be acknowledged as having portended developments at the very forefront of modern cognitive science.

The Problem of Perception and the Bayesian Brain
Our sensory organs are under perpetual assault by an unrelenting barrage of environmental stimuli, yet instead of perceiving a barrage, we perceive a world. The problem of perception, therefore, is to comprehend how the brain accomplishes this feat. Advocates of the predictive processing model propose that the problem of perception reduces to the same kind of causal inference problems encountered in science and everyday life (Hohwy, 2013, p. 18). If this hypothesis is correct, then the situation stands very much as Hume argued centuries ago: ultimately, all knowledge reduces to probability and is thus of the very same nature as the evidence we employ in common life (THN 1.4.1.4). According to the predictive processing model, the brain is essentially a prediction machine which implements unconscious Bayesian inference via an internal generative model that seeks to minimize prediction errors by matching incoming 'bottom-up' sensory stimuli with 'top-down' expectations (e.g., Clark, 2013, 2014; Hohwy, 2013, p. 41). Perception, in this view, is thus co-emergent with a functional analogue of the imagination (Clark, 2014, p. 236), a conclusion which, if true, affirms Hume's corresponding assertion that the memory, senses, and understanding are all founded upon the imagination (THN 1.4.7.3).

Hume and Helmholtz on Unconscious Inference
Though several philosophers have noted the resemblance to Hume's ideas (e.g., Hohwy, 2013; Dennett, 2013), the conceptual origins of predictive processing are most often traced to the physician and physicist Hermann von Helmholtz (Clark, 2013). It was Helmholtz who first seized upon the idea of the brain as a hypothesis-testing device, thus paving the way for the contemporary elaboration of predictive processing in terms of cognitive neuroscience (Hohwy, 2013, p. 5). Hume chose to forego framing such implementation-level hypotheses, arguing that such a duty belonged "more to anatomists and natural philosophers," a duty to which Helmholtz qua physician was well suited (THN 1.1.2.1). Hume did, however, consider the possibility that sensory impressions might involve the imagination, and noted that inferences could be drawn from the coherence of our perceptions even when such perceptions are illusory (THN 1.3.5.2). It happened to be just such an illusion which inspired Helmholtz's key insight into the inferential nature of perception: "I can recall when I was a boy going past the garrison chapel in Potsdam, where some people were standing in the belfry. I mistook them for dolls and asked my mother to reach up and get them for me, which I thought she could do. The circumstances were impressed on my memory, because it was by this mistake that I learned to understand the law of foreshortening in perspective" (Helmholtz, 2013, p. 283).
This experience led him to conclude that the relationship between distance and size is not something passively received but something inferred on the basis of one's prior experience, thus explaining why children are more prone to such errors (ibid.). This recognition of the central role of error correction in unconscious inference constitutes Helmholtz's second great contribution to modern-day predictive processing. Surprisingly, however, Hume remarked on many of the same phenomena a century prior to Helmholtz, and often in remarkably similar words: "The judgement here corrects the inequalities of our internal emotions and perceptions; in like manner, as it preserves us from error, in the several variations of images, presented to our external senses. The same object, at a double distance, really throws on the eye a picture of but half the bulk; yet we imagine that it appears of the same size in both situations; because we know that on our approach to it, its image would expand on the eye, and that the difference consists not in the object itself, but in our position with regard to it" (EPM, pp. 185-6).
Hume also highlighted the unconscious nature of such inferences and noted that such judgments typically involved causal ascriptions, saying: "experience may produce a belief and a judgment of causes and effects by a secret operation, and without being once thought of" (THN 1.3.8.13). Helmholtz compared unconscious inferences to probabilistic syllogisms in which the major premises were composed of prior experiences and the minor premises of present sensory impressions (Helmholtz, 2013, p. 25). Contemporary researchers, however, have updated such ideas by forgoing Helmholtz's ingenious yet crude probabilistic syllogisms in favor of sophisticated Bayesian models (Dayan et al., 1995), a treatment I argue is also well suited to Hume.

Hume in the Light of Bayes
Many philosophers (e.g., Mura, 1998, p. 306; Dennett, 2013), in addition to myself, have noticed a remarkable similarity between Hume's ideas and the subjectivist interpretation of Bayesian confirmation theory, which, like Hume, construes probability in terms of degrees of partial belief (THN 1.3.12.17). Before I proceed to examine Hume's ideas in terms of Bayesian models of cognition it is important to forestall some tempting misunderstandings as to the nature of my endeavor. First, at no point am I suggesting that there was any direct influence of the work of the Reverend Thomas Bayes on Hume; nor am I arguing, as others have (e.g., Mura, 1998), that Hume's conception of probability is in fact Bayesian. My purpose is orthogonal to such debates, being merely to demonstrate the surprising consilience that results from a predictive Bayesian reading of Hume's science of human nature. Hume averred that life and action were entirely dependent on probabilistic measures of evidence (ABS 4). Yet despite this inherently uncertain state of affairs, Hume attempted to provide a cogent account of rational belief revision: "A wise man, therefore, proportions his belief to the evidence. [...] He considers which side is supported by the greater number of experiments: to that side he inclines, with doubt and hesitation; and when at last he fixes his judgement, the evidence exceeds not what we properly call probability [...] In all cases, we must balance the opposite experiments, where they are opposite, and deduct the smaller number from the greater, in order to know the exact force of the superior evidence" (EHU 10.1.4).
Bayes' theorem, simply put, is an algebraic generalization embodying the optimal solution to the above problem: that of apportioning belief to evidence (Bayes, 1984). Rather than attempting to make sense of Hume's embryonic formulation, however, we moderns may instead carry on the spirit of his project with the mathematically optimal solution in hand. The canonical form of Bayes' rule is typically expressed by the following equation:

P(A|B) = P(A) × P(B|A) / P(B)

The equation is read as follows: for any hypothesis A, and evidence or background knowledge B:
• P(A) is one's initial degree of belief in A, i.e., the "prior probability."
• The quotient P(B|A)/P(B) represents the support B provides for A.
• P(A|B), on the left side represents the "posterior probability" i.e., one's new optimal degree of belief in A after taking into account the evidence in B for or against A.
By using Bayes' theorem to update one's belief in a prior hypothesis one arrives at a posterior probability representing the new optimal degree of belief given new evidence. This posterior, given further evidence, may itself serve as a prior for further iterations of the rule.
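To make this iterative character of the rule concrete, here is a minimal sketch of Bayesian belief revision, Hume's proportioning of belief to the evidence. The scenario (a possibly biased coin) and all numerical values are illustrative stipulations of my own, not drawn from the sources discussed.

```python
# A minimal sketch of iterative Bayesian updating: each posterior becomes
# the prior for the next observation. All numbers here are hypothetical.

def update(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """Return P(A|B) given the prior P(A) and likelihoods P(B|A), P(B|not-A)."""
    evidence = lik_if_true * prior + lik_if_false * (1 - prior)  # P(B)
    return lik_if_true * prior / evidence                        # Bayes' rule

# Hypothesis A: "the coin is biased toward heads (P(heads) = 0.8)".
# Alternative: the coin is fair (P(heads) = 0.5). Start agnostic.
belief = 0.5
for flip in ["H", "H", "T", "H"]:  # the evidence B, one flip at a time
    if flip == "H":
        belief = update(belief, 0.8, 0.5)
    else:
        belief = update(belief, 0.2, 0.5)
    # the posterior now serves as the prior for the next flip
```

Note how yesterday's posterior becomes today's prior; after these four flips the agent has moved from agnosticism (0.5) to a moderate degree of belief (roughly 0.62) in the bias hypothesis, and a tails observation pulls the belief back down rather than refuting it outright.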

Predictive Projectivism
What I have been calling the predictive Bayesian model of cognitive architecture is currently referred to by many names, including "the predictive mind" (Hohwy, 2013), "the Bayesian brain" (Dennett, 2013), predictive coding, and "action-oriented predictive processing" (Clark, 2013). However, as both Dennett (2013) and Richard Gregory (1997, p. 1121) have invoked the very Humean notion of projection in their characterizations of predictive processing, I will refer to the view here as predictive projectivism. Perceptions, in such a view, according to Gregory, "are regarded as similar to predictive hypotheses of science, but psychologically projected into external space and accepted as our most immediate reality" (Gregory, 1997, p. 1121; italics mine). The result of this projection is what Dennett (2013) describes as "a benign user-illusion". As odd as such a conclusion may sound, I contend that the phenomenon of phonemic restoration, to which I now turn, comprises a real-world existence proof of how the error-correcting mechanisms posited by both Hume and predictive processing engender benign, even adaptive, user illusions.

Perception as Controlled Hallucination
Our everyday speech is peppered with errors, including fitful starts and ambiguous articulations. The predictive processing model takes this idea to its logical conclusion, proclaiming that such ambiguity is the norm rather than the exception across all of our senses; the entire purpose of the brain, therefore, is error correction (Clark, 2014, p. 228; Hohwy, 2013, p. 46). Hume also championed the idea that sensation requires robust error-correcting mechanisms: "Experience soon teaches us this method of correcting our sentiments, or at least, of correcting our language, where the sentiments are more stubborn and inalterable [...] such corrections are common with regard to all the senses; and indeed 'twere impossible we could ever make use of language, or communicate our sentiments to one another, did we not correct the momentary appearances of things, and [monitor] our present situation" (THN 3.3.1.16; emphasis mine).
According to Hume prior experience is continuously invoked to correct the errors of our senses, communications, and sentiments. I contend the following effect provides a concrete case of such correction in action.
When segments of speech are artificially edited to contain missing sounds, subjects typically notice something is missing. However, if these gaps are filled with nonspeech sounds, e.g., white noise or the sound of someone coughing, subjects typically report hearing the word as if the missing sound were present, thus hallucinating the missing phoneme (Traxler, 2012, p. 69). Moreover, instead of hearing the blast of noise in its true location, subjects perceive it as occurring just before or after the target word (ibid.). This phenomenon, in which our brains infer and project the sounds we expect to hear based on the evidence of the sounds present, is known as phonemic restoration (ibid.). Phonemic restoration thus appears to validate a slogan popular in predictive processing circles: "perception is controlled hallucination" (Clark, 2014, p. 236). Similar mechanisms are also likely behind the Ganong effect: the tendency to hear ambiguous speech sounds as actual words (Ganong, 1980). What an agent perceives given ambiguous data thus appears to depend heavily upon their "top-down" prior knowledge rather than merely upon the detection of "bottom-up" features present in the acoustic signal itself (Clark, 2013, p. 187).
I next endeavor to demonstrate the utility of the predictive processing approach by showing how it enables the ideas of Hume and Helmholtz to be formalized into a theoretical model of perception able to account for phonemic restoration as an artifact of Bayesian perceptual inference.

The Humean Model of Projective Inference
In predictive processing models perception may be likened to a systematic gambling scheme in which our brains are constantly placing bets upon the nature of the worldly causes of experience; when our bets are correct, we perceive the world (Clark, 2013).
As the phonemic restoration effect shows however, merely perceiving the world is not enough if the incoming data is noisy and incomplete. Thus we often must supplement and correct our perceptions with top-down guesses of what the world is probably like. The predictive processing model proposes that such corrections are continuously carried out by the brain via unconscious Bayesian inference.
The predictive projectivist paradigm provides the computational means to formalize and further elucidate a surprising number of Humean themes into a coherent theoretical model of cognition. I call the result the Humean Model of Projective Inference, the basic architecture of which is stunningly simple: it is composed of a generative model, which functions as an analogue of Hume's faculty of the imagination, the primary function of which is the implementation of Bayesian causal inference and the minimization of prediction error. I will elaborate on each of these elements in turn.
A generative model, as Andy Clark explains, attempts to capture the causal structure of a source domain by systematically recapitulating the statistical relations responsible for that very structure (Clark, 2013, p. 182). A hierarchical generative model of speech perception, for example, might begin by using word-level 'hypotheses' to predict the probability of sequences at the phoneme level. The hierarchical aspect of such models dictates that the same process operating at the phoneme level also attempts to predict the signals arriving from the still lower level of acoustic perturbations. Lastly, all of these hypotheses are continuously updated, via Bayes' rule, at every level as new evidence, in the form of prediction error signals, propagates up the neural hierarchy. This model can thus be seen to perform a Strange Inversion upon our intuitive assumption that speech is recognized from the bottom up.
In this Humean model of perception, the memory, senses, and understanding are all founded on the imagination (THN 1.4.7.3), as the architecture of generative models requires that the network layer at level N+1 predict, using prior knowledge, the activity at layer N, which must itself be capable of generating the sense-data as represented at layer N-1. Higher-level units attempt to minimize the prediction error signals which propagate up the neural hierarchy until they are "explained away" and accommodated (Clark, 2014, p. 236). What an agent perceives thus depends heavily upon the set of prior beliefs its brain deploys in its best attempts to predict current sensory perturbations (Clark, 2013, p. 187).
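The error-minimization dynamic just described can be illustrated with a deliberately crude sketch: a hypothetical two-level hierarchy in which a 'word-level' estimate predicts a 'phoneme-level' estimate, which in turn predicts a scalar 'acoustic' input. The linear updates, scalar signals, and learning rate are simplifying assumptions of my own; real predictive-coding architectures are vastly richer.

```python
# Toy two-level predictive hierarchy (hypothetical, for illustration only).
# Level N+1 (word) predicts level N (phoneme); level N predicts the incoming
# signal at level N-1 (acoustic). Each level revises its estimate so as to
# minimize the prediction errors it receives.

acoustic_input = 2.0              # incoming sensory signal (arbitrary units)
word_est, phoneme_est = 0.0, 0.0  # higher-level estimates, initially naive
lr = 0.1                          # step size for error correction

for _ in range(200):
    err_low = acoustic_input - phoneme_est  # error at the sensory boundary
    err_high = phoneme_est - word_est       # error between the two levels
    # the phoneme level balances bottom-up error against top-down prediction
    phoneme_est += lr * (err_low - err_high)
    # the word level accommodates the residual error propagating upward
    word_est += lr * err_high
```

When both error signals have been driven toward zero, the input has been 'explained away': each level's top-down prediction matches what arrives from below.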
The problem of speech perception, when cast in terms of Bayesian inference, can thus be reduced to the selection and revision of hypotheses about the causes of one's auditory perturbations. Or, to cast things in Humean terms: to apportioning our beliefs to the evidence. Let us denominate each candidate cause a hypothesis (h). Bayes' rule tells us to revise the probability of a given hypothesis P(h) (e.g., that a particular phoneme is present) given some evidence e (i.e., the present noises) by multiplying the likelihood P(e|h), the probability of e given the presence of h, by the prior probability of the hypothesis, and normalizing by the probability of the evidence P(e). By doing so we arrive at the posterior probability: P(h|e).
Bayes' rule states that one's optimal inference is to infer the hypothesis with the highest posterior (Hohwy, 2013, pp. 16-17). Recall, however, that in predictive processing architectures such inferences are implemented by way of a generative model, which is constantly striving to minimize prediction error by reducing the discrepancies between predicted and actual sensory input. When this discrepancy is effectively eliminated the incoming information is 'explained away', which, intuitively, corresponds to being less 'surprised' by the given evidence (Hohwy, 2013, p. 46). I contend this amounts to a formalization of Hume's concordant emphasis upon the cognitive importance of surprise: "What is natural and essential to any thing is, in a manner, expected; and what is expected makes less impression, and appears of less moment, than what is unusual and extraordinary" (THN 1.4.6.14). This intuitive fact explains the results of Mirman et al. (2005), who found that listeners are most likely to restore a missing phoneme when there is some evidence (e.g., noise) for the presence of the phoneme, because both the conditional probability of the acoustic information and the context reinforce the inference. Likewise, listeners are also more likely to restore missing phonemes, and to do so more quickly, when the acoustic signature of the replacing noise resembles the missing phoneme (ibid.). Phonemic restoration occurs because the conclusions one derives are heavily influenced by the quality of one's prior beliefs and one's confidence in the given hypothesis. Thus, while Bayesian reasoning is optimal in principle, in practice there is no guarantee that any particular attempt at Bayesian inference will not lead to error. The moral to be drawn is that "all optimal inference is Bayesian, but not all Bayesian inference is optimal" (Clark, 2014, p. 231).
Phonemic restoration is thus explained in the Humean Model of Projective Inference by the fact that we do not expect spoken words to have random phonemes deleted and replaced with white noise or the sound of someone coughing. Thus, as Hume would have it, in such ambiguous situations our predictive imagination projects the expected phoneme right where it is expected. It does so, however, only when the evidence does not rule out the contrary possibility: when the space left by the deleted phoneme remains blank, encountering a silent gap mid-word is surprising, and we perceive it as such. If the above model is correct, perception as it occurs in animals such as us is co-emergent with a functional analogue to Hume's faculty of the imagination (Clark, 2014, p. 236). As Andy Clark notes, a creature that can perceive using top-down resources is therefore well poised to also endogenously generate a virtual version of its percepts from the top down (ibid.). For Hume, of course, the imagination was no mere engine of whimsy, but an indispensable mental faculty essential for implementing the probabilistic causal inferences and associative operations underlying all cognition, action, and belief formation (ABS 4; THN 1.4.7.3; 1.4.1.4). Furthermore, Hume also explicitly described the predictive nature of the imagination, charging that "the fancy anticipates the course of things, and surveys the object in that condition, to which it tends, as well as in that, which is regarded as the present" (THN 2.3.7.9).
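This asymmetry between noise-filled and silent gaps can be captured in a toy Bayesian calculation. The two-hypothesis setup and every likelihood below are illustrative stipulations of my own, chosen only to display the qualitative pattern reported by Mirman et al. (2005); they are not measured values.

```python
# Toy Bayesian account of phonemic restoration (hypothetical numbers).
# h1: the expected phoneme was uttered (but possibly masked)
# h2: the word genuinely contained a gap

def posterior_phoneme(prior: float, lik_phoneme: float, lik_gap: float) -> float:
    """P(phoneme present | evidence) via Bayes' rule over the two hypotheses."""
    return lik_phoneme * prior / (lik_phoneme * prior + lik_gap * (1 - prior))

prior = 0.95  # strong lexical prior: words rarely contain random gaps

# Case 1: the gap is filled with broadband noise. Noise is acoustically
# compatible with the phoneme having been uttered and masked.
p_restore_noise = posterior_phoneme(prior, lik_phoneme=0.9, lik_gap=0.5)

# Case 2: the gap is silent. Silence is strong evidence against the phoneme,
# so even a strong lexical prior is overruled.
p_restore_silence = posterior_phoneme(prior, lik_phoneme=0.01, lik_gap=0.9)
```

With the noise-filled gap the posterior remains high and the phoneme is 'restored'; with the silent gap the prior is defeated by the evidence, matching the observation that silent gaps mid-word are surprising and perceived as such.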
However, as Hume noted, the labored deductions of the understanding are too slow and error-prone to engender adaptive action and perception; thus the inferential judgements by which our beliefs regarding the probability of causes are updated and revised must be implemented automatically and unconsciously (THN 1.3.8.13; EHU 9.5). Therefore, rather than wasting time consciously contemplating every possibility, the imagination makes its judgements automatically, without reflection, and projects these expectations onto the external world. Except, as Dennett (2013) remarks, it does not literally project its conclusions outside one's head as from a slide projector. Rather, when our guesses are close enough, we simply perceive a world, replete with the expected properties.

Toward a Unified Cognitive Science of Human Nature
Hume conjectured that it is probable that all of our mental operations result from one absolutely general and universal underlying principle (EHU 1.15). If the predictive processing account advocated in this paper is correct, the universal principle sought by Hume is that of prediction error minimization, and its implementation is via a hierarchical Bayesian generative model. I have shown that such models offer a unitary account of perception, imagination, causal inference, and belief revision; the explanatory fecundity of predictive processing, however, extends further, even to encompass action and understanding.
Karl Friston has pioneered the notion of active inference, theorizing that bodily movement originates via a kind of 'self-fulfilling prophecy': our generative model predicts the proprioceptive consequences of an intended action, and if these intended sensations do not obtain, the resulting prediction errors are corrected by moving the body in such a way as to cause the expected sensations to occur (Clark, 2014, p. 237). Chris Eliasmith elaborates the idea: "the best ways of interpreting incoming information via perception, are deeply the same as the best ways of controlling outgoing information via motor action" (Eliasmith, 2007, p. 380). Furthermore, according to Andy Clark, a theory of understanding emerges as a natural consequence of this action-oriented predictive processing. To successfully perceive and interact with the world in this predictive inferential manner, argues Clark, just is to understand a great deal about how the world is, and thereby to embody a working understanding of the kinds of entities, properties, and events that populate it (Clark, 2014, p. 236).
If the unitary view spelled out in this paper is correct, perception, action, understanding, and the imagination all co-emerge from the same neural and computational resources that implement error correction and causal inference. I thus conclude that predictive Bayesian models comprise a sound methodology with which to continue Hume's project of constructing a genuinely unified cognitive science of human nature.