Austerweil Lab

bayesian nonparametrics categorization computational psychiatry creativity feature inference fluency natural language processing networks norms pedagogy perception rational analysis reasoning reinforcement learning statistical methods

2025

Austerweil, J. L., Sanborn, A. N., Lucas, C., & Griffiths, T. L. (2025). In T. L. Griffiths, N. Chater, & J. B. Tenenbaum (Eds.), Bayesian Models of Cognition: Reverse Engineering the Mind (pp. 1-24). Cambridge, MA: MIT Press.

Tags: bayesian nonparametrics

2024

Austerweil, J. L., Liew, S. X., Conway, N., & Kurtz, K. J. (2024). Computational Brain and Behavior, , 1-35.

Tags: categorization contrast category learning

The ability to generate new concepts and ideas is among the most fascinating aspects of human cognition, but we do not have a strong understanding of the cognitive processes and representations underlying concept generation. Previous work in this domain has focused on how the statistical structure of known categories generalizes to generated categories, overlooking whether (and if so, how) contrast between the known and generated categories is a factor. In this paper, we explore a different factor: contrast from known categories. We propose two novel approaches to modeling category contrast: one focused on exemplar dissimilarity and another based on the representativeness heuristic. Across three behavioral experiments, we find that people generate new categories that contrast from observed categories and distribute exemplars across unoccupied regions of stimulus space. The model based on the representativeness heuristic captured human category generation better when the known category was well captured by a Gaussian distribution. Conversely, the exemplar-based mode

2023

Zemla, J. C., Gooding, D. C., & Austerweil, J. L. (2023). Evidence for optimal semantic search throughout adulthood. Scientific Reports, 13, 22528.

Tags: memory retrieval networks fluency aging computational psychiatry

As people age, they learn and store new knowledge in their semantic memory. Despite learning a tremendous amount of information, people can still recall information relevant to the current situation with ease. To accomplish this, the mind must efficiently organize and search a vast store of information. It also must continue to retrieve information effectively despite changes in cognitive mechanisms due to healthy aging, including a general slowing in information processing and a decline in executive functioning. How effectively does the mind of an individual adjust its search to account for changes due to aging? We tested 746 people ages 25 through 69 on a semantic fluency task (free listing animals) and found that, on average, retrieval follows an optimal path through semantic memory. Participants tended to list a sequence of semantically related animals (e.g., lion, tiger, puma) before switching to a semantically unrelated animal (e.g., whale). We found that the timing of these transitions to semantically unrelated animals was remarkably consistent with an optimal strategy for maximizing the overall rate of retrieval (i.e., the number of animals listed per unit time). Age did not affect an individual’s deviation from the optimal strategy given their general performance, suggesting that people adapt and continue to search memory optimally throughout their lives. We argue that this result is more likely due to compensating for a general slowing than a decline in executive functioning.

2021

URL Abstract	Ho, M. K., Cushman, F. A., Littman, M. L., & Austerweil, J. L. (2021). Communication in Action: Belief-directed Planning and Pragmatic Action Interpretation in Communicative Demonstrations Journal of Experimental Psychology: General, 13, 2246-2272. Tags: communication problem solving pragmatics planning social learning Theory of mind enables an observer to interpret others’ behavior in terms of unobservable beliefs, desires, intentions, feelings, and expectations about the world. This also empowers the person whose behavior is being observed: By intelligently modifying her actions, she can influence the mental representations that an observer ascribes to her, and by extension, what the observer comes to believe about the world. That is, she can engage in intentionally communicative demonstrations. Here, we develop a computational account of generating and interpreting communicative demonstrations by explicitly distinguishing between two interacting types of planning. Typically, instrumental planning aims to control states of the environment, whereas belief-directed planning aims to influence an observer’s mental representations. Our framework extends existing formal models of pragmatics and pedagogy to the setting of value-guided decision-making, captures how people modify their intentional behavior to show what they know about the reward or causal structure of an environment, and helps explain data on infant and child imitation in terms of literal versus pragmatic interpretation of adult demonstrators’ actions. Additionally, our analysis of belief-directed intentionality and mentalizing sheds light on the socio-cognitive mechanisms that underlie distinctly human forms of communication, culture, and sociality.
URL Abstract	Malle, B. F., Austerweil, J. L., Chi, V. B., Kenett, Y. N., Beck, E. D., Thapa, S., & Allaham, M. M. (2021). Cognitive Properties of Norm Representations Proceedings of the 43rd Annual Meeting of the Cognitive Science Society (pp. 1-7). Tags: social norms moral norms negation cognitive structure network deontics Norms are central to social life. They help people select actions that benefit the community and facilitate behavior prediction and coordination. However, little is known about the cognitive properties of norms. Here we focus on norm activation, context specificity, and how those properties differ for the two major types of norms: prescriptions and prohibitions. In two studies, participants are exposed to a variety of contexts by way of scene images and either (a) freely generate norms that apply to the context or (b) decide whether each of a series of candidate norms applies to a given context. Across both studies, people showed high levels of context specificity and fast norm activation, and these properties were substantially stronger for prescriptions than for prohibitions.
Sanborn, A. N., Heller, K., Austerweil, J. L., & Chater, N. (2021). REFRESH: A New Approach to Modeling Dimensional Biases in Perceptual Similarity and Categorization. Psychological Review, 128, 1145-1186.

Tags: categorization separable dimensions family resemblance Bayesian models

Much categorization behavior can be explained by family resemblance: New items are classified by comparison with previously learned exemplars. However, categorization behavior also shows a variety of dimensional biases, where the underlying space has so-called “separable” dimensions: Ease of learning categories depends on how the stimuli align with the separable dimensions of the space. For example, if a set of objects of various sizes and colors can be accurately categorized using a single separable dimension (e.g., size), then category learning will be fast, while if the category is determined by both dimensions, learning will be slow. To capture these dimensional biases, almost all models of categorization supplement family resemblance with either rule-based systems or selective attention to separable dimensions. But these models do not explain how separable dimensions initially arise; they are presumed to be unexplained psychological primitives. We develop, instead, a pure family resemblance version of the Rational Model of Categorization (RMC), which we term the Rational Exclusively Family RESemblance Hierarchy (REFRESH), which does not presuppose any separable dimensions in the space of stimuli. REFRESH infers how the stimuli are clustered and uses a hierarchical prior to learn expectations about the variability of clusters across categories. We first demonstrate the dimensional alignment of natural-category features and then show how through a lifetime of categorization experience REFRESH will learn prior expectations that clusters of stimuli will align with separable dimensions. REFRESH captures the key dimensional biases and also explains their stimulus-dependence and how they are learned and develop.

2020

Mohanta, S., Afrasiabi, M., Casey, C., Tanabe, S., Redinbaugh, M. J., Kambi, N. A., Phillips, J. M., Polyakov, D., Filbey, W., Austerweil, J. L., Sanders, R. D., & Saalmann, Y. B. (2020). Receptors, circuits and neural dynamics for prediction. bioRxiv, , xx-xx.

Tags: reinforcement learning

Learned associations between stimuli allow us to model the world and make predictions, crucial for efficient behavior; e.g., hearing a siren, we expect to see an ambulance and quickly make way. While theoretical and computational frameworks for prediction exist, circuit and receptor-level mechanisms are unclear. Using high density EEG and Bayesian modeling, we show that trial history and frontal alpha activity account for reaction times (a proxy for predictions) on a trial by trial basis in an audio visual prediction task. Low dose ketamine, an NMDA receptor blocker, but not the control drug dexmedetomidine perturbed predictions, their representation in frontal cortex, and feedback to posterior cortex. This study suggests predictions depend on frontal alpha activity and NMDA receptors, and ketamine blocks access to learned predictive information.
URL Abstract	Zemla, J. C., Cao, K., Mueller, K. D., & Austerweil, J. L. (2020). SNAFU: The Semantic Network and Fluency Utility Behavior Research Methods, , 1-19. Tags: fluency networks memory retrieval methodology The verbal fluency task — listing words from a category or words that begin with a specific letter — is a common experimental paradigm that is used to diagnose memory impairments and to understand how we store and retrieve knowledge. Data from the verbal fluency task are analyzed in many different ways, often requiring manual coding that is time intensive and error-prone. Researchers have also used fluency data from groups or individuals to estimate semantic networks—latent representations of semantic memory that describe the relations between concepts—that further our understanding of how knowledge is encoded. However computational methods used to estimate networks are not standardized and can be difficult to implement, which has hindered widespread adoption. We present SNAFU: the Semantic Network and Fluency Utility, a tool for estimating networks from fluency data and automatizing traditional fluency analyses, including counting cluster switches and cluster sizes, intrusions, perseverations, and word frequencies. In this manuscript, we provide a primer on using the tool, illustrate its application by creating a semantic network for foods, and validate the tool by comparing results to trained human coders using multiple datasets.
URL Abstract	Chuang, Y. S., Hubbard, E. M., & Austerweil, J. L. (2020). The “Fraction Sense” Emerges from a Deep Convolutional Neural Network Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (pp. xx-xx). Toronto, Canada: Cognitive Science Society. Tags: deep convolutional neural network emergent sense of number ratio-processing system approximate number system Fractions are a critical building block for the development of human mathematical cognition, but the origins of this concept are not well-understood. Recent work has found that a whole number sense is present in deep convolutional neural networks (DCNNs) pre-trained for object recognition and uses them as a model for investigating human numerical cognition. Do DCNNs also have a fraction sense? If so, is it dependent or independent of whole number processing? We investigated the neural sensitivity of a pretrained DCNN to both whole numbers and fractions. We replicated and extended previous research that the sense of whole number emerges in a different DCNN architecture. Further, we showed that DCNN is also sensitive to fraction value, i.e., the ratio of numerosities. Testing this model, our results suggest that the fraction sense relies on the whole number sense.
Lange, K. V., Hopman, E. W. M., Zemla, J. C., & Austerweil, J. L. (2020). Evidence against a relation between bilingualism and creativity. PLoS ONE, 15(6), 1-18.

Tags: multilingualism creativity language network analysis language acquisition learning

Are bilinguals more creative than monolinguals? Some prior research suggests bilinguals are more creative because the knowledge representations for their second language are similarly structured to those of highly creative people. However, there is contrasting research showing that the knowledge representations of bilinguals’ second language are actually structured like those of less creative people. Finally, there is growing skepticism about there being differences between bilinguals and monolinguals on non-language tasks (e.g., the bilingual advantage for executive control). We tested whether bilinguals tested in their second language are more or less creative than both monolinguals and bilinguals tested in their first language. Participants also took a repeated semantic fluency test that we used to estimate individual semantic networks for each participant. We analyzed our results with Bayesian statistics and found support for the null hypothesis that bilingualism offers no advantage for creativity. Further, using best practices for estimating semantic networks, we found support for the hypothesis that there is no association between an individual’s semantic network and their creativity. This is in contrast with published research, and suggests that some of those findings may have been the result of idiosyncrasies, outdated methods for estimating semantic networks, or statistical noise. Our results call into question reported relations between bilingualism and creativity, as well as semantic network structure as an explanatory mechanism for individual differences in creativity.

2019

URL Abstract	Ho, M. K., Cushman, F. A., Littman, M. L., & Austerweil, J. L. (2019). A Rational-Pragmatic Account of Communicative Demonstrations PsyArXiv, 19, 1-86. Tags: communication problem solving pragmatics planning social learning Theory of mind enables an observer to interpret others’ behavior in terms of unobservable beliefs, desires, intentions, feelings, and expectations about the world. This also empowers the person whose behavior is being observed: By intelligently modifying her actions, she can influence the mental representations that an observer ascribes to her, and by extension, what the observer comes to believe about the world. That is, she can engage in intentionally communicative demonstrations. Here, we develop a computational account of generating and interpreting communicative demonstrations by explicitly distinguishing between two interacting types of intentions. Object-directed intentions aim to control states of the physical environment, whereas belief-directed intentions aim to influence an observer’s mental representations. This formulation provides a number of theoretical insights. In particular, our framework (1) extends existing formal models of pragmatics and pedagogy to the setting of value-guided decision-making, (2) captures how people modify their intentional behavior to show what they know about the reward or causal structure of an environment, and (3) helps explain data on infant and child imitation in terms of differential attribution to adult demonstrators’ object-directed and belief-directed intentions. Additionally, our analysis of belief-directed intentionality and mentalizing helps shed light on the socio-cognitive mechanisms that underlie distinctly human forms of communication, culture, and sociality.
URL Abstract	Ho, M. K., Cushman, F. A., Littman, M. L., & Austerweil, J. L. (2019). People Teach with Rewards and Punishments as Communication not Reinforcements Journal of Experimental Psychology: General, 148(3), 520-549. Tags: pedagogy reward punishment reinforcement learning communication Carrots and sticks motivate behavior, and people can teach new behaviors to other organisms, such as children or nonhuman animals, by tapping into their reward learning mechanisms. But how people teach with reward and punishment depends on their expectations about the learner. We examine how people teach using reward and punishment by contrasting two hypotheses. The first is evaluative feedback as reinforcement, where rewards and punishments are used to shape learner behavior through reinforcement learning mechanisms. The second is evaluative feedback as communication, where rewards and punishments are used to signal target behavior to a learning agent reasoning about a teacher’s pedagogical goals. We present formalizations of learning from these 2 teaching strategies based on computational frameworks for reinforcement learning. Our analysis based on these models motivates a simple interactive teaching paradigm that distinguishes between the two teaching hypotheses. Across three sets of experiments, we find that people are strongly biased to use evaluative feedback communicatively rather than as reinforcement.
URL Abstract	Zemla, J. C., & Austerweil, J. L. (2019). Analyzing Knowledge Retrieval Impairments Associated with Alzheimer’s Disease Using Network Analyses Complexity, 2019, 1-12. Tags: networks fluency alzheimers computational psychiatry A defining characteristic of Alzheimer’s disease is difficulty in retrieving semantic memories, or memories encoding facts and knowledge. While it has been suggested that this impairment is caused by a degradation of the semantic store, the precise ways in which the semantic store is degraded are not well understood. Using a longitudinal corpus of semantic fluency data (listing of items in a category), we derive semantic network representations of patients with Alzheimer’s disease and of healthy controls. We contrast our network-based approach with analyzing fluency data with the standard method of counting the total number of items and perseverations in fluency data. We find that the networks of Alzheimer’s patients are more connected and that those connections are more randomly distributed than the connections in networks of healthy individuals. These results suggest that the semantic memory impairment of Alzheimer’s patients can be modeled through the inclusion of spurious associations between unrelated concepts in the semantic store. We also find that information from our network analysis of fluency data improves prediction of patient diagnosis compared to traditional measures of the semantic fluency task.
URL Abstract	Liew, S. X., & Austerweil, J. L. (2019). Novel categories are distinct from "Not"-categories In A. Goel, C. Seifert, & C. Freksa (Ed.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (pp. xx-xx). Montreal, Quebec, Canada: Cognitive Science Society. Tags: categorization category generation contrast category learning The categorization literature often considers two types of categories as equivalent: (a) standard categories and (b) negation categories. For example, category learning studies typically conflate learning categories A and B with learning categories A and NOT A. This study represents the first attempt at delineating these two separate types of generated categories. We specifically test for differences in the distributional structure of generated categories, demonstrating that categories identified as not what was known are larger and wider-spread compared to categories that were identified with a specific label. We also observe consistency in distributional structure across multiple generated categories, replicating and extending previous findings. These results are discussed in the context of providing a foundation for future modeling work.
URL Abstract	Afrasiabi, M., Orr, M. G., & Austerweil, J. L. (2019). Evaluating Theories of Collaborative Cognition Using the Hawkes Process and a Large Naturalistic Data Set In A. Goel, C. Seifert, & C. Freksa (Ed.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (pp. xx-xx). Montreal, Quebec, Canada: Cognitive Science Society. Tags: collaborative cognition Hawkes process organizational psychology bayesian nonparametrics People spontaneously collaborate to solve a common goal. What factors affect whether teams are successful? Due to lack of large-scale naturalistic data and methods for investigating scientific questions on such data, previous work has either focused on very concrete cases, such as surveys of business teams, or abstract cases, such as GridWorld games, where agents coordinate their movement so that each agent can get to their own goal without obstructing other agents. We propose a computational framework based on the multivariate Hawkes process and a novel algorithm for parameter estimation on large data sets. We demonstrate the potential of this method by applying it to a large database of programming teams, public GitHub repositories. We analyze factors known to influence team performance, such as leader organization style and team cognitive diversity, as well as other factors, such as the burstiness of effort, that are difficult to test using existing methods.
URL Abstract	Payton, M., Zemla, J. C., & Austerweil, J. L. (2019). Subjective Randomness in a Non-cooperative Game In A. Goel, C. Seifert, & C. Freksa (Ed.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (pp. xx-xx). Montreal, Quebec, Canada: Cognitive Science Society. Tags: randomness pattern recognition opponent modeling Rock, Paper, Scissors (RPS) is a competitive game. There are three actions: rock, paper, and scissors. The game’s rules are simple: scissors beats paper, rock beats scissors and paper beats rock (all signs stalemate against themselves). Over multiple games with the same opponent, optimal play according to a Nash Equilibrium requires subjects to play with genuine randomness. To examine randomness judgments in the context of competition, we tested subjects with identical sequences in two conditions: one produced from a dice roll, one from someone playing rock, paper, scissors. We compared these findings to models of subjective randomness from Falk and Konold (1997) and from Griffiths and Tenenbaum (2001), which explain assessments of randomness as a function of algorithmic complexity and statistical inference, respectively. In both conditions the models fail to adequately describe subjective randomness judgements of ternary outcomes. We also observe that context influences perceptions of randomness such that some isomorphic sequences produced from intentional play are perceived as less random than dice rolls. We discuss this finding in terms of the relation between patterns and opponent modeling.
URL Abstract	Austerweil, J. L., Sanborn, S., & Griffiths, T. L. (2019). Learning How to Generalize Cognitive Science, , xx-xx. Tags: generalization inductive inference bayesian modeling category learning Generalization is a fundamental problem solved by every cognitive system in essentially every domain. Although it is known that how people generalize varies in complex ways depending on the context or domain, it is an open question how people learn the appropriate way to generalize for a new context. To understand this capability, we cast the problem of learning how to generalize as a problem of learning the appropriate hypothesis space for generalization. We propose a normative mathematical framework for learning how to generalize by learning inductive biases for which properties are relevant for generalization in a domain from the statistical structure of features and concepts observed in that domain. More formally, the framework predicts that an ideal learner should learn to generalize by either taking the weighted average of the results of generalizing according to each hypothesis space, with weights given by how well each hypothesis space fits the previously observed concepts, or by using the most likely hypothesis space. We compare the predictions of this framework to human generalization behavior with three experiments in one perceptual (rectangles) and two conceptual (animals and numbers) domains. Across all three studies we find support for the framework’s predictions, including individual-level support for averaging in the third study.
URL Abstract	Austerweil, J. L., Liew, S. X., Conway, N., & Kurtz, K. J. (2019). Creating Something Different: Similarity, Contrast, and Representativeness in Categorization PsyArXiv, , xx-xx. Tags: categorization contrast category learning The ability to generate new concepts and ideas is among the most fascinating aspects of human cognition, but we do not have a strong understanding of the cognitive processes and representations underlying concept generation. In this paper, we study the generation of new categories using the computational and behavioral toolkit of traditional artificial category learning. Previous work in this domain has focused on how the statistical structure of known categories generalizes to generated categories, overlooking whether (and if so, how) contrast between the known and generated categories is a factor. We report three experiments demonstrating that contrast between what is known and what is created is of fundamental importance for categorization. We propose two novel approaches to modeling category contrast: one focused on exemplar dissimilarity and another on the representativeness heuristic. Our experiments and computational analyses demonstrate that both models capture different aspects of contrast’s role in categorization.
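
As a rough illustration of the two strategies named in the "Learning How to Generalize" abstract in the 2019 list above (averaging over hypothesis spaces versus picking the most likely one), the sketch below may help. It is not the authors' implementation: the hypothesis-space names, the evidence values, and the function names are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def posterior_weights(log_evidence):
    """Turn log evidence per hypothesis space into normalized weights.

    `log_evidence` is a hypothetical dict mapping a hypothesis-space name to
    how well that space fit previously observed concepts (log scale).
    """
    top = max(log_evidence.values())  # subtract the max for numerical stability
    unnormalized = {name: math.exp(v - top) for name, v in log_evidence.items()}
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}

def averaged_generalization(predictions, weights):
    """Weighted average of each hypothesis space's generalization probability."""
    return sum(weights[name] * predictions[name] for name in predictions)

def most_likely_generalization(predictions, weights):
    """Use only the single best-fitting hypothesis space."""
    best = max(weights, key=weights.get)
    return predictions[best]

# Illustrative numbers only: one space fit past concepts three times better.
log_evidence = {"by_size": math.log(3.0), "by_color": math.log(1.0)}
# Probability that a new item belongs to the concept under each space.
predictions = {"by_size": 0.9, "by_color": 0.1}

weights = posterior_weights(log_evidence)
averaged = averaged_generalization(predictions, weights)
selected = most_likely_generalization(predictions, weights)
```

With these made-up inputs the weights are 0.75 and 0.25, so averaging gives a generalization probability of about 0.7 while selecting the single best space gives 0.9. That gap is the kind of difference that could separate the two strategies behaviorally.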

2018

Fathan, M. I., Renfro, E. J., Austerweil, J. L., & Beckage, N. M. (2018). Do Humans Navigate via Random Walks? Modeling Navigation in a Semantic Word Game. In C. Kalish, M. Rau, T. Rogers, & J. Zhu (Eds.), Proceedings of the 40th Annual Meeting of the Cognitive Science Society (pp. xx-xx). Austin, TX: Cognitive Science Society.

Tags: networks reasoning

We investigate a method for formulating context- and task-specific computational models of human performance in a constrained semantic memory task. In particular, we assume that memory retrieval can only use a simple process (a random walk) and examine whether the effect of context and task specifications can be captured via a straightforward network estimation method that is sensitive to context and task. We find that a random walk model on the context-specific networks mimics aggregate human performance.
URL Abstract	Hopman, E. W. M., Thompson, B., Austerweil, J. L., & Lupyan, G. (2018). Predictors of L2 word learning accuracy: A big data investigation. In C. Kalish, M. Rau, T. Rogers, & J. Zhu (Ed.), Proceedings of the 40th Annual Meeting of the Cognitive Science Society (pp. xx-xx). Austin, TX: Cognitive Science Society. Tags: pedagogy statistical methods What makes some words harder to learn than others in a second language? Although some robust factors have been identified based on small scale experimental studies, many relevant factors are difficult to study in such experiments due to the amount of data necessary to test them. Here, we investigate what factors affect the ease of learning of a word in a second language using a large data set of users learning English as a second language through the Duolingo mobile app. In a regression analysis, we test and confirm the well-studied effect of cognate status on word learning accuracy. Furthermore, we find significant effects for both cross-linguistic semantic alignment and English semantic density, two novel predictors derived from large scale distributional models of lexical semantics. Finally, we provide data on several other psycholinguistically plausible word level predictors. We conclude with a discussion of the limits, benefits and future research potential of using big data for investigating second language learning.
URL Abstract	Ho, M. K., Littman, M. L., Cushman, F., & Austerweil, J. L. (2018). Effectively Learning from Pedagogical Demonstrations. In C. Kalish, M. Rau, T. Rogers, & J. Zhu (Ed.), Proceedings of the 40th Annual Meeting of the Cognitive Science Society (pp. xx-xx). Austin, TX: Cognitive Science Society. Tags: pedagogy reasoning reinforcement learning When observing others’ behavior, people use Theory of Mind to infer unobservable beliefs, desires, and intentions. And when showing what activity one is doing, people will modify their behavior in order to facilitate more accurate interpretation and learning by an observer. Here, we present a novel model of how demonstrators act and observers interpret demonstrations corresponding to different levels of recursive social reasoning (i.e. a cognitive hierarchy) grounded in Theory of Mind. Our model can explain how demonstrators show others how to perform a task and makes predictions about how sophisticated observers can reason about communicative intentions. Additionally, we report an experiment that tests (1) how well an observer can learn from demonstrations that were produced with the intent to communicate, and (2) how an observer’s interpretation of demonstrations influences their judgments.
URL Abstract	Cochrane, A., Simmering, V., Austerweil, J. L., & Green, C. S. (2018). Rapid Learning in Early Attentional Processing: Bayesian Estimation of Trial-by-Trial Updating. In C. Kalish, M. Rau, T. Rogers, & J. Zhu (Ed.), Proceedings of the 40th Annual Meeting of the Cognitive Science Society (pp. xx-xx). Austin, TX: Cognitive Science Society. Tags: perception statistical methods All agents must constantly learn from dynamic environments to optimize their behaviors. For instance, it is necessary in new environments to learn how to distribute attention – i.e., which stimuli are relevant, and thus should be selected for greater processing, and which are irrelevant, and should be suppressed. Despite this, many experiments implicitly assume that attentional control is a static process (by averaging performance over large blocks of trials). By developing and utilizing new statistical tools, here we demonstrate that the effect of flanking items on response times to a central item (often utilized as an index of attentional control) is systematically and continuously influenced through time by the statistics of the flanking items. We discuss the implications of this finding from the perspective of examining individual differences – where traditional data analysis approaches may confound the rate at which attentional filtering changes through time with the asymptotic ability to filter.
URL Abstract	Zemla, J. C., & Austerweil, J. L. (2018). Estimating semantic networks of groups and individuals from fluency data. Computational Brain and Behavior, X, 1-23. Tags: networks fluency statistical methods One popular and classic theory of how the mind encodes knowledge is an associative semantic network, where concepts and associations between concepts correspond to nodes and edges, respectively. A major issue in semantic network research is that there is no consensus among researchers as to the best method for estimating the network of an individual or group. We propose a novel method (U-INVITE) for estimating semantic networks from semantic fluency data (listing items from a category) based on a censored random walk model of memory retrieval. We compare this method to several other methods in the literature for estimating networks from semantic fluency data. In simulations, we find that U-INVITE can recover semantic networks with low error rates given only a moderate amount of data. U-INVITE is the only known method derived from a psychologically plausible process model of memory retrieval and one of two known methods that we found to be consistent estimators of this process: if semantic memory retrieval is consistent with this process, the procedure will eventually estimate the true network (given enough data). We conduct the first exploration of different methods for estimating psychologically-valid semantic networks by comparing people’s similarity judgments of edges estimated by each network estimation method. To encourage best practices, we discuss the merits of each network estimation technique, provide a flow chart that assists with choosing an appropriate method, and supply code for others to employ these techniques on their own data.
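
The censored random walk mentioned in the abstract above can be sketched in a few lines. This is only a minimal illustration of the general idea (walk along the edges of a network and report each node only the first time it is visited), not the U-INVITE estimation procedure or the lab's code. The toy network, the function name, and the `max_steps` cap are assumptions made for the example.

```python
import random

def censored_random_walk(network, start, n_items, rng=None, max_steps=10_000):
    """Simulate a fluency list as a censored random walk.

    `network` maps each node to a list of its neighbors. The walk moves to a
    uniformly chosen neighbor at each step, and a node is emitted only on its
    first visit (repeat visits are censored from the output).
    """
    if rng is None:
        rng = random.Random()
    current = start
    seen = {start}
    emitted = [start]
    steps = 0
    while len(emitted) < n_items and steps < max_steps:
        neighbors = network[current]
        if not neighbors:  # dead end: nothing further to retrieve
            break
        current = rng.choice(neighbors)
        steps += 1
        if current not in seen:  # first visit, so the item is "said aloud"
            seen.add(current)
            emitted.append(current)
    return emitted

# Hypothetical toy semantic network (undirected edges listed in both directions).
toy_network = {
    "lion": ["tiger", "puma"],
    "tiger": ["lion", "puma"],
    "puma": ["lion", "tiger", "whale"],
    "whale": ["puma", "dolphin"],
    "dolphin": ["whale"],
}

fluency_list = censored_random_walk(toy_network, "lion", 5, rng=random.Random(0))
```

Because revisits are hidden, the observed list contains no repetitions even though the underlying walk may pass through the same node many times. Network estimation methods of this kind work backwards from such lists to the edges that plausibly produced them.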

2017

URL Abstract	Zemla, J. C., & Austerweil, J. L. (2017). Modeling semantic fluency data as search on a semantic network. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Ed.), Proceedings of the 39th annual meeting of the cognitive science society (pp. X-X). Austin, TX: Cognitive Science Society. Tags: networks fluency Psychologists have used the semantic fluency task for decades to gain insight into the processes and representations underlying memory retrieval. Recent work has suggested that a censored random walk on a semantic network resembles semantic fluency data because it produces optimal foraging. However, fluency data have rich structure beyond being consistent with optimal foraging. Under the assumption that memory can be represented as a semantic network, we test a variety of memory search processes and examine how well these processes capture the richness of fluency data. The search processes we explore vary in the extent to which they explore the network globally or exploit local clusters, and whether they are strategic. We found that a censored random walk with a priming component best captures the frequency and clustering effects seen in human fluency data.
URL Abstract	Sarathy, V., Scheutz, M., Kenett, Y., Allaham, M. M., & Austerweil, J. L. (2017). Mental Representations and Computational Modeling of Context-Specific Human Norm Systems. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Ed.), Proceedings of the 39th annual meeting of the cognitive science society (pp. X-X). Austin, TX: Cognitive Science Society. Tags: norms fluency Human behavior is frequently guided by social and moral norms; in fact, no societies, no social groups could exist without norms. However, there are few cognitive science approaches to this central phenomenon of norms. While there has been some progress in developing formal representations of norm systems (e.g., deontological approaches), we do not yet know basic properties of human norms: how they are represented, activated, and learned. Further, what computational models can capture these properties, and what algorithms could learn them? In this paper we describe initial experiments on human norm representations in which the context specificity of norms features prominently. We then provide a formal representation of norms using Dempster-Shafer Theory that allows a machine learning algorithm to learn norms under uncertainty from these human data, while preserving their context specificity.
URL Abstract	Conaway, N., & Austerweil, J. L. (2017). PACKER: An exemplar model of category generation. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Ed.), Proceedings of the 39th annual meeting of the cognitive science society (pp. X-X). Austin, TX: Cognitive Science Society. Tags: creativity categorization Generating new concepts is an intriguing yet understudied topic in cognitive science. In this paper, we present a novel exemplar model of category generation: PACKER (Producing Alike and Contrasting Knowledge using Exemplar Representations). PACKER's core design assumptions are (1) categories are represented as exemplars in a multidimensional psychological space, (2) generated items should be similar to exemplars of the same category, and (3) generated categories should be dissimilar to existing categories. A behavioral study reveals strong effects of contrast and target-class similarity. These effects are novel empirical phenomena, which are directly predicted by the PACKER model but are not explained by existing formal approaches.
URL Abstract	Ren, J., & Austerweil, J. L. (2017). Interpreting asymmetric perception in speech processing with Bayesian inference. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Ed.), Proceedings of the 39th annual meeting of the cognitive science society (pp. X-X). Austin, TX: Cognitive Science Society. Tags: perception rational analysis This paper proposes a Bayesian account of asymmetries found in speech perception: In many languages, listeners show greater sensitivity if a non-coronal sound (/b/, /p/, /g/, /k/) is changed to coronal sounds (/d/, /t/) than vice versa. The currently predominant explanation for these asymmetries is that they reflect innate constraints from Universal Grammar. Alternatively, we propose that the asymmetries could simply arise from optimal inference given the statistical properties of different speech categories of the listener’s native language. In the framework of Bayesian inference, we examined two statistical parameters of coronal and non-coronal sounds: frequencies of occurrence and variance in articulation. In the languages in which perceptual asymmetries have been found, coronal sounds are either more frequent or more variable than non-coronal sounds. Given such differences, an ideal observer is more likely to perceive a non-coronal speech signal as a coronal segment than vice versa. Thus, the perceptual asymmetries can be explained as a natural consequence of probabilistic inference. The coronal/non-coronal asymmetry is similar to asymmetries observed in many other cognitive domains. Thus, we argue that it is more parsimonious to explain this asymmetry as one of many similar asymmetries found in cognitive processing, rather than a linguistic-specific, innate constraint.
URL Abstract	Ho, M. K., Littman, M. L., & Austerweil, J. L. (2017). Teaching by intervention: Working backwards, undoing mistakes, or correcting mistakes? In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Ed.), Proceedings of the 39th annual meeting of the cognitive science society (pp. X-X). Austin, TX: Cognitive Science Society. Tags: reinforcement learning pedagogy When teaching, people often intentionally intervene on a learner while it is acting. For instance, a dog owner might move the dog so it eats out of the right bowl, or a coach might intervene while a tennis player is practicing to teach a skill. How do people teach by intervention? And how do these strategies interact with learning mechanisms? Here, we examine one global and two local strategies: working backwards from the end-goal of a task (backwards chaining), placing a learner in a previous state when an incorrect action was taken (undoing), or placing a learner in the state they would be in if they had taken the correct action (correcting). Depending on how the learner interprets an intervention, different teaching strategies result in better learning. We also examine how people teach by intervention in an interactive experiment and find a bias for using local strategies like undoing.
URL Abstract	Austerweil, J. L., Griffiths, T. L., & Palmer, S. E. (2017). Learning to Be (In)variant: Combining Prior Knowledge and Experience to Infer Orientation Invariance in Object Recognition. Cognitive Science, 41(5), 1183-1201. Tags: perception bayesian nonparametrics categorization How does the visual system recognize images of a novel object after a single observation despite possible variations in the viewpoint of that object relative to the observer? One possibility is comparing the image with a prototype for invariance over a relevant transformation set (e.g., translations and dilations). However, invariance over rotations (i.e., orientation invariance) has proven difficult to analyze, because it applies to some objects but not others. We propose that the invariant transformations of an object are learned by incorporating prior expectations with real world evidence. We test this proposal by developing an ideal learner model for learning invariance that predicts better learning of orientation dependence when prior expectations about orientation are weak. This prediction was supported in two behavioral experiments, where participants learned the orientation dependence of novel images using feedback from solving arithmetic problems.

2016

URL Abstract	Kleiman-Weiner, M., Ho, M. K., Austerweil, J. L., Littman, M. L., & Tenenbaum, J. B. (2016). Coordinate to cooperate or compete: Abstract goals and joint intentions in social interaction. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Ed.), Proceedings of the 38th annual meeting of the cognitive science society (pp. 1679-1684). Austin, TX: Cognitive Science Society. Tags: reinforcement learning Successfully navigating the social world requires reasoning about both high-level strategic goals, such as whether to cooperate or compete, as well as the low-level actions needed to achieve those goals. We develop a hierarchical model of social agency that infers the intentions of other agents, strategically decides whether to cooperate or compete with them, and then executes either a cooperative or competitive planning program. Learning occurs across both high-level strategic decisions and low-level actions leading to the emergence of social norms. We test predictions of this model in multi-agent behavioral experiments using rich video-game like environments. By grounding strategic behavior in a formal model of planning, we develop abstract notions of both cooperation and competition and shed light on the computational nature of joint intentionality.
URL Abstract	Kenett, Y. N., & Austerweil, J. L. (2016). Examining search processes in low and high creative individuals with random walks. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Ed.), Proceedings of the 38th annual meeting of the cognitive science society (pp. 313-318). Austin, TX: Cognitive Science Society. Tags: networks creativity The creative process involves several cognitive processes, such as working memory, controlled attention and task switching. One other process is cognitive search over semantic memory. These search processes can be controlled (e.g., problem solving guided by a heuristic), or uncontrolled (e.g., mind wandering). However, the nature of this search in relation to creativity has rarely been examined from a formal perspective. To do this, we use a random walk model to simulate uncontrolled cognitive search over semantic networks of low and high creative individuals with an equal number of nodes and edges. We show that a random walk over the semantic network of high creative individuals “finds” more unique words and moves further through the network for a given number of steps. Our findings are consistent with the associative theory of creativity, which posits that the structure of semantic memory facilitates search processes to find creative solutions.
URL Abstract	Ho, M. K., Littman, M. L., MacGlashan, J., Cushman, F., & Austerweil, J. L. (2016). Showing versus doing: Teaching by demonstration. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Ed.), Advances in neural information processing systems (pp. 3027-3035). Red Hook, NY: Curran Associates, Inc. Tags: reinforcement learning pedagogy People often learn from others' demonstrations, and classic inverse reinforcement learning (IRL) algorithms have brought us closer to realizing this capacity in machines. In contrast, teaching by demonstration has been less well studied computationally. Here, we develop a novel Bayesian model for teaching by demonstration. Stark differences arise when demonstrators are intentionally teaching a task versus simply performing a task. In two experiments, we show that human participants systematically modify their teaching behavior consistent with the predictions of our model. Further, we show that even standard IRL algorithms benefit when learning from behaviors that are intentionally pedagogical. We conclude by discussing IRL algorithms that can take advantage of intentional pedagogy.
URL Abstract	Ho, M. K., MacGlashan, J., Hilliard, E., Trimbach, C., Brawner, S., Gopalan, N., Greenwald, A., Littman, M. L., Tenenbaum, J. B., Kleiman-Weiner, M., & Austerweil, J. L. (2016). Feature-based joint planning and norm learning in collaborative games. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Ed.), Proceedings of the 38th annual meeting of the cognitive science society (pp. 1158-1163). Austin, TX: Cognitive Science Society. Tags: reinforcement learning norms People often use norms to coordinate behavior and accomplish shared goals. But how do people learn and represent norms? Here, we formalize the process by which collaborating individuals (1) reason about group plans during interaction, and (2) use task features to abstractly represent norms. In Experiment 1, we test the assumptions of our model in a gridworld that requires coordination and contrast it with a “best response” model. In Experiment 2, we use our model to test whether group members’ joint planning relies more on state features independent of other agents (landmark-based features) or state features determined by the configuration of agents (agent-relative features).
URL Abstract	Zemla, J. C., Kenett, Y. N., Jun, K., & Austerweil, J. L. (2016). U-INVITE: Estimating individual semantic networks from fluency data. In A. Papafragou, D. Grodner, D. Mirman, & J. Trueswell (Ed.), Proceedings of the 38th annual meeting of the cognitive science society (pp. 1907-1912). Austin, TX: Cognitive Science Society. Tags: networks fluency Semantic networks have been used extensively in psychology to describe how humans organize facts and knowledge in memory. Numerous methods have been proposed to construct semantic networks using data from memory retrieval tasks, such as the semantic fluency task (listing items in a category). However, these methods typically generate group-level networks, and sometimes require a very large amount of participant data. We present a novel computational method for estimating an individual’s semantic network using semantic fluency data that requires very little data. We establish its efficacy by examining the semantic relatedness of associations estimated by the model.
URL Abstract	Austerweil, J. L., Brawner, S., Greenwald, A., Hilliard, E., Ho, M., Littman, M. L., MacGlashan, J., & Trimbach, C. (2016). The impact of other-regarding preferences in a collection of non-zero-sum grid games. AAAI spring symposium 2016 on challenges and opportunities in multiagent learning for the real world. Palo Alto, CA: The AAAI Press. Tags: reinforcement learning We examined the behavior of reinforcement-learning algorithms in a set of two-player stochastic games played on a grid. These games were selected because they include both cooperative and competitive elements, highlighting the importance of adaptive collaboration between the players. We found that pairs of learners were surprisingly good at discovering stable mutually beneficial behavior when such behaviors existed. However, the performance of learners was significantly impacted by their other-regarding preferences. We found similar patterns of results in games involving human–human and human–agent pairs.
URL Abstract	Sobel, D. M., & Austerweil, J. L. (2016). Coding choices affect the analyses of a false belief measure. Cognitive Development, 40, 9-23. Tags: statistical methods The unexpected contents task is a ubiquitous measure of false belief. Not only has this measure been used to study children’s developing knowledge of belief, it has impacted the study of atypical development, education, and many other facets of cognitive development. Based on a review of articles using this task, we show that there is no consensus regarding how to score this measure. Further, examining both a logit analysis of performance on this measure and performance of a large sample of preschoolers, we show that which coding scheme researchers used to analyze raw data from this measure has a reliable effect on results, particularly when smaller sample sizes are used. Integrating our results, we conclude that the most frequently used coding scheme is flawed. We recommend best practices for scoring the unexpected contents task, and that researchers examine how they analyze data from this measure to ensure the robustness of their effects.
URL Abstract	Cibelli, E., Xu, Y., Austerweil, J. L., Griffiths, T. L., & Regier, T. (2016). The Sapir-Whorf hypothesis and probabilistic inference: Evidence from the domain of color. PLOS ONE, 11(7), e0158725. Tags: perception The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have not reliably replicated. We argue that considering this hypothesis through the lens of probabilistic inference has the potential to resolve both issues, at least with respect to certain prominent findings in the domain of color cognition. We explore a probabilistic model that is grounded in a presumed universal perceptual color space and in language-specific categories over that space. The model predicts that categories will most clearly affect color memory when perceptual information is uncertain. In line with earlier studies, we show that this model accounts for language-consistent biases in color reconstruction from memory in English speakers, modulated by uncertainty. We also show, to our knowledge for the first time, that such a model accounts for influential existing data on cross-language differences in color discrimination from memory, both within and across categories. We suggest that these ideas may help to clarify the debate over the Sapir-Whorf hypothesis.

2015

URL Abstract	Qian, T., & Austerweil, J. L. (2015). Learning additive and substitutive features. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Ed.), Proceedings of the 37th annual meeting of the cognitive science society (pp. 1919-1924). Austin, TX: Cognitive Science Society. Tags: feature inference To adapt in an ever-changing world, people infer what basic units should be used to form concepts. Recent computational models of representation learning have successfully predicted how people discover features (Austerweil & Griffiths, 2013); however, the learned features are assumed to be additive. This assumption is not always true in the real world. Sometimes a basic unit is substitutive (Garner, 1978) - for example, a cat is either furry or hairless, but not both. Here we explore how people form representations for substitutive features, and what computational principles guide such behavior. In an experiment, we show that not only are people capable of forming substitutive feature representations, but they also infer whether a feature should be additive or substitutive depending on the input. This learning behavior is predicted by our novel extension to Austerweil and Griffiths’s (2011, 2013) feature construction framework, but not by their original model.
URL Abstract	Ho, M. K., Littman, M. L., Cushman, F., & Austerweil, J. L. (2015). Teaching with rewards and punishments: Reinforcement or communication? In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Ed.), Proceedings of the 37th annual meeting of the cognitive science society (pp. 920-925). Austin, TX: Cognitive Science Society. Tags: reinforcement learning pedagogy Teaching with evaluative feedback involves expectations about how a learner will interpret rewards and punishments. We formalize two hypotheses of how a teacher implicitly expects a learner to interpret feedback – a reward-maximizing model based on standard reinforcement learning and an action-feedback model based on research on communicative intent – and describe a virtual animal-training task that distinguishes the two. The results of two experiments in which people gave learners feedback for isolated actions (Exp. 1) or while learning over time (Exp. 2) support the action-feedback model over the reward-maximizing model.
URL	Austerweil, J. L. (2015). Contradictory “heuristic” theories of autism spectrum disorders: The case for theoretical precision using computational models. Autism, 19(3), 367-368. Tags: computational psychiatry
URL Abstract	Austerweil, J. L., Gershman, S. J., Tenenbaum, J. B., & Griffiths, T. L. (2015). Structure and flexibility in Bayesian models of cognition. In J. R. Busemeyer, Z. Wang, J. T. Townsend, & A. Eidels (Ed.), Oxford handbook of computational and mathematical psychology (pp. 187-208). New York, NY: Oxford University Press. Tags: bayesian nonparametrics Probability theory forms a natural framework for explaining the impressive success of people at solving many difficult inductive problems, such as learning words and categories, inferring the relevant features of objects, and identifying functional relationships. Probabilistic models of cognition use Bayes’ rule to identify probable structures or representations that could have generated a set of observations, whether the observations are sensory input or the output of other psychological processes. In this chapter we address an important question that arises within this framework: How do people infer representations that are complex enough to faithfully encode the world but not so complex that they “overfit” noise in the data? We discuss nonparametric Bayesian models as a potential answer to this question. To do so, first we present the mathematical background necessary to understand nonparametric Bayesian models. We then delve into nonparametric Bayesian models for three types of hidden structure: clusters, features, and functions. Finally, we conclude with a summary and discussion of open questions for future research.
URL Abstract	Abbott, J. T., Austerweil, J. L., & Griffiths, T. L. (2015). Random walks on semantic networks can resemble optimal foraging. Psychological Review, 122(3), 558-569. Tags: networks fluency When people are asked to retrieve members of a category from memory, clusters of semantically related items tend to be retrieved together. A recent article by Hills, Jones, and Todd (2012) argued that this pattern reflects a process similar to optimal strategies for foraging for food in patchy spatial environments, with an individual making a strategic decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that similar behavioral phenomena also emerge from a random walk on a semantic network derived from human word-association data. Random walks provide an alternative account of how people search their memories, postulating an undirected rather than a strategic search process. We show that results resembling optimal foraging are produced by random walks when related items are close together in the semantic network. These findings are reminiscent of arguments from the debate on mental imagery, showing how different processes can produce similar results when operating on different representations.
URL Abstract	Cohen-Priva, U., & Austerweil, J. L. (2015). Analyzing the history of Cognition using topic models. Cognition, 135, 4-9. Tags: natural language processing Very few articles have analyzed how cognitive science as a field has changed over the last six decades. We explore how Cognition changed over the last four decades using Topic Models. Topic Models assume that every word in every document is generated by one of a limited number of topics. Words that are likely to co-occur are likely to be generated by a single topic. We find a number of significant historical trends: the rise of moral cognition, eyetracking methods, and action, the fall of sentence processing, and the stability of development. We introduce the notion of framing topics, which frame content, rather than present the content itself. These framing topics suggest that over time Cognition turned from abstract theorizing to more experimental approaches.
URL Abstract	Prinzmetal, W., Whiteford, K., Austerweil, J. L., & Landau, A. N. (2015). Spatial attention and environmental information. Journal of Experimental Psychology: Human Perception and Performance, 41(5), 1396-1408. Tags: perception Navigating through our perceptual environment requires constant selection of behaviorally relevant information and irrelevant information. Spatial cues guide attention to information in the environment that is relevant to the current task. How does the amount of information provided by a location cue and irrelevant information influence the deployment of attention and what are the processes underlying this effect? To address these questions, we used a spatial cueing paradigm to measure the relationship between cue predictability (measured in bits of information) and the voluntary attention effect, the benefit in reaction time (RT) because of cueing a target. We found a linear relationship between cue predictability and the attention effect. To analyze the cognitive processes producing this effect, we used a simple RT model, the Linear Ballistic Accumulator model. We found that informative cues reduced the amount of evidence necessary to make a response (the threshold), regardless of the presence of irrelevant information (i.e., distractors). However, a change in the rate of evidence accumulation occurred when distractors were present in the display. Thus, the mechanisms underlying the deployment of attention are exquisitely tuned to the amount and behavioral relevancy of statistical information in the environment.
URL Abstract	Malle, B. F., Scheutz, M., & Austerweil, J. L. (2015). Networks of social and moral norms in human and artificial agents. In M. I. A. Ferreira, J. S. Sequeira, O. T. Mohammad, E. E. Kadar, & G. S. Virk (Ed.), International conference on robot ethics (pp. 3-17). Cham, Switzerland: Springer International Publishing. Tags: norms The most intriguing and ethically challenging roles of robots in society are those of collaborator and social partner. We propose that such robots must have the capacity to learn, represent, activate, and apply social and moral norms—they must have a norm capacity. We offer a theoretical analysis of two parallel questions: what constitutes this norm capacity in humans and how might we implement it in robots? We propose that the human norm system has four properties: flexible learning despite a general logical format, structured representations, context-sensitive activation, and continuous updating. We explore two possible models that describe how norms are cognitively represented and activated in context-specific ways and draw implications for robotic architectures that would implement either model.

2014

URL

Abstract

Austerweil, J. L. (2014). Testing the psychological validity of cluster construction biases. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Ed.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (pp. 122-127). Austin, TX: Cognitive Science Society.

Tags: categorization

To generalize from one experience to the next in a world where the underlying structures are ever-changing, people construct clusters that group their observations and enable information to be pooled within a cluster in an efficient and effective manner. Despite substantial computational work describing potential domain-general processes for how people construct these clusters, there has been little empirical progress comparing different proposals to each other and to human performance. In this article, I empirically test some popular computational proposals against each other and against human behavior using the Markov chain Monte Carlo with People methodology. The results support two popular Bayesian nonparametric processes, the Chinese Restaurant Process and the related Dirichlet Process Mixture Model.
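For readers unfamiliar with the Chinese Restaurant Process mentioned in the abstract above, the following is a minimal sketch of its standard sequential sampling scheme, not the experimental procedure used in the paper. The function name `sample_crp` and the parameter values are illustrative assumptions. Each new observation joins an existing cluster with probability proportional to that cluster's current size, or starts a new cluster with probability proportional to a concentration parameter `alpha`.

```python
import random

def sample_crp(n_items, alpha, seed=None):
    """Draw one partition of n_items from a CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    assignments = []   # cluster index assigned to each item, in order
    counts = []        # number of items currently in each cluster
    for _ in range(n_items):
        # Weight of each existing cluster is its size; the final weight
        # (alpha) is for opening a brand-new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)       # open a new cluster
        else:
            counts[k] += 1         # join an existing cluster
        assignments.append(k)
    return assignments, counts

assignments, counts = sample_crp(n_items=10, alpha=1.0, seed=0)
print(assignments, counts)
```

Larger values of `alpha` tend to produce more clusters, while the size-proportional weights produce a "rich get richer" pattern in which a few clusters absorb most observations.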

2013

URL

Abstract

Jia, Y., Abbott, J. T., Austerweil, J. L., Griffiths, T. L., & Darrell, T. (2013). Visual concept learning: Combining machine vision and Bayesian generalization on concept hierarchies. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Ed.), Advances in neural information processing systems (pp. 1842-1850).

Tags: categorization

Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning using the ImageNet hierarchy as the source of possible concepts, with human annotators to provide ground truth labels as to whether a new image is an instance of each concept using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization.

URL

Abstract

Austerweil, J. L., & Griffiths, T. L. (2013). A nonparametric Bayesian framework for constructing flexible feature representations. Psychological Review, 120(4), 817-851.

Tags: feature inference bayesian nonparametrics

Representations are a key explanatory device used by cognitive psychologists to account for human behavior. Understanding the effects of context and experience on the representations people use is essential, because if two people encode the same stimulus using different representations, their response to that stimulus may be different. We present a computational framework that can be used to define models that flexibly construct feature representations (where by a feature we mean a part of the image of an object) for a set of observed objects, based on nonparametric Bayesian statistics. Austerweil and Griffiths (2011) presented an initial model constructed in this framework that captures how the distribution of parts affects the features people use to represent a set of objects. We build on this work in three ways. First, although people use features that can be transformed on each observation (e.g., translate on the retinal image), many existing feature learning models can only recognize features that are not transformed (occur identically each time). Consequently, we extend the initial model to infer features that are invariant over a set of transformations, and learn different structures of dependence between feature transformations. Second, we compare two possible methods for capturing the manner that categorization affects feature representations. Finally, we present a model that learns features incrementally, capturing an effect of the order of object presentation on the features people learn. We conclude by considering the implications and limitations of our empirical and theoretical results.

2012

URL Abstract	Abbott, J. T., Austerweil, J. L., & Griffiths, T. L. (2012). Constructing a hypothesis space from the web for large-scale Bayesian word learning. In N. Miyake, D. Peebles, & R. P. Cooper (Ed.), Proceedings of the 34th annual meeting of the cognitive science society (pp. 54-59). Austin, TX: Cognitive Science Society. Tags: statistical methods The Bayesian generalization framework has been successful in explaining how people generalize a property from a few observed stimuli to novel stimuli, across several different domains. To create a successful Bayesian generalization model, modelers typically specify a hypothesis space and prior probability distribution for each specific domain. However, this raises two problems: the models do not scale beyond the (typically small-scale) domain that they were designed for, and the explanatory power of the models is reduced by their reliance on a hand-coded hypothesis space and prior. To solve these two problems, we propose a method for deriving hypothesis spaces and priors from large online databases. We evaluate our method by constructing a hypothesis space and prior for a Bayesian word learning model from WordNet, a large online database that encodes the semantic relationships between words as a network. After validating our approach by replicating a previous word learning study, we apply the same model to a new experiment featuring three additional taxonomic domains (clothing, containers, and seats). In both experiments, we found that the same automatically constructed hypothesis space explains the complex pattern of generalization behavior, producing accurate predictions across a total of six different domains.
URL Abstract	Griffiths, T. L., Austerweil, J. L., & Berthiaume, V. G. (2012). Comparing the inductive biases of simple neural networks and Bayesian models. In N. Miyake, D. Peebles, & R. P. Cooper (Ed.), Proceedings of the 34th annual meeting of the cognitive science society (pp. 402-407). Austin, TX: Cognitive Science Society. Tags: networks Understanding the relationship between connectionist and probabilistic models is important for evaluating the compatibility of these approaches. We use mathematical analyses and computer simulations to show that a linear neural network can approximate the generalization performance of a probabilistic model of property induction, and that training this network by gradient descent with early stopping results in similar performance to Bayesian inference with a particular prior. However, this prior differs from distributions defined using discrete structure, suggesting that neural networks have inductive biases that can be differentiated from probabilistic models with structured representations.
URL Abstract	Griffiths, T. L., & Austerweil, J. L. (2012). Bayesian generalization with circular consequential regions. Journal of Mathematical Psychology, 56(4), 281-285. Tags: categorization Generalization–deciding whether to extend a property from one stimulus to another stimulus–is a fundamental problem faced by cognitive agents in many different settings. Shepard (1987) provided a mathematical analysis of generalization in terms of Bayesian inference over the regions of psychological space that might correspond to a given property. He proved that in the unidimensional case, where regions are intervals of the real line, generalization will be a negatively accelerated function of the distance between stimuli, such as an exponential function. These results have been extended to rectangular consequential regions in multiple dimensions, but not for circular consequential regions, which play an important role in explaining generalization for stimuli that are not represented in terms of separable dimensions. We analyze Bayesian generalization with circular consequential regions, providing bounds on the generalization function and proving that this function is negatively accelerated.
URL Abstract	Abbott, J. T., Austerweil, J. L., & Griffiths, T. L. (2012). Human memory search as a random walk in a semantic network. In F. Pereira, C.J.C. Burges, L. Bottou, & K.Q. Weinberger (Ed.), Advances in neural information processing systems (pp. 3041-3049). Red Hook, NY: Curran Associates, Inc. Tags: networks fluency The human mind has a remarkable ability to store a vast amount of information in memory, and an even more remarkable ability to retrieve these experiences when needed. Understanding the representations and algorithms that underlie human memory search could potentially be useful in other information retrieval settings, including internet search. Psychological studies have revealed clear regularities in how people search their memory, with clusters of semantically related items tending to be retrieved together. These findings have recently been taken as evidence that human memory search is similar to animals foraging for food in patchy environments, with people making a rational decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that the results that were taken as evidence for this account also emerge from a random walk on a semantic network, much like the random web surfer model used in internet search engines. This offers a simpler and more unified account of how people search their memory, postulating a single process rather than one process for exploring a cluster and one process for switching between clusters.
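
The random-walk account of memory search summarized above can be illustrated with a small sketch. This is not the authors' implementation: the toy network, the function name `random_walk_fluency`, the uniform choice among neighbors, and the rule that a node is reported only on its first visit are all assumptions made for illustration.

```python
import random

# Hypothetical toy semantic network (undirected adjacency lists). The words
# and edges are illustrative assumptions, not data from the paper.
TOY_NETWORK = {
    "dog": ["cat", "wolf"],
    "cat": ["dog", "lion"],
    "wolf": ["dog"],
    "lion": ["cat", "tiger"],
    "tiger": ["lion", "zebra"],
    "zebra": ["tiger", "horse"],
    "horse": ["zebra"],
}


def random_walk_fluency(network, start, n_items, seed=None, max_steps=10_000):
    """Walk the network uniformly at random; report each node on its first visit."""
    rng = random.Random(seed)
    current = start
    produced = [start]
    seen = {start}
    steps = 0
    while len(produced) < n_items and steps < max_steps:
        current = rng.choice(network[current])  # uniform step to a neighbor
        steps += 1
        if current not in seen:  # revisits are silent; only first visits count
            seen.add(current)
            produced.append(current)
    return produced


if __name__ == "__main__":
    print(random_walk_fluency(TOY_NETWORK, "dog", n_items=5, seed=1))
```

Because the walk only moves along edges, each newly reported word is a neighbor of some word already reported, so related items tend to appear together without any separate process for switching between clusters.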

2011

Austerweil, J. L., & Griffiths, T. L. (2011). Seeking confirmation is rational for deterministic hypotheses. Cognitive Science, 35(3), 499-526.

Tags: reasoning

The tendency to test outcomes that are predicted by our current theory (the confirmation bias) is one of the best-known biases of human decision making. We prove that the confirmation bias is an optimal strategy for testing hypotheses when those hypotheses are deterministic, each making a single prediction about the next event in a sequence. Our proof applies for two normative standards commonly used for evaluating hypothesis testing: maximizing expected information gain and maximizing the probability of falsifying the current hypothesis. This analysis rests on two assumptions: (a) that people predict the next event in a sequence in a way that is consistent with Bayesian inference; and (b) when testing hypotheses, people test the hypothesis to which they assign highest posterior probability. We present four behavioral experiments that support these assumptions, showing that a simple Bayesian model can capture people’s predictions about numerical sequences (Experiments 1 and 2), and that we can alter the hypotheses that people choose to test by manipulating the prior probability of those hypotheses (Experiments 3 and 4).

Austerweil, J. L., & Griffiths, T. L. (2011). Human feature learning. In N. M. Seel (Ed.), Encyclopedia of the sciences of learning (pp. 1456-1458). New York, NY: Springer.

Tags: feature inference

Austerweil, J. L., Friesen, A. L., & Griffiths, T. L. (2011). An ideal observer model for identifying the reference frame of objects. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (pp. 514-522). Red Hook, NY: Curran Associates, Inc.

Tags: rational analysis perception

The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols × and + differ by a 45 degree rotation. Although real scenes have multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model based on nonparametric Bayesian statistics for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conflicting reference frames, the model predicts two factors should influence the reference frame inferred for the image: The image should be more likely to share the reference frame of the closer object (proximity) and it should be more likely to share the reference frame containing the most objects (alignment). We confirm people use both cues using a novel methodology that allows for easy testing of human reference frame inference.

Austerweil, J. L., & Griffiths, T. L. (2011). A rational model of the effects of distributional information on feature learning. Cognitive Psychology, 63, 173-209.

Tags: rational analysis feature inference

Most psychological theories treat the features of objects as being fixed and immediately available to observers. However, novel objects have an infinite array of properties that could potentially be encoded as features, raising the question of how people learn which features to use in representing those objects. We focus on the effects of distributional information on feature learning, considering how a rational agent should use statistical information about the properties of objects in identifying features. Inspired by previous behavioral results on human feature learning, we present an ideal observer model based on nonparametric Bayesian statistics. This model balances the idea that objects have potentially infinitely many features with the goal of using a relatively small number of features to represent any finite set of objects. We then explore the predictions of this ideal observer model. In particular, we investigate whether people are sensitive to how parts co-vary over objects they observe. In a series of four behavioral experiments (three using visual stimuli, one using conceptual stimuli), we demonstrate that people infer different features to represent the same four objects depending on the distribution of parts over the objects they observe. Additionally, in all four experiments, the features people infer have consequences for how they generalize properties to novel objects. We also show that simple models that use the raw sensory data as inputs and standard dimensionality reduction techniques (principal component analysis and independent component analysis) are insufficient to explain our results.

2010

Gardner, J. S., Austerweil, J. L., & Palmer, S. E. (2010). Vertical position as a cue to pictorial depth: Height in the picture plane versus distance to the horizon. Attention, Perception, & Psychophysics, 72(2), 445-453.

Tags: perception

Two often cited but frequently confused pictorial cues to perceived depth are height in the picture plane (HPP) and distance to the horizon (DH). We report two psychophysical experiments that disentangled their influence on perception of relative depth in pictures of the interior of a schematic room. Experiment 1 showed that when HPP and DH varied independently with both a ceiling and a floor plane visible in the picture, DH alone determined judgments of relative depth; HPP was irrelevant. Experiment 2 studied relative depth perception in single-plane displays (floor only or ceiling only) in which the horizon either was not visible or was always at the midpoint of the target object. When the target object was viewed against either a floor or a ceiling plane, some observers used DH, but others (erroneously) used HPP. In general, when DH is defined and unambiguous, observers use it to determine the relative distance to objects, but when DH is undefined and/or ambiguous, at least some observers use HPP.

Austerweil, J. L., & Griffiths, T. L. (2010). Learning hypothesis spaces and dimensions through concept learning. In S. Ohlsson, & R. Catrambone (Eds.), Proceedings of the 32nd annual conference of the Cognitive Science Society (pp. 73-78). Austin, TX: Cognitive Science Society.

Tags: categorization feature inference

Generalizing a property from a set of objects to a new object is a fundamental problem faced by the human cognitive system, and a long-standing topic of investigation in psychology. Classic analyses suggest that the probability with which people generalize a property from one stimulus to another depends on the distance between those stimuli in psychological space. This raises the question of how people identify an appropriate metric for determining the distance between novel stimuli. In particular, how do people determine if two dimensions should be treated as separable, with distance measured along each dimension independently (as in an $L_1$ metric), or integral, supporting Euclidean distance (as in an $L_2$ metric)? We build on an existing Bayesian model of generalization to show that learning a metric can be formalized as a problem of learning a hypothesis space for generalization, and that both ideal and human learners can learn appropriate hypothesis spaces for a novel domain by learning concepts expressed in that domain.

Austerweil, J. L., & Griffiths, T. L. (2010). Learning invariant features using the transformed Indian buffet process. In R. Zemel, & J. Shawe-Taylor (Eds.), Advances in Neural Information Processing Systems (pp. 82-90). Cambridge, MA: MIT Press.

Tags: feature inference bayesian nonparametrics

Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features.

2009

Austerweil, J. L., & Griffiths, T. L. (2009). The effect of distributional information on feature learning. In N. A. Taatgen, & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the Cognitive Science Society (pp. 2765-2770). Austin, TX: Cognitive Science Society.

Tags: feature inference rational analysis

A fundamental problem solved by the human mind is the formation of basic units to represent observed objects that support future decisions. We present an ideal observer model that infers features to represent the raw sensory data of a given set of objects. Based on our rational analysis of feature representation, we predict that the distribution of the parts that compose objects should affect the features people use to infer objects. We confirm this prediction in a behavioral experiment, suggesting that distributional information is one of the factors that determines how people identify the features of objects.

Austerweil, J. L., & Griffiths, T. L. (2009). Analyzing human feature learning as nonparametric Bayesian inference. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (pp. 97-104). Red Hook, NY: Curran Associates, Inc.

Tags: feature inference bayesian nonparametrics

Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features.

2008

Austerweil, J. L., & Griffiths, T. L. (2008). A rational analysis of confirmation with deterministic hypotheses. In B. C. Love, K. McRae, & V. M. Sloutsky (Eds.), Proceedings of the 30th annual conference of the Cognitive Science Society (pp. 1041-1046). Austin, TX: Cognitive Science Society.

Tags: rational analysis

Whether scientists test their hypotheses as they ought to has interested both cognitive psychologists and philosophers of science. Classic analyses of hypothesis testing assume that people should pick the test with the largest probability of falsifying their current hypothesis, while experiments have shown that people tend to select tests consistent with that hypothesis. Using two different normative standards, we prove that seeking evidence predicted by your current hypothesis is optimal when the hypotheses in question are deterministic and other reasonable assumptions hold. We test this account with two experiments using a sequential prediction task, in which people guess the next number in a sequence. Experiment 1 shows that people’s predictions can be captured by a simple Bayesian model. Experiment 2 manipulates people’s beliefs about the probabilities of different hypotheses, and shows that they confirm whichever hypothesis they are led to believe is most likely.

2007

Elsner, M., Austerweil, J. L., & Charniak, E. (2007). A unified local and global model for discourse coherence. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 436-443). New York City, NY, USA: Association for Computational Linguistics.

Tags: natural language processing

We present a model for discourse coherence which combines the local entity-based approach of Barzilay and Lapata (2005) and the HMM-based content model of Barzilay and Lee (2004). Unlike the mixture model of Soricut and Marcu (2006), we learn local and global features jointly, providing a better theoretical explanation of how they are useful. As the local component of our model we adapt Barzilay and Lapata (2005) by relaxing independence assumptions so that it is effective when estimated generatively. Our model performs the ordering task competitively with Soricut and Marcu (2006), and significantly better than either of the models it is based on.

2006

Charniak, E., Johnson, M., Elsner, M., Austerweil, J. L., Ellis, D., Haxton, I., Hill, C., Shrivaths, R., Moore, J., Pozar, M., & Vu, T. (2006). Multilevel coarse-to-fine PCFG parsing. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 168-175). New York City, NY, USA: Association for Computational Linguistics.

Tags: natural language processing

We present a PCFG parsing algorithm that uses a multilevel coarse-to-fine (MLCTF) scheme to improve the efficiency of search for the best parse. Our approach requires the user to specify a sequence of nested partitions or equivalence classes of the PCFG nonterminals. We define a sequence of PCFGs corresponding to each partition, where the nonterminals of each PCFG are clusters of nonterminals of the original source PCFG. We use the results of parsing at a coarser level (i.e., grammar defined in terms of a coarser partition) to prune the next finer level. We present experiments showing that with our algorithm the workload (as measured by the total number of constituents processed) is decreased by a factor of ten with no decrease in parsing accuracy compared to standard CKY parsing with the original PCFG. We suggest that the search space over MLCTF algorithms is almost totally unexplored so that future work should be able to improve significantly on these results.