https://www.ncbi.nlm.nih.gov/pubmed/30370339
Model-based and model-free pain avoidance learning.
Author information
- 1. Department of Neural Computation for Decision-making, Advanced Telecommunications Research Institute International, Kyoto, Japan.
- 2. Department of Biology, Stanford University, Stanford, CA, USA.
- 3. Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.
- 4. Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA.
- 5. Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge, UK.
- 6. Center for Information and Neural Networks, National Institute for Information and Communications Technology, Osaka, Japan.
Abstract
Background: While there is good evidence that reward learning is underpinned by two distinct decision control systems, a cognitive 'model-based' and a habit-based 'model-free' system, a comparable distinction for punishment avoidance has been much less clear.
Methods: We implemented a pain avoidance task that placed differential emphasis on putative model-based and model-free processing, mirroring a paradigm and modelling approach recently developed for reward-based decision-making. Subjects performed a two-step decision-making task with probabilistic pain outcomes of different quantities. The delivery of outcomes was sometimes contingent on a rule signalled at the beginning of each trial, emulating a form of outcome devaluation.
Results: The behavioural data showed that subjects tended to use a mixed strategy, favouring the simpler model-free learning strategy when outcomes did not depend on the rule, and favouring a model-based strategy when they did. Furthermore, the data were well described by a dynamic transition model between the two controllers. When compared with data from a reward-based task (albeit tested in the context of the scanner), we observed that avoidance involved a significantly greater tendency for subjects to switch between model-free and model-based systems in the face of changes in uncertainty.
Conclusion: Our study suggests a dual-system model of pain avoidance, similar to but possibly more dynamically flexible than reward-based decision-making.
KEYWORDS:
Decision-making; pain avoidance; reinforcement learning; uncertainty
- PMID: 30370339
- PMCID: PMC6187988
- DOI: 10.1177/2398212818772964
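The mixed strategy described in the Results, where choice reflects a weighted blend of the two controllers, can be sketched roughly as below. The action values, the softmax temperature `beta`, and the mixing weight `w` are all hypothetical illustrations, not the paper's fitted model; the paper additionally models dynamic transitions between the controllers, which this sketch omits.

```python
import numpy as np

# Hypothetical first-stage action values for a two-option choice.
# With pain outcomes, values are negative; less negative = less painful.
q_mf = np.array([0.2, -0.5])    # model-free values (e.g. from TD updates)
q_mb = np.array([-0.9, -0.1])   # model-based values (from a learned task model)

def choice_probs(q_mf, q_mb, w, beta=3.0):
    """Softmax over a weighted mix of model-based and model-free values.
    w = 1 -> purely model-based; w = 0 -> purely model-free."""
    q = w * q_mb + (1 - w) * q_mf
    e = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return e / e.sum()

# The paper reports the balance shifting toward the model-based system
# when outcomes depend on the signalled rule; here we just vary w by hand.
print(choice_probs(q_mf, q_mb, w=0.0))  # model-free: prefers action 0
print(choice_probs(q_mf, q_mb, w=1.0))  # model-based: prefers action 1
```

With `w = 0` the agent follows its cached model-free values; with `w = 1` it follows the model-based evaluation, so the preferred action flips in this toy setting.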
Conventional reinforcement learning works by reinforcing the agent with rewards.
Here, instead of positive reinforcement, learning happens by avoiding pain, which acts as a negative reward,
and the finding is that this avoidance learning uses a mixed (dual) strategy.
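The idea that the same model-free update rule handles pain as a negative reward can be sketched as below. The pain probabilities, learning rate, and exploration rate are made up for illustration; this is only the model-free half of the dual-system story.

```python
import random

random.seed(1)

# Two actions; 'pain' (reward = -1) is delivered with different probabilities.
# These probabilities are hypothetical, not taken from the paper.
pain_prob = [0.2, 0.8]
q = [0.0, 0.0]          # cached model-free action values
alpha, epsilon = 0.1, 0.1

for _ in range(2000):
    # epsilon-greedy choice over the cached values
    if random.random() < epsilon:
        a = random.randrange(2)
    else:
        a = 0 if q[0] >= q[1] else 1
    r = -1.0 if random.random() < pain_prob[a] else 0.0
    q[a] += alpha * (r - q[a])   # same TD update used for reward learning

print(q)  # the more painful action ends up with the lower (more negative) value
```

Nothing in the update rule changes for avoidance: painful actions simply acquire negative values and are chosen less, which is exactly why the model-based/model-free distinction needed to be tested separately for punishment.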
In machine learning, methods like this keep being extended and applied in new ways.
People work the same way: if you like ice cream, you eat more, and the more you eat, the more you like it.
Vinegar is so sour that you stop drinking it straight, and soon the smell alone is enough to keep you away.
So many factors feed into human decision-making.
If we gradually develop algorithms for those factors,
we should be able to build decision-making tools that match, or even surpass, human ones...
2018 Model-based and model-free pain avoidance learning.pdf