
Reinforcement Learning

Model-based and model-free pain avoidance learning.

https://www.ncbi.nlm.nih.gov/pubmed/30370339



 2018 May 5;2:2398212818772964. doi: 10.1177/2398212818772964. eCollection 2018.


Author information

1. Department of Neural Computation for Decision-making, Advanced Telecommunications Research Institute International, Kyoto, Japan.
2. Department of Biology, Stanford University, Stanford, CA, USA.
3. Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.
4. Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA.
5. Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge, UK.
6. Center for Information and Neural Networks, National Institute for Information and Communications Technology, Osaka, Japan.

Abstract

Background: While there is good evidence that reward learning is underpinned by two distinct decision control systems - a cognitive 'model-based' and a habit-based 'model-free' system, a comparable distinction for punishment avoidance has been much less clear.

Methods: We implemented a pain avoidance task that placed differential emphasis on putative model-based and model-free processing, mirroring a paradigm and modelling approach recently developed for reward-based decision-making. Subjects performed a two-step decision-making task with probabilistic pain outcomes of different quantities. The delivery of outcomes was sometimes contingent on a rule signalled at the beginning of each trial, emulating a form of outcome devaluation.

Results: The behavioural data showed that subjects tended to use a mixed strategy - favouring the simpler model-free learning strategy when outcomes did not depend on the rule, and favouring a model-based when they did. Furthermore, the data were well described by a dynamic transition model between the two controllers. When compared with data from a reward-based task (albeit tested in the context of the scanner), we observed that avoidance involved a significantly greater tendency for subjects to switch between model-free and model-based systems in the face of changes in uncertainty.

Conclusion: Our study suggests a dual-system model of pain avoidance, similar to but possibly more dynamically flexible than reward-based decision-making.

KEYWORDS: Decision-making; pain avoidance; reinforcement learning; uncertainty

PMID: 30370339
PMCID: PMC6187988
DOI: 10.1177/2398212818772964





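In conventional reinforcement learning, the learner is given rewards, and the rewarded behaviour is reinforced.

The idea in this paper is that learning also happens through avoiding pain, which can be thought of as a negative reward (rather than through positive reinforcement),

and that this avoidance learning uses a mixed (dual) strategy, model-based and model-free.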
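
To make the "mixed strategy" idea concrete, here is a minimal sketch in Python. It is not the paper's two-step task or its fitted model. It is a hypothetical two-action toy problem in which pain is coded as a reward of -1, and every name in it (PAIN_PROB, the fixed weight w, the learning rate alpha, the epsilon-greedy choice rule) is an assumption made for illustration. The model-free part keeps a cached value per action. The model-based part estimates each action's pain probability and computes its expected value from that estimate. The two are then blended with a weight.

```python
import random

# Hypothetical toy task (NOT the paper's two-step task): two actions, each
# followed by pain with some probability. Pain is coded as a reward of -1.
PAIN_PROB = {"left": 0.8, "right": 0.2}  # assumed values, for illustration only


def model_free_update(q, action, reward, alpha=0.1):
    """Nudge the cached value of `action` toward the reward just observed."""
    q[action] += alpha * (reward - q[action])


def model_based_values(pain_prob_estimate):
    """Expected value per action, computed from a learned model of pain probability."""
    return {a: -p for a, p in pain_prob_estimate.items()}


def mixed_values(q_mf, q_mb, w):
    """Weighted blend: w = 1 is purely model-based, w = 0 is purely model-free."""
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mf}


def run(trials=2000, w=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q_mf = {a: 0.0 for a in PAIN_PROB}    # model-free cached values
    counts = {a: 0 for a in PAIN_PROB}    # times each action was tried
    pains = {a: 0 for a in PAIN_PROB}     # times each action led to pain
    q = dict(q_mf)
    for _ in range(trials):
        # Learned model: empirical pain probability (0.5 before any data).
        est = {a: (pains[a] / counts[a] if counts[a] else 0.5) for a in PAIN_PROB}
        q = mixed_values(q_mf, model_based_values(est), w)
        # Epsilon-greedy choice on the blended values.
        if rng.random() < epsilon:
            action = rng.choice(list(PAIN_PROB))
        else:
            action = max(q, key=q.get)
        pain = rng.random() < PAIN_PROB[action]
        reward = -1.0 if pain else 0.0
        model_free_update(q_mf, action, reward)
        counts[action] += 1
        pains[action] += int(pain)
    return q


if __name__ == "__main__":
    print(run())  # the less painful action should end up with the higher value
```

In this sketch the weight w is fixed. The abstract describes a dynamic transition between the two controllers, which would correspond to letting w change from trial to trial. How the paper does this is not reproduced here.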


In machine learning, a variety of such methods are increasingly being applied and developed.


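People work the same way: if you like ice cream you eat more of it, and the more you eat, the more you like it.

Vinegar tastes far too sour, so you stop drinking it straight, and eventually you back away at the mere smell of it.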
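
The ice cream and vinegar intuition can be sketched as a simple value update followed by a softmax choice rule. This is a toy illustration only. The outcome values (+1 and -1), the learning rate alpha and the softmax parameter beta are all assumed numbers, and for simplicity both items are tasted on every round regardless of preference.

```python
import math

# Assumed outcomes, for illustration: ice cream is pleasant, raw vinegar is aversive.
OUTCOME = {"ice_cream": 1.0, "vinegar": -1.0}


def softmax(values, beta=3.0):
    """Turn values into choice probabilities; a higher beta makes choices more decisive."""
    exps = {k: math.exp(beta * v) for k, v in values.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def taste_repeatedly(n_tastes, alpha=0.2):
    """Update each item's value toward its outcome after every taste."""
    values = {k: 0.0 for k in OUTCOME}
    for _ in range(n_tastes):
        for item, outcome in OUTCOME.items():
            values[item] += alpha * (outcome - values[item])
    return values, softmax(values)


if __name__ == "__main__":
    values, probs = taste_repeatedly(10)
    print(values)  # ice cream value drifts up, vinegar value drifts down
    print(probs)   # choice probability shifts toward ice cream
```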


A great many factors go into human decision-making.

If we gradually develop algorithms for those factors,

we may be able to build decision-making tools that are similar to, or even better than, human ones...







