A Hybrid Learning Strategy for Discovery of Policies of Action
Pontifical Catholic University of Paraná – PUCPR
Curitiba – PR, Brazil
Graduate Program in Computer Science - PPGIA
R. Ribeiro, A. L. Koerich and F. Enembreck
XVIII Brazilian Artificial Intelligence Symposium (SBIA 2006),
Ribeirão Preto, SP, Brazil, October 2006
Motivation & Challenge;
– Adaptive Autonomous Agents;
– Reinforcement Learning;
– Q-Learning Algorithm;
– Policy Estimation Techniques based on Instance-Based Learning;
– Evaluation Methodology;
– Hybrid Learning Method;
Conclusion & Future Work.
Motivation & Challenge
Discovery and Evaluation of Policies of Action;
Generic Evaluation Methodology;
Hybrid Learning Method.
ADAPTIVE AUTONOMOUS AGENTS:
– Finding action policies autonomously;
– Incremental learning based on rewards/punishments;
– Learning through trial-and-error interactions with an environment;
– Convergence to an optimal policy by visiting all states of the environment.
Foundations of Reinforcement Learning:
– Environment, action policies and reward.
[Figure: reinforcement learning diagram; surviving labels: Sensing (s), Rewards/]
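The elements named on this slide (environment, action policy, reward) can be illustrated with tabular Q-learning, the algorithm listed in the outline. This is only a minimal sketch, not the authors' implementation: the one-dimensional corridor environment, the reward values, and the parameters `alpha`, `gamma` and `epsilon` are all assumptions chosen for illustration.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4
GOAL = N_STATES - 1     # the goal is the right-most state
ACTIONS = (-1, +1)      # move left or move right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    # Assumed reward scheme: +1 at the goal, a small punishment otherwise.
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table by trial-and-error interaction with `step`."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(GOAL)          # random non-goal start state
        while state != GOAL:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            # Standard Q-learning update toward reward + discounted best next value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    """Action policy: the highest-valued action in each non-goal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

Calling `greedy_policy(q_learning())` returns a mapping from each non-goal state to the action the agent currently prefers there.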
Example of learning
EXAMPLE (proposed problem):
[Figure panels: (a) Setup of states; (b) Without learning; (c) Intermediate policies; (d) 1000 steps; (e) 1500 steps; (f) Optimal policy]
– Different domains;
– Quality measures are often domain-specific (kilometers, money, force, energy, etc.);
– Different ways of evaluating the same problem (number of steps, number of action changes, processing time).
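Two of the measures listed above, the number of steps and the number of action changes, can be computed from the same run of a policy and still rank policies differently. The sketch below assumes a run is recorded as a plain list of action names; that representation and the function names are hypothetical.

```python
def count_steps(actions):
    """Number of steps: one per action taken."""
    return len(actions)

def count_action_changes(actions):
    """Number of times the action differs from the one taken just before it."""
    return sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)

# Two hypothetical runs with the same number of steps but different smoothness.
run_a = ["right", "right", "up", "up"]
run_b = ["right", "up", "right", "up"]
```

Here both runs take 4 steps, but `run_a` changes action once and `run_b` three times, so the two measures disagree on which run is better.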
Generic Evaluation Methodology of Policies of Action;
Hybrid Learning Method;
1 Initialize Correct = 0, Wrong = 0, CostP = 0, CostA* = 0;
2 For each s ∈ S:
    CostP = cost(s, s_goal, P);
    CostA* = cost(s, s_goal, PA*);
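The steps above compare, for every state, the cost of following the learned policy P with the cost of the A* policy. The slide text stops before showing how Correct and Wrong are updated, so the sketch below makes an assumption: a state counts as correct when the policy's cost equals the A* cost, and as wrong otherwise. The grid world, the unit step cost, the Manhattan heuristic, and all function names are also hypothetical.

```python
import heapq

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def in_grid(state, rows, cols, walls):
    return 0 <= state[0] < rows and 0 <= state[1] < cols and state not in walls

def astar_cost(start, goal, rows, cols, walls):
    """Steps on a shortest path found by A* (Manhattan heuristic), or None."""
    def h(s):
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best[state]:
            continue  # stale queue entry
        for dr, dc in MOVES.values():
            nxt = (state[0] + dr, state[1] + dc)
            if in_grid(nxt, rows, cols, walls) and g + 1 < best.get(nxt, float("inf")):
                best[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None

def policy_cost(start, goal, policy, rows, cols, walls):
    """Steps taken by following `policy` from `start`, or None if the goal is not reached."""
    state, steps = start, 0
    while state != goal:
        # A policy that reaches the goal never needs more than rows * cols steps.
        if steps >= rows * cols or state not in policy:
            return None
        dr, dc = MOVES[policy[state]]
        state = (state[0] + dr, state[1] + dc)
        if not in_grid(state, rows, cols, walls):
            return None
        steps += 1
    return steps

def evaluate(policy, goal, rows, cols, walls=frozenset()):
    """Return (correct, wrong) over all non-goal, non-wall states."""
    correct = wrong = 0
    for r in range(rows):
        for c in range(cols):
            s = (r, c)
            if s == goal or s in walls:
                continue
            cost_a = astar_cost(s, goal, rows, cols, walls)
            if cost_a is None:
                continue  # assumption: states that cannot reach the goal are skipped
            cost_p = policy_cost(s, goal, policy, rows, cols, walls)
            # ASSUMPTION (not in the source): correct means the same cost as A*.
            if cost_p is not None and cost_p == cost_a:
                correct += 1
            else:
                wrong += 1
    return correct, wrong
```

For example, on a 2x3 grid with the goal at `(0, 2)`, a policy that sends `(0, 0)` down and around the bottom row reaches the goal in 4 steps instead of 2, so that state is counted as wrong while the other four states are counted as correct.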