Simulate data with RL models, RLDDMs, and RL+race models

These functions can be used to simulate data of a single participant or of a group of participants, given a set of parameter values.

These functions can therefore be used for parameter recovery: a model can be fit to the simulated data in order to compare the generating parameters with their estimated posterior distributions.

Note

At the moment, only non-hierarchical RLRDM data can be simulated.

Simulate RL stimuli

rlssm.random.generate_task_design_fontanesi(n_trials_block, n_blocks, n_participants, trial_types, mean_options, sd_options)

Generates the RL stimuli as in Fontanesi et al. (2019).

Note

In the original paper, we also corrected for repetition and presentation order of the values. This is not implemented here.

Parameters
  • n_trials_block (int) – Number of trials per learning session.

  • n_blocks (int) – Number of learning sessions per participant.

  • n_participants (int) – Number of participants.

  • trial_types (list of strings) – List containing possible pairs of options. E.g., in the original experiment: [‘1-2’, ‘1-3’, ‘2-4’, ‘3-4’]. It is important that they are separated by a ‘-’, and that they are numbered from 1 to n_options (4 in the example). Also, the “incorrect” option should go first in each pair.

  • mean_options (list or array of floats) – Mean reward for each option. The length should correspond to n_options.

  • sd_options (list or array of floats) – SD reward for each option. The length should correspond to n_options.

Returns

task_design – pandas.DataFrame, with n_trials_block*n_blocks rows. Columns contain: “f_cor”, “f_inc”, “trial_type”, “cor_option”, “inc_option”, “trial_block”, “block_label”, “participant”.

Return type

DataFrame
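The layout of the returned design can be illustrated with a minimal standard-library sketch. It is not the rlssm implementation: the helper name sketch_task_design, the uniform random draw of trial types, the normally distributed feedback, and the example means and SDs are all assumptions made for illustration. Rows are plain dictionaries rather than a pandas.DataFrame, and only one participant is generated.

```python
import random


def sketch_task_design(n_trials_block, n_blocks, trial_types,
                       mean_options, sd_options, seed=None):
    """Hypothetical stand-in for a single participant's task design."""
    rng = random.Random(seed)
    rows = []
    for block in range(1, n_blocks + 1):
        for trial in range(1, n_trials_block + 1):
            # Assumption: trial types are drawn uniformly at random.
            trial_type = rng.choice(trial_types)
            # The "incorrect" option comes first in each pair, e.g. '1-2'.
            inc_option, cor_option = (int(x) for x in trial_type.split('-'))
            rows.append({
                # Options are numbered from 1, so shift to 0-based indices.
                'f_cor': rng.gauss(mean_options[cor_option - 1],
                                   sd_options[cor_option - 1]),
                'f_inc': rng.gauss(mean_options[inc_option - 1],
                                   sd_options[inc_option - 1]),
                'trial_type': trial_type,
                'cor_option': cor_option,
                'inc_option': inc_option,
                'trial_block': trial,   # restarts at 1 in every block
                'block_label': block,   # starts at 1 for the participant
                'participant': 1,
            })
    return rows


# Illustrative values only; they are not taken from the documentation.
design = sketch_task_design(
    n_trials_block=80,
    n_blocks=3,
    trial_types=['1-2', '1-3', '2-4', '3-4'],
    mean_options=[34, 38, 50, 54],
    sd_options=[5, 5, 5, 5],
    seed=1,
)
```

Each row carries the feedback for both options of the pair, which is the information the simulation functions in this module expect in “f_cor” and “f_inc”.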

Simulate only-choices RL data

rlssm.random.simulate_rl_2A(task_design, gen_alpha, gen_sensitivity, initial_value_learning=0)

Simulates behavior (accuracy) according to an RL model, where the learning component is Q learning (delta learning rule) and the choice rule is the softmax.

This function can be used to simulate data for, e.g., parameter recovery. It simulates data for one participant.

Note

The number of options can actually be higher than 2, but only 2 options (one correct, one incorrect) are presented in each trial. It is important that “trial_block” is set to 1 at the beginning of each learning session (when the Q values are reset) and that “block_label” is set to 1 at the beginning of the experiment for each participant. There is no special requirement for the participant number.

Parameters
  • task_design (DataFrame) – pandas.DataFrame, with n_trials_block*n_blocks rows. Columns contain: “f_cor”, “f_inc”, “trial_type”, “cor_option”, “inc_option”, “trial_block”, “block_label”, “participant”.

  • gen_alpha (float or list of floats) – The generating learning rate. It should be a value between 0 (no updating) and 1 (full updating). If a list of 2 values is provided then 2 separate learning rates for positive and negative prediction error are used.

  • gen_sensitivity (float) – The generating sensitivity parameter for the softmax choice rule. It should be a value higher than 0 (the higher the value, the greater the sensitivity to value differences).

  • initial_value_learning (float) – The initial value for Q learning.

Returns

data – pandas.DataFrame, that is the task_design, plus: ‘Q_cor’, ‘Q_inc’, ‘alpha’, ‘sensitivity’, ‘p_cor’, and ‘accuracy’.

Return type

DataFrame
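As a rough illustration of the delta learning rule and the softmax choice rule named above, the following standard-library sketch simulates one block with a single pair of options. The function names are hypothetical, and updating both options on every trial (full feedback) is an assumption; none of this is the rlssm code itself.

```python
import math
import random


def softmax_p_cor(q_cor, q_inc, sensitivity):
    """Probability of choosing the correct option in a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-sensitivity * (q_cor - q_inc)))


def delta_update(q, feedback, alpha):
    """Delta rule. alpha is a float, or a pair (alpha_pos, alpha_neg)."""
    prediction_error = feedback - q
    if isinstance(alpha, (list, tuple)):
        # Separate learning rates for positive and negative prediction errors.
        rate = alpha[0] if prediction_error > 0 else alpha[1]
    else:
        rate = alpha
    return q + rate * prediction_error


def simulate_block(feedback_pairs, alpha, sensitivity,
                   initial_value=0.0, seed=None):
    """Simulate accuracy for one block of (f_cor, f_inc) feedback pairs."""
    rng = random.Random(seed)
    q_cor = q_inc = initial_value  # Q values are reset at the block start
    rows = []
    for f_cor, f_inc in feedback_pairs:
        p_cor = softmax_p_cor(q_cor, q_inc, sensitivity)
        accuracy = int(rng.random() < p_cor)
        rows.append({'Q_cor': q_cor, 'Q_inc': q_inc,
                     'p_cor': p_cor, 'accuracy': accuracy})
        # Assumption: both options are updated on every trial.
        q_cor = delta_update(q_cor, f_cor, alpha)
        q_inc = delta_update(q_inc, f_inc, alpha)
    return rows


# Illustrative feedback: the correct option pays more on average.
feedback_rng = random.Random(0)
feedback = [(feedback_rng.gauss(54, 5), feedback_rng.gauss(50, 5))
            for _ in range(40)]
data = simulate_block(feedback, alpha=0.1, sensitivity=0.5, seed=2)
```

Passing a pair such as alpha=[0.2, 0.05] instead of a single float switches the sketch to separate learning rates for positive and negative prediction errors.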

rlssm.random.simulate_hier_rl_2A(task_design, gen_mu_alpha, gen_sd_alpha, gen_mu_sensitivity, gen_sd_sensitivity, initial_value_learning=0)

Simulates behavior (accuracy) according to an RL model, where the learning component is Q learning (delta learning rule) and the choice rule is the softmax.

Simulates hierarchical data for a group of participants. The individual parameters have the following distributions:

  • alpha ~ Phi(normal(gen_mu_alpha, gen_sd_alpha))

  • sensitivity ~ log(1 + exp(normal(gen_mu_sensitivity, gen_sd_sensitivity)))

When 2 learning rates are used:

  • alpha_pos ~ Phi(normal(gen_mu_alpha[0], gen_sd_alpha[0]))

  • alpha_neg ~ Phi(normal(gen_mu_alpha[1], gen_sd_alpha[1]))
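The two transformations in these distributions can be sketched with the standard library: Phi is the standard normal CDF, which maps a normal draw to the interval (0, 1), and log(1 + exp(x)) is the softplus function, which maps it to a positive value. The helper names and the example group parameters below are hypothetical and only illustrate the sampling scheme for a single learning rate.

```python
import math
import random
from statistics import NormalDist


def phi(x):
    """Standard normal CDF: maps any real number into (0, 1)."""
    return NormalDist().cdf(x)


def softplus(x):
    """log(1 + exp(x)): maps any real number to a positive value."""
    return math.log1p(math.exp(x))


def sample_individual_parameters(n_participants, gen_mu_alpha, gen_sd_alpha,
                                 gen_mu_sensitivity, gen_sd_sensitivity,
                                 seed=None):
    """Draw one (alpha, sensitivity) pair per participant."""
    rng = random.Random(seed)
    return [
        {
            'participant': i + 1,
            'alpha': phi(rng.gauss(gen_mu_alpha, gen_sd_alpha)),
            'sensitivity': softplus(rng.gauss(gen_mu_sensitivity,
                                              gen_sd_sensitivity)),
        }
        for i in range(n_participants)
    ]


# Illustrative group-level values only.
parameters = sample_individual_parameters(
    n_participants=20,
    gen_mu_alpha=-0.5, gen_sd_alpha=0.5,
    gen_mu_sensitivity=0.0, gen_sd_sensitivity=0.5,
    seed=3,
)
```

Because of these transformations, the group means and SDs are specified on the unconstrained (normal) scale, not on the scale of alpha or sensitivity themselves.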

Note

The number of options can actually be higher than 2, but only 2 options (one correct, one incorrect) are presented in each trial. It is important that “trial_block” is set to 1 at the beginning of each learning session (when the Q values are reset) and that “block_label” is set to 1 at the beginning of the experiment for each participant. There is no special requirement for the participant number.

Parameters
  • task_design (DataFrame) – pandas.DataFrame, with n_trials_block*n_blocks rows. Columns contain: “f_cor”, “f_inc”, “trial_type”, “cor_option”, “inc_option”, “trial_block”, “block_label”, “participant”.

  • gen_mu_alpha (float or list of floats) – The generating group mean of the learning rate. If a list of 2 values is provided then 2 separate learning rates for positive and negative prediction error are used.

  • gen_sd_alpha (float or list of floats) – The generating group SD of the learning rate. If a list of 2 values is provided then 2 separate learning rates for positive and negative prediction error are used.

  • gen_mu_sensitivity (float) – The generating group mean of the sensitivity parameter for the softmax choice rule.

  • gen_sd_sensitivity (float) – The generating group SD of the sensitivity parameter for the softmax choice rule.

  • initial_value_learning (float) – The initial value for Q learning.

Returns

data – pandas.DataFrame, that is the task_design, plus: ‘Q_cor’, ‘Q_inc’, ‘alpha’, ‘sensitivity’, ‘p_cor’, and ‘accuracy’.

Return type

DataFrame

Simulate RLDDM data (choices and RTs)

rlssm.random.simulate_rlddm_2A(task_design, gen_alpha, gen_drift_scaling, gen_threshold, gen_ndt, initial_value_learning=0, **kwargs)

Simulates behavior (rt and accuracy) according to an RLDDM, where the learning component is Q learning (delta learning rule) and the choice rule is the DDM.

Simulates data for one participant.

In this parametrization, it is assumed that 0 is the lower threshold, and the diffusion process starts halfway through the threshold value.
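This parametrization can be sketched as a simple Euler approximation of a diffusion process bounded at 0 and at the threshold, starting at threshold / 2. The helper name sketch_ddm_trial, the step size dt, the max_rt cut-off, the unit noise, and the coding of the upper threshold as the correct response are assumptions for illustration; this is not how rlssm.random.random_ddm() is necessarily implemented.

```python
import math
import random


def sketch_ddm_trial(drift, threshold, ndt, dt=0.001, max_rt=10.0,
                     rng=random):
    """Simulate one diffusion trial; returns (rt in seconds, accuracy)."""
    x = threshold / 2.0  # start halfway between 0 and the upper threshold
    t = 0.0
    # Accumulate evidence until a boundary is hit or max_rt is reached.
    while 0.0 < x < threshold and t < max_rt:
        # Assumption: within-trial noise has unit standard deviation.
        x += drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    # Assumption: the upper threshold is the correct response. A trial
    # that times out below the threshold is also coded as 0 here.
    accuracy = int(x >= threshold)
    return ndt + t, accuracy


# Illustrative parameter values only.
trial_rng = random.Random(0)
trials = [sketch_ddm_trial(drift=5.0, threshold=2.0, ndt=0.2, rng=trial_rng)
          for _ in range(50)]
mean_accuracy = sum(acc for _, acc in trials) / len(trials)
```

With a positive drift the process tends to reach the upper threshold first, so accuracy rises and response times shorten as the drift grows.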

Note

The number of options can actually be higher than 2, but only 2 options (one correct, one incorrect) are presented in each trial. It is important that “trial_block” is set to 1 at the beginning of each learning session (when the Q values are reset) and that “block_label” is set to 1 at the beginning of the experiment for each participant. There is no special requirement for the participant number.

Parameters
  • task_design (DataFrame) – pandas.DataFrame, with n_trials_block*n_blocks rows. Columns contain: “f_cor”, “f_inc”, “trial_type”, “cor_option”, “inc_option”, “trial_block”, “block_label”, “participant”.

  • gen_alpha (float or list of floats) – The generating learning rate. It should be a value between 0 (no updating) and 1 (full updating). If a list of 2 values is provided then 2 separate learning rates for positive and negative prediction error are used.

  • gen_drift_scaling (float) – Drift-rate scaling of the RLDDM.

  • gen_threshold (float) – Threshold of the diffusion decision model. Should be positive.

  • gen_ndt (float) – Non-decision time of the diffusion decision model, in seconds. Should be positive.

  • initial_value_learning (float) – The initial value for Q learning.

  • **kwargs – Additional arguments to rlssm.random.random_ddm().

Returns

data – pandas.DataFrame, that is the task_design, plus: ‘Q_cor’, ‘Q_inc’, ‘drift’, ‘alpha’, ‘drift_scaling’, ‘threshold’, ‘ndt’, ‘rt’, and ‘accuracy’.

Return type

DataFrame

rlssm.random.simulate_hier_rlddm_2A(task_design, gen_mu_alpha, gen_sd_alpha, gen_mu_drift_scaling, gen_sd_drift_scaling, gen_mu_threshold, gen_sd_threshold, gen_mu_ndt, gen_sd_ndt, initial_value_learning=0, **kwargs)

Simulates behavior (rt and accuracy) according to an RLDDM, where the learning component is Q learning (delta learning rule) and the choice rule is the DDM.

Simulates hierarchical data for a group of participants.

In this parametrization, it is assumed that 0 is the lower threshold, and the diffusion process starts halfway through the threshold value.

The individual parameters have the following distributions:

  • alpha ~ Phi(normal(gen_mu_alpha, gen_sd_alpha))

  • drift_scaling ~ log(1 + exp(normal(gen_mu_drift_scaling, gen_sd_drift_scaling)))

  • threshold ~ log(1 + exp(normal(gen_mu_threshold, gen_sd_threshold)))

  • ndt ~ log(1 + exp(normal(gen_mu_ndt, gen_sd_ndt)))

When 2 learning rates are used:

  • alpha_pos ~ Phi(normal(gen_mu_alpha[0], gen_sd_alpha[0]))

  • alpha_neg ~ Phi(normal(gen_mu_alpha[1], gen_sd_alpha[1]))

Note

The number of options can actually be higher than 2, but only 2 options (one correct, one incorrect) are presented in each trial. It is important that “trial_block” is set to 1 at the beginning of each learning session (when the Q values are reset) and that “block_label” is set to 1 at the beginning of the experiment for each participant. There is no special requirement for the participant number.

Parameters
  • task_design (DataFrame) – pandas.DataFrame, with n_trials_block*n_blocks rows. Columns contain: “f_cor”, “f_inc”, “trial_type”, “cor_option”, “inc_option”, “trial_block”, “block_label”, “participant”.

  • gen_mu_alpha (float or list of floats) – The generating group mean of the learning rate. If a list of 2 values is provided then 2 separate learning rates for positive and negative prediction error are used.

  • gen_sd_alpha (float or list of floats) – The generating group SD of the learning rate. If a list of 2 values is provided then 2 separate learning rates for positive and negative prediction error are used.

  • gen_mu_drift_scaling (float) – Group mean of the drift-rate scaling of the RLDDM.

  • gen_sd_drift_scaling (float) – Group standard deviation of the drift-rate scaling of the RLDDM.

  • gen_mu_threshold (float) – Group mean of the threshold of the RLDDM.

  • gen_sd_threshold (float) – Group standard deviation of the threshold of the RLDDM.

  • gen_mu_ndt (float) – Group mean of the non-decision time of the RLDDM.

  • gen_sd_ndt (float) – Group standard deviation of the non-decision time of the RLDDM.

  • initial_value_learning (float) – The initial value for Q learning.

  • **kwargs – Additional arguments to rlssm.random.random_ddm().

Returns

data – pandas.DataFrame, that is the task_design, plus: ‘Q_cor’, ‘Q_inc’, ‘drift’, ‘alpha’, ‘drift_scaling’, ‘threshold’, ‘ndt’, ‘rt’, and ‘accuracy’.

Return type

DataFrame

Simulate RLRDM data (choices and RTs)

rlssm.random.simulate_rlrdm_2A(task_design, gen_alpha, gen_drift_scaling, gen_threshold, gen_ndt, initial_value_learning=0, **kwargs)