×
Well done. You've clicked the tower. This would actually achieve something if you had logged in first. Use the key for that. The name takes you home. This is where all the applicables sit. And you can't apply any changes to my site unless you are logged in.

Our policy is best summarized as "we don't care about _you_, we care about _them_", no emails, so no forgetting your password. You have no rights. It's like you don't even exist. If you publish material, I reserve the right to remove it, or use it myself.

Don't impersonate. Don't name someone involuntarily. You can lose everything if you cross the line, and no, I won't cancel your automatic payments first, so you'll have to do it the hard way. See how serious this sounds? That's how serious you're meant to take these.

×
Register


Required. 150 characters or fewer. Letters, digits and @/./+/-/_ only.
  • Your password can’t be too similar to your other personal information.
  • Your password must contain at least 8 characters.
  • Your password can’t be a commonly used password.
  • Your password can’t be entirely numeric.

Enter the same password as before, for verification.
Login

Grow A Dic
Define A Word
Make Space
Set Task
Mark Post
Apply Votestyle
Create Votes
(From: saved spaces)
Exclude Votes
Apply Dic
Exclude Dic

Click here to flash read.

arXiv:2212.09192v4 Announce Type: replace-cross
Abstract: The classical multi-armed bandit (MAB) problem involves a learner and a collection of K independent arms, each with its own ex ante unknown independent reward distribution. At each one of a finite number of rounds, the learner selects one arm and receives new information. The learner often faces an exploration-exploitation dilemma: exploiting the current information by playing the arm with the highest estimated reward versus exploring all arms to gather more reward information. The design objective aims to maximize the expected cumulative reward over all rounds. However, such an objective does not account for a risk-reward tradeoff, which is often a fundamental precept in many areas of applications, most notably in finance and economics. In this paper, we build upon Sani et al. (2012) and extend the classical MAB problem to a mean-variance setting. Specifically, we relax the assumptions of independent arms and bounded rewards made in Sani et al. (2012) by considering sub-Gaussian arms. We introduce the Risk Aware Lower Confidence Bound (RALCB) algorithm to solve the problem, and study some of its properties. Finally, we perform a number of numerical simulations to demonstrate that, in both independent and dependent scenarios, our suggested approach performs better than the algorithm suggested by Sani et al. (2012).

Click here to read this post out
ID: 841017; Unique Viewers: 0
Unique Voters: 0
Total Votes: 0
Votes:
Latest Change: May 7, 2024, 7:34 a.m. Changes:
Dictionaries:
Words:
Spaces:
Views: 7
CC:
No creative common's license
Comments: