Sunday 31 August 2014

Semi-supervised learning : Major varieties of learning problem

Machine learning is commonly divided into five main types of learning problem, the first four of which are forms of function estimation. Those four can be grouped along two dimensions: whether the learning task is supervised or unsupervised, and whether the variable to be predicted is nominal or real-valued.
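As a rough illustration, the two dimensions can be laid out as a small lookup table. The task names are the ones this post goes on to describe; the dictionary and the function `task_name` are hypothetical names chosen only for this sketch.

```python
# Hypothetical lookup table: (is the task supervised?, type of predicted value)
# mapped to the name of the corresponding function estimation problem.
FUNCTION_ESTIMATION_TASKS = {
    (True, "nominal"): "classification",
    (False, "nominal"): "clustering",
    (True, "real"): "regression",
    (False, "real"): "density estimation",
}


def task_name(supervised: bool, target: str) -> str:
    """Return the name of the task in one cell of the two-by-two grid."""
    return FUNCTION_ESTIMATION_TASKS[(supervised, target)]


print(task_name(True, "nominal"))  # classification
print(task_name(False, "real"))    # density estimation
```

The fifth problem type, reinforcement learning, is left out of the table on purpose, since it is not one of the four function estimation settings.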

The first type of problem is classification: supervised learning of a function f(x) that predicts a nominal value. The learned function is called a classifier; given an instance x, it determines the class to which x belongs. For example, the task might be to classify a word in a sentence by its part of speech. The learner is given labeled data, that is, instances paired with their correct class labels, and uses this data to learn a classifier that makes predictions for new instances.
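To make the idea concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid rule. The choice of method, the two-dimensional toy features, and the function names are all assumptions made for illustration; real part-of-speech taggers use much richer features.

```python
import math


def train_nearest_centroid(labeled):
    """Learn one centroid (mean point) per class from ((x, y), label) pairs."""
    sums, counts = {}, {}
    for (x, y), label in labeled:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def classify(centroids, point):
    """Predict the class whose centroid is closest to the new instance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Toy labeled data: made-up feature vectors with their correct class labels.
labeled = [
    ((1.0, 1.0), "noun"), ((1.2, 0.8), "noun"),
    ((4.0, 4.2), "verb"), ((3.8, 4.0), "verb"),
]
centroids = train_nearest_centroid(labeled)
print(classify(centroids, (1.1, 1.0)))  # noun
print(classify(centroids, (4.0, 4.0)))  # verb
```

The training step sees both instances and labels, and the resulting function is then applied to instances it has never seen.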

Clustering is the unsupervised counterpart of classification. The goal is again to assign instances to classes, but the algorithm has access only to the instances themselves, not to the correct answer for any of them. The primary difference between classification and clustering is therefore the kind of data given to the learner as input: labeled or unlabeled.

The two other function estimation tasks involve real-valued functions. In regression, the learner estimates a function that takes real values rather than values from a finite set of classes. The unsupervised learning of a real-valued function can be viewed as density estimation: the learner is given an unlabeled set of training data and must learn a function that assigns a real value to every point in the space.

Finally, in reinforcement learning the learner receives a stream of data from sensors and is expected to take actions based on it. It also receives a reward signal, which it tries to maximize over time. Reinforcement learning differs from the four function estimation settings in two main ways: the inputs are sequential, and the supervision provided by the reward signal is indirect.
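For contrast with classification, the following sketch clusters unlabeled one-dimensional values into two groups with a basic k-means loop. The algorithm choice, the deterministic initialization at the minimum and maximum value, and the toy data are assumptions for illustration only.

```python
def two_means(values, iterations=10):
    """Split unlabeled 1-D values into two clusters with a basic k-means loop."""
    # Assumption: start the two centers at the extremes so the run is deterministic.
    centers = [min(values), max(values)]
    assignment = []
    for _ in range(iterations):
        # Assign every value to its nearest center (ties go to cluster 0).
        assignment = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of the values assigned to it.
        for k in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignment, centers


# No labels here: the learner sees only the instances themselves.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
assignment, centers = two_means(values)
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

The cluster indices 0 and 1 are arbitrary names: nothing in the input tells the algorithm what the groups mean, which is exactly what separates this setting from classification.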
 
Semi-supervised learning combines elements of supervised and unsupervised learning. The two differ in whether the training data is labeled: supervised learning uses labeled data to learn to predict outcomes, while unsupervised learning seeks patterns and structure in unlabeled data. Semi-supervised learning sits between them: the learner is given some labeled data, and the rest is left unlabeled. This mixed setting is the canonical case for semi-supervised learning, and many methods have been developed to take advantage of it.
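One simple way to exploit the mixed setting is self-training: fit a model on the labeled data, label the unlabeled point the model is most confident about, and repeat. The post does not name a specific method, so the sketch below is only one possible illustration; the nearest-centroid model, the one-dimensional toy data, and all function names are assumptions.

```python
def centroids_1d(labeled):
    """Mean feature value per class, computed from (value, label) pairs."""
    groups = {}
    for value, label in labeled:
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def self_train(labeled, unlabeled):
    """Grow the labeled set one point at a time, most confident guess first."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        cents = centroids_1d(labeled)
        # Confidence here is just closeness to a class centroid: pick the
        # (value, class) pair with the smallest distance.
        value, label = min(
            ((v, c) for v in pool for c in cents),
            key=lambda pair: abs(pair[0] - cents[pair[1]]),
        )
        labeled.append((value, label))  # treat the guess as if it were a label
        pool.remove(value)
    return centroids_1d(labeled)


# Two labeled instances and five unlabeled ones.
labeled = [(1.0, "A"), (9.0, "B")]
unlabeled = [2.0, 3.0, 4.0, 8.0, 7.0]
model = self_train(labeled, unlabeled)
print(model)  # {'A': 2.5, 'B': 8.0}
```

With only the two labeled points, the centroids would sit at 1.0 and 9.0; the unlabeled data pulls them toward the centers of the two groups. Self-training can also reinforce its own mistakes, so this should be read as a sketch of the idea and not a recommended recipe.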

However, a mix of labeled and unlabeled data is not the only way to give the learner partial information about the labels of the training data. For instance, one could supply a few reliable rules for labeling instances, or constraints that limit the candidate labels for specific instances. These alternative forms of partial labeling are also relevant to semi-supervised learning and are often used in practice. Reinforcement learning could also be seen as a form of semi-supervised learning, because it relies on indirect information about labels, but the connection between reinforcement learning and other semi-supervised approaches is not well understood and is beyond the scope of this discussion.
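The two alternatives mentioned above, labeling rules and candidate-label constraints, can both be represented as a set of allowed labels per instance. The sketch below continues the part-of-speech example; the rules, the constraint table, and the label set are hypothetical and far cruder than anything used in practice.

```python
ALL_LABELS = {"noun", "verb", "adverb", "determiner"}


def label_with_rules(word):
    """Hypothetical hand-written rules; return None when no rule applies."""
    if word.endswith("ly"):
        return "adverb"
    if word in {"the", "a", "an"}:
        return "determiner"
    return None


# Hypothetical constraints: the candidate labels allowed for specific words.
candidate_labels = {
    "run": {"noun", "verb"},
}


def partial_labels(word):
    """Return the set of labels still possible for a word.

    A rule pins the label down to one value, a constraint narrows the
    choice, and a word with neither stays fully unlabeled.
    """
    rule_label = label_with_rules(word)
    if rule_label is not None:
        return {rule_label}
    return candidate_labels.get(word, ALL_LABELS)


print(partial_labels("the") == {"determiner"})    # True
print(partial_labels("run") == {"noun", "verb"})  # True
print(partial_labels("dog") == ALL_LABELS)        # True
```

Seen this way, fully labeled and fully unlabeled instances are just the two extremes: a label set of size one and a label set containing every class.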
