johnhidley

Reputation: 11

I am trying to understand some machine learning terminology. What is the difference between learning parameters, hyperparameters, and structure?

I am trying to understand some machine learning terminology: parameters, hyperparameters, and structure, all used in a Bayes-net context. 1) In particular, how is structure different from parameters or hyperparameters? 2) What does “parameterize” mean? Thanks.

Upvotes: 1

Views: 442

Answers (3)

Sayali Sonawane

Reputation: 12599

STRUCTURE

The structure, or topology, of the network should capture qualitative relationships between variables. In particular, two nodes should be connected directly if one affects or causes the other, with the arc indicating the direction of the effect.

[Figure: example Bayesian network structure, with arcs from Pollution and Smoker to Cancer, and from Cancer to XRay and Dyspnoea]

Consider the example above: we might ask what factors affect a patient’s chance of having cancer. If the answer is “Pollution and smoking,” then we should add arcs from Pollution and Smoker to Cancer. Similarly, having cancer will affect the patient’s breathing and the chances of having a positive X-ray result, so we add arcs from Cancer to Dyspnoea and XRay. The resulting structure is shown in the figure above.

Structure terminology and layout

In talking about network structure it is useful to employ a family metaphor: a node is a parent of a child if there is an arc from the former to the latter. Extending the metaphor, if there is a directed chain of nodes, one node is an ancestor of another if it appears earlier in the chain, whereas a node is a descendant of another node if it comes later in the chain. In our example, the Cancer node has two parents, Pollution and Smoker, while Smoker is an ancestor of both XRay and Dyspnoea. Similarly, XRay is a child of Cancer and a descendant of Smoker and Pollution. The set of parent nodes of a node X is given by Parents(X).
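To make the family terminology concrete, here is a small Python sketch (my illustration, not part of the original answer; the node names follow the figure, everything else is arbitrary) that stores the network structure as a parent dictionary and derives ancestors and descendants from it:

    # Structure of the example network, stored as a map from each node
    # to its list of parents (Parents(X) in the text above).
    parents = {
        "Pollution": [],
        "Smoker": [],
        "Cancer": ["Pollution", "Smoker"],
        "XRay": ["Cancer"],
        "Dyspnoea": ["Cancer"],
    }

    def ancestors(node):
        # Every node reachable by following arcs backwards from `node`.
        found, stack = set(), list(parents[node])
        while stack:
            p = stack.pop()
            if p not in found:
                found.add(p)
                stack.extend(parents[p])
        return found

    def descendants(node):
        # Every node that has `node` among its ancestors.
        return {n for n in parents if node in ancestors(n)}

    print(parents["Cancer"])      # ['Pollution', 'Smoker']
    print(ancestors("XRay"))      # {'Cancer', 'Pollution', 'Smoker'}
    print(descendants("Smoker"))  # {'Cancer', 'XRay', 'Dyspnoea'}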

By convention, for easier visual examination of BN structure, networks are usually laid out so that the arcs generally point from top to bottom. This means that the BN “tree” is usually depicted upside down, with roots at the top and leaves at the bottom!

Upvotes: 1

Chobeat

Reputation: 3535

To add to lejlot’s answer, I would like to say a few words about the term “parameter”.

For many algorithms, a synonym for parameter is weight. This is true for most linear models, where a weight is a coefficient of the line describing the model. In that case “parameters” refers only to the parameters of the learning algorithm, which may be confusing when moving to other kinds of ML algorithms. Also, contrary to what lejlot said, these parameters need not be that abstract: they often have a clear meaning in terms of their effect on the learning process. For example, with SVMs, a parameter may weight the importance of misclassifications.
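As a rough illustration of that last point (this sketch is mine, not Chobeat’s; it uses scikit-learn, and the data and the values of C are arbitrary): in a soft-margin SVM the penalty C weights how expensive misclassifications are relative to margin width.

    # Sketch: the soft-margin penalty C weights the cost of
    # misclassifications; larger C means errors matter more.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # Small C tolerates errors (wide margin, many support vectors);
        # large C punishes them (narrow margin, fewer support vectors).
        print(f"C={C:7.2f}  support vectors: {clf.n_support_.sum()}")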

Upvotes: -1

lejlot

Reputation: 66805

In general (though the exact definitions may vary across authors/papers/models):

  • structure - describes how the elements of your graph/model are connected/organized, and is therefore usually a generic description of how information flows, often expressed as a directed graph. At the level of structure you usually omit details such as the exact form of each component. Example: a logistic regression model consists of an input node and an output node, where the output node produces P(y|x).
  • parametrization - since the common language of the Bayesian (and the whole ML) approach is the language of probability, many models are expressed in terms of probabilities or other quantities that are nice mathematical objects but cannot by themselves be implemented/optimized/used; they are just abstract concepts. Parametrization is the process of taking such an abstract object and narrowing the space of possible values down to a set of functions indexed by parameters (usually real-valued vectors/matrices/tensors). For example, the P(y|x) of logistic regression can be parametrized via a linear function of x through P(y|x) = 1/(1 + exp(-<x, w>)), where w is a real-valued vector of parameters.
  • parameters - as seen above, these are the elements of your model introduced during parametrization, which are usually learnable, meaning that you can provide reasonable mathematical ways of finding their best values. In the example above, w is a parameter, learnable during likelihood maximization using, for example, steepest descent (SGD); a toy version of this update is sketched after this list.
  • hyperparameters - these are values, very similar to parameters, for which you cannot really provide nice learning schemes, usually because of their non-continuous nature, which often alters the structure itself. For example, in a neural net, a hyperparameter is the number of hidden units. You cannot differentiate through this element, so SGD cannot learn its value; you have to set it a priori, or use some meta-learning technique (which is often extremely inefficient). In general, the distinction between parameter and hyperparameter is fuzzy and context-dependent: they can swap roles. For example, if you apply a genetic algorithm to learn the hyperparameters of a neural net, those hyperparameters become parameters of the model being learned by the GA.
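Here is a minimal sketch of the whole distinction (my illustration, not part of the original answer; NumPy only, toy data): w is a parameter learned by steepest descent on the logistic loss, while the learning rate and iteration count are hyperparameters fixed a priori.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                           # toy inputs
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)  # toy labels

    # Hyperparameters: set before learning, not differentiated through.
    learning_rate = 0.1
    n_iterations = 500

    # Parameter: introduced by the parametrization
    # P(y|x) = 1/(1 + exp(-<x, w>)).
    w = np.zeros(3)

    for _ in range(n_iterations):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted P(y=1|x)
        grad = X.T @ (p - y) / len(y)     # gradient of negative log-likelihood
        w -= learning_rate * grad         # steepest-descent update

    print("learned w:", w)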

Upvotes: 1
