Saturday, January 31, 2015

Puppet 4.0 Data in Modules and Environments

In Puppet 4.0.0 there is a new technology-agnostic mechanism for data lookup that makes it possible to provide default values for class parameters in modules and in environments. The mechanism looks first in the "global" data binding mechanism across all environments (i.e. the existing mechanism for data binding, which in practice means hiera, since this is the only available implementation). It then looks for data in the environment, and finally in the module.

The big thing here is that a user of a module does not have to know which implementation the module author has chosen - the module is simply installed (with its dependencies). The user is free to override values using an implementation of their choice (in the environment using the new mechanism, or with the existing data binding / hiera support).

It is expected that implementations for hiera will also become available, delivered in a module.

In this part 1 about the new data binding feature I will show how it can be used in environments and modules. In the next part I will show how to make new data binding implementations.

How does it work?

Out of the box, the new feature:

  • provides module authors with a way to select which data binding implementation to use in their module without affecting how other modules get their data.

  • provides users configuring an environment with a way to select which data binding implementation to use in that environment (or in all environments). Different environments can use different implementations, and the environment does not have to use the same implementation as the modules.

  • contains a data binding implementation named 'function' which calls a puppet function that returns a hash of data. The module author can select this mechanism and simply implement the function. A user can also configure an environment to use a function to provide the data - the function is then added to the environment.

  • provides module authors with a way to package and share a data binding implementation in a module. It can be delivered in the same module as regular content, or in a separate module just containing the data binding implementation.

Using a function to deliver data in an environment

This is the easiest, so I am starting with that. Two things are needed:

  • Configuring the environment to state that a function delivers data.
  • Writing the function

configuring the environment

The binding provider to use for an environment can be selected via the environment specific setting environment_data_provider. The value is the name of the data provider implementation to use. In our example this is 'function'. If not set in an environment specific environment.conf, the environment inherits the global setting - which is handy if all your environments work the same way.

writing the function

The function must be written using the 4x function API and placed in a file called lib/puppet/functions/environment/data.rb under the root directory of the environment.

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data()
    # return a hash with key to value mappings 
    { 'abc::param_a' => 'default value for param a in class abc',
      'abc::param_b' => 'default value for param b in class abc',
    }
  end
end

Later in the 4x series of Puppet, it will be possible to also write such functions in the puppet language which makes authoring more accessible.

Note that the name of the function is always environment::data irrespective of what the actual name of the environment is. This is because it would not be good if the name of the function had to change when you test a new environment named 'dev' and later merge it into 'production'.

Using a function to deliver data in a module

The steps to deliver data with a function for a module are different because there are no individual settings for a module. Here are the steps:

  • Creating a binding using the Puppet Binder to declare that the module should use the 'function' data provider for this module.
  • Writing the Function

Note that in the future, the data provider name may be made part of the module's metadata. This is however not the case in the Puppet 4.0.0 release.

writing the binding

The binding is very simple as it is all boilerplate except for the name of the module and the name of the data provider implementation - 'mymodule' and 'function' in the example below. The name of the file is lib/puppet/bindings/mymodule/default.rb where the mymodule part needs to reflect the name of the module it is placed in. (The file is always called 'default.rb' since it contains the default puppet bindings for this module).

# <moduleroot>/lib/puppet/bindings/mymodule/default.rb
#
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name         'mymodule'            # name of the module this is placed in
    to           'function'            # name of the data provider
    in_multibind 'puppet::module_data' # boiler-plate
  }
end

writing the function

This is exactly the same as for the environment, but the function is named mymodule::data where mymodule is the name of the module this function provides data for. The file name is lib/puppet/functions/mymodule/data.rb.

# <moduleroot>/lib/puppet/functions/mymodule/data.rb
#
Puppet::Functions.create_function(:'mymodule::data') do
  def data()
    # Return a hash with parameter name to value mapping
    { 'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
      'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc',
    }
  end
end

Overriding a parameter in the environment

As you may have figured out already, it is easy to override the module's data in the environment. As an example we may want to provide a different value for mymodule::abc::param_b at the environment level. This is how that would look:

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data() 
    { # ... other keys and values
      'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc',
    }
  end
end

Getting the data

To get the data, there is absolutely nothing you need to do in your manifests. Just as before, if a class parameter does not have a value, it will be looked up as explained in this blog post. Finally, if the lookup finds no value, the default parameter value given in the manifest is used.
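The lookup order can be illustrated with a small plain-Ruby sketch. This is not Puppet's implementation: the method name lookup_parameter and its keyword arguments are hypothetical, and each layer is simply modeled as a hash that is consulted in the order global, environment, module, and finally the defaults written in the manifest.

```ruby
# Hypothetical stand-in for the lookup order described in this post.
# Each layer is a plain Hash; the first layer that has the key wins.
def lookup_parameter(key, global:, environment:, mod:, manifest_defaults: {})
  [global, environment, mod, manifest_defaults].each do |layer|
    return layer[key] if layer.key?(key)
  end
  raise KeyError, "no value found for #{key}"
end

module_data = {
  'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
  'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc'
}
environment_data = {
  'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc'
}

# param_a is only defined by the module, param_b is overridden by the environment.
puts lookup_parameter('mymodule::abc::param_a',
                      global: {}, environment: environment_data, mod: module_data)
puts lookup_parameter('mymodule::abc::param_b',
                      global: {}, environment: environment_data, mod: module_data)
```

With the hashes from the earlier examples, param_a comes from the module and param_b from the environment. A value placed in the global layer would win over both.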

Using the examples above - if you have this in your init.pp for the mymodule module:

class mymodule::abc($param_a, $param_b) {
  notice $param_a, $param_b
}

the two parameters $param_a and $param_b will be given their values from the hashes returned by the data functions, looking up mymodule::abc::param_a, and mymodule::abc::param_b.

Note that there is no need to use the "params pattern" now in common use in modules for Puppet 3x!

More about Functions

Since the new 'function' data provider is based on the general concept of calling functions and you can call other functions from them, you have a very powerful mechanism to help you organize data and to do advanced composition.

The data function is called once during a compilation to produce a Hash that maps qualified name strings to data values. The function body can call other functions, use expressions, transformations, composition etc. When the data binding kicks in, it calls the function on the first request for a parameter in the compilation, then caches the returned hash and reuses it when looking up additional parameters (in contrast to calling the function for each and every parameter, which would be much slower).
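The call-once-then-cache behavior can be sketched in plain Ruby. The class CachedFunctionData below is hypothetical and only illustrates the idea; it is not how Puppet implements the 'function' data provider.

```ruby
# Plain-Ruby illustration only; the class and method names are hypothetical
# and are not part of Puppet.
class CachedFunctionData
  attr_reader :calls

  def initialize(&data_function)
    @data_function = data_function
    @calls = 0
    @cache = nil
  end

  # Looks up one qualified parameter name. The data function is called
  # on the first request only; later requests reuse the cached hash.
  def lookup(key)
    if @cache.nil?
      @calls += 1
      @cache = @data_function.call
    end
    @cache[key]
  end
end

provider = CachedFunctionData.new do
  { 'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
    'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc' }
end

provider.lookup('mymodule::abc::param_a')
provider.lookup('mymodule::abc::param_b')
puts provider.calls # the block ran once, although two parameters were looked up
```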

Note that the data function can be called like any other function! This means that a module or environment can use another module's data function, transform it etc. before using its data.

Naturally, since we are dealing with functions it is easy to divide the composition of data into multiple functions, and then hierarchically compose them. Say that we want to divide the data up into two parts, one for osfamily, and one for common and we then want to combine them. We can now do a simple function composition and merge the result.

In the examples, the functions are written using the puppet language (even though they are not available in the 4.0.0 release). At the moment, it is left as an exercise to translate them into Ruby. What I want to show here is the power of combining data with functions without cluttering the examples with what you need to do in Ruby to get variables in scope, call other functions etc.

Data Composition with Puppet functions

When we add support for functions in the Puppet Language, data composition can look like this:

function mymodule::data() {
  mymodule::common() + mymodule::osfamily()
}

function mymodule::osfamily() {
  case $osfamily {
    'Debian': {
      { mymodule::abc::param_a => 'the debian value for a' }
    }
    'Darwin': {
      { mymodule::abc::param_a => 'the osx value for a' }
    }
    default: {
      { }  # empty hash
    }
  }
}

function mymodule::common() {
  { mymodule::abc::param_a => 'the default for param a',
    mymodule::abc::param_b => 'the default for param b',
  }
}

Naturally, the functions called from the data function can take parameters. The data() function itself however does not take any parameters.
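As a rough idea of what the same composition looks like in Ruby, here is a plain-Ruby sketch. It deliberately leaves out the Puppet function API: the method names are stand-ins for the functions above, and osfamily is passed in as an ordinary argument instead of being read from scope, which is an assumption made to keep the example self-contained.

```ruby
# Plain Ruby stand-ins for the Puppet language functions shown above.
def mymodule_common
  { 'mymodule::abc::param_a' => 'the default for param a',
    'mymodule::abc::param_b' => 'the default for param b' }
end

# osfamily is an ordinary argument here; this is an assumption of the sketch.
def mymodule_osfamily(osfamily)
  case osfamily
  when 'Debian'
    { 'mymodule::abc::param_a' => 'the debian value for a' }
  when 'Darwin'
    { 'mymodule::abc::param_a' => 'the osx value for a' }
  else
    {} # empty hash
  end
end

# Hash#merge lets the right-hand hash win, like + on hashes in Puppet.
def mymodule_data(osfamily)
  mymodule_common.merge(mymodule_osfamily(osfamily))
end

puts mymodule_data('Debian')['mymodule::abc::param_a']
puts mymodule_data('Debian')['mymodule::abc::param_b']
```

On Debian, param_a gets the osfamily-specific value while param_b falls back to the common default. For any other osfamily the result is the common hash unchanged.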

Example - Module with multiple use cases

A module author wants to provide a set of default values for a base use case of the module, but also wants to offer defaults for other use cases. Clearly, there can only be one set of defaults applied at any given time, and the data() function in a module is for that module only, so these defaults must be provided at a higher level i.e. in the environment (where it is known how the module is getting used). If the environment is also using the function data provider, it is very simple to achieve this:

function environment::data() {
  # merge usecase_x from module with the overrides
  mymodule::usecase_x() + {
    mymodule::abc::param_b => 'default from environment for param_b'
  }
}

This illustrates that mymodule has a special data function named mymodule::usecase_x() that provides an alternate set of default values for classes inside mymodule. These are then overridden with a hash of the specific overrides wanted in this environment.

Example - Hierarchical keys

If you find it tedious to retype mymodule::classname::foo, mymodule::classname::bar, etc. you can instead construct the keys programmatically. Since the "data functions" are general functions, variables and interpolation can be used, e.g.:

function mymodule::data() {
  $m = 'mymodule::abc'
  { "${m}::param_a" => 'the value', 
    ...
  }
}

Or why not call a function that reorganizes a hierarchical hash; say that we have param_a in classes a::b::x, a::b::y, and a::b::z, we could then do something like this:

function mymodule::data() {
  $hierarchical = { 
    a => {
      b => {
        x => { param_a => 'default for a::b::x::param_a' },
        y => { param_a => 'default for a::b::y::param_a' },
        z => { param_a => 'default for a::b::z::param_a' },
  }}}
  # Calling a function that expands the hash (left as an exercise)
  expand_hierarchical_keys($hierarchical)
}
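One possible take on that exercise, as a plain-Ruby sketch: the helper below is hypothetical (only the name expand_hierarchical_keys comes from the example above), and it assumes that every hash value is another level of nesting, so parameter values that are themselves hashes are not handled.

```ruby
# Hypothetical helper: flattens nested hashes into 'a::b::c' style keys.
# Assumes every Hash value is another level of nesting, so parameter
# values that are themselves hashes are not supported by this sketch.
def expand_hierarchical_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), result|
    qualified = prefix ? "#{prefix}::#{key}" : key.to_s
    if value.is_a?(Hash)
      result.merge!(expand_hierarchical_keys(value, qualified))
    else
      result[qualified] = value
    end
  end
end

hierarchical = {
  'a' => {
    'b' => {
      'x' => { 'param_a' => 'default for a::b::x::param_a' },
      'y' => { 'param_a' => 'default for a::b::y::param_a' },
      'z' => { 'param_a' => 'default for a::b::z::param_a' }
    }
  }
}

expand_hierarchical_keys(hierarchical).each do |key, value|
  puts "#{key} => #{value}"
end
```

The result is a flat hash with the keys a::b::x::param_a, a::b::y::param_a and a::b::z::param_a, which is the shape a data function is expected to return.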

Trying out this new feature

At the time of writing, the new data binding feature is available in the nightlies for Puppet 4.0.0, or you can run it from source using Puppet's master branch. (The new feature will not be available for 3x with future parser). If you are reading this after Puppet 4.0.0 has been released, just get the release.

Summary

The new data provider mechanism is a technology agnostic way of defining default data for modules and environments without dictating that a particular technology is used by the users of a module.

The new mechanism comes with a built in implementation based on functions that provides a simple yet powerful way of delivering, using and composing data. Functions in Ruby provide a simple way to extend the functionality without having to write a complete data provider.

Data functions are relatively easy to write in Ruby since they consist mostly of boilerplate code, but the function mechanism will become much more powerful and accessible when functions can be written in the Puppet Language.

In the next post about the new data binding feature I will show how to write a new implementation of a data provider.

5 comments:

  1. Part II (how to write a data provider) is now also posted: http://puppet-on-the-edge.blogspot.be/2015/02/puppet-40-data-in-modules-part-ii.html

  2. Hi, Thx Henrik for this very interesting article. I think I will read it several times but I have questions if possible.

    1. With this new feature with the `mymodule::data` function in `./mymodule/lib/puppet/functions/mymodule/data.rb`, is it correct to say that :
    a. Now, in a module the "params" pattern is totally useless?
    b. Now, even the default values in the declaration of a class in a module are useless? (for instance in `class mymodule ( $param1 = 'default', ...) { ... }` the default value of param1 is useless and should be put in the data.rb function)

    2. In the `::mymodule::data()` function (which is in fact a "simple" custom function with the puppet4 API), is it possible to get a fact or a variable like in `lookupvar('osfamily')` or `lookupvar('::mymodule::foo')` with the puppet3 API for the custom function?

    3. Is it a good idea, a good practice to put hiera lookups in the `::mymodule::data()` function of a module.

    Thx in advance. ;)
    François Lafont (aka flaf on IRC)

  3. 1
    a) you can use "data in modules" instead of params.pp, yes - which one is more valuable is up to preferences
    b) while you do get all the defaults from your "data in modules", default values in the class declaration may serve as documentation/examples that are easier to read than having the user figure out what the default values are (in some function). But if you write nice documentation that is not needed.

    2. Yes, you can lookup variables as well as calling other functions. You need to call closure_scope() to get the scope in which the function was defined. This does not give you access to the calling scope (which is a bad design pattern to use in general). If you really, really must have access to the calling scope, you can get calling_scope - it is documented in the language specification: https://github.com/puppetlabs/puppet-specifications/blob/master/language/func-api.md#calling-scope-support

    3. There is a new function named lookup that combines the behavior of all hiera functions, and it is also aware of data in modules. It is a bad idea to call hiera from within a data function because your default data is then no longer default and your module has a dependency on hiera. Instead, you let users simply override values either in their environment, or in their global hiera. When you lookup a key "class_a::param1", if it is defined in hiera, it wins over the default data provided in the module. You can use the "lookup" function though - it always looks up using both hiera and data in modules. Say your module_a depends on module_b, and you want some default values in module_a to use data configured for module_b - then you can use the lookup function in your module_a::data() function. It will get the correct (possibly overridden) value in the user's configuration. (If you use hiera to do the same, it will never lookup inside the "data in modules/environment" since it is a singleton global implementation across all environments and modules.) Alternatively, if your module_a is designed to work with module_b, and both are using data functions, you can call module_b's data function directly (as a function), and pick up values that you use as defaults in your module_a. This way you get the defaults from module_b (not overridden by data in the environment, nor in hiera). This relies on the module_b::data() function being considered part of module_b's API.

  4. Hi,

    Many thx Henrik for your answer. In fact, because the article is ~6 months old, I thought it was too late to get an answer. So I had asked my questions here:

    https://groups.google.com/forum/#!topic/puppet-users/v3BtwUqlg40

    Sorry about this duplication Henrik. I have seen your answer in the thread above and I will post in that thread instead of posting here (after closely reading your answer of course). Thx again.

    PS: as a user of "Puppet+Hiera", I don't like the duplication of data of course. ;))

  5. Update - this blog post describes what we now refer to as (experimental) Hiera 4 - which was replaced by (officially released) Hiera 5, now included in Puppet. There is excellent documentation for Hiera 5 on the official Puppet documentation site. See https://puppet.com/docs/puppet/latest/hiera_intro.html
