Tuesday, January 31, 2017

SSSD: {DBus,Socket}-activated responders!

Since its 1.15.0 release, SSSD takes advantage of systemd machinery and introduces a new way to deal with the responders.

Previously, in order to have a responder initialized, the admin had to add the specific responder to the "services" line in the sssd.conf file. That makes sense for the responders that are often used, but not for those that are rarely used (such as the infopipe and PAC responders, for instance).

This old way is still preserved (at least for now) and this new release is fully backwards-compatible with the old config file.

For this new release, however, adding responders to the "services" line isn't needed anymore: the admin can simply enable any of the responders' sockets, and those responders will be {dbus,socket}-activated on demand and stay up while they are still being used. Once a responder becomes idle, it will automatically shut itself down after a configurable amount of time.

The sockets we've created are: sssd-autofs.socket, sssd-nss.socket, sssd-pac.socket, sssd-pam.socket (and sssd-pam-priv.socket, but you don't have to worry about this one), sssd-ssh.socket and sssd-sudo.socket. As an example, if the admins want to enable the sockets for both the NSS and PAM responders, they should do: `systemctl enable sssd-pam.socket sssd-nss.socket` and voilà!

In some cases the admins may also want to set the "responder_idle_timeout" option, added for each of the responders, in order to tweak how long the responder keeps running once it becomes idle. Setting this option to 0 (zero) disables the responder_idle_timeout. For more details, please check the sssd.conf man page.
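As an illustration (the domain name and the timeout value here are only examples, adjust them to your setup), a sssd.conf relying entirely on socket activation with a tweaked NSS idle timeout could look like:

```ini
# /etc/sssd/sssd.conf -- illustrative example
[sssd]
# services left empty: responders are started on demand
# via their enabled sockets
services =
domains = example.com

[nss]
# shut the NSS responder down after 60 seconds of inactivity;
# 0 would disable the idle timeout entirely
responder_idle_timeout = 60
```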

For this release we've taken a more conservative path and are leaving it up to the admins to enable the services they want in case they would like to try the {dbus,socket}-activated responders.

It's also important to note that, until the SELinux policies are updated in your distro, you may need to have SELinux in permissive mode in order to test/use the {dbus,socket}-activated responders. A bug for this is already filed for Fedora and hopefully will be fixed before the new package is included in the distro.

And the changes in the code were (a high-level explanation) ...

Before this work, the monitor was the piece of code responsible for handling the responders listed in the services' line of the sssd.conf file. And by handling I mean:

  • Gets the list of services to be started (and, consequently, the total number of services);
  • For each service:
    • Gets the service configuration;
    • Starts the service;
    • Adds the service to the services' list;
    • Once the service is up, a dbus message is sent to the monitor, which ...
      • Sets up the sbus* connection to communicate with the service;
      • Marks the service as started;

Now, the monitor does (considering an empty services' line):

  • Once the service is up, a dbus message is sent to the monitor;
    • The number of services is increased;
    • Gets the service configuration;
    • Adds the service to the services' list
    • Sets up the sbus connection to communicate with the service;
    • Sets up a destructor to the sbus connection in order to properly shutdown the service when this connection is closed;
    • Marks the service as started;

By looking at those two different processes done by the monitor, some of you may have realized there's an extra step when the service has been {dbus,socket}-activated that wasn't needed at all before. Yep, "Sets up a destructor to the sbus connection in order to properly shutdown the service when this connection is closed" is a completely new thing: previously, the services were only shut down when SSSD was shut down, while now the services are also shut down when they become idle.

So, what's basically done now is:
 - Once there's no communication with the service, its (sbus) connection with the monitor is closed;
 - Closing the (sbus) connection triggers the following actions:
    - The number of services is decreased;
    - The connection destructor is unset (otherwise it would be called again after the service has been freed);
    - The service is shut down.

*sbus: SSSD uses dbus protocol over a private socket to handle its internal communication, so the services do not talk over system bus.
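The registration and teardown flows above can be sketched in a few lines of Python. All the names here are illustrative, invented for clarity; they are not the actual SSSD symbols:

```python
# Sketch of the monitor's handling of a socket-activated responder.
# FakeSbusConnection stands in for the real private sbus connection.

class FakeSbusConnection:
    def __init__(self):
        self._close_cb = None

    def on_close(self, callback):
        # register a destructor to run when the connection is closed
        self._close_cb = callback

    def close(self):
        if self._close_cb:
            self._close_cb()


class Monitor:
    def __init__(self):
        self.services = {}  # name -> sbus connection

    def on_service_registered(self, name, sbus_conn):
        """Called when a freshly activated responder pings the monitor."""
        config = self.load_service_config(name)   # read from sssd.conf
        self.services[name] = sbus_conn           # service count grows
        # new step: when the sbus connection closes (responder went idle),
        # tear the service down properly instead of leaking it
        sbus_conn.on_close(lambda: self.on_connection_closed(name))
        return config

    def on_connection_closed(self, name):
        """Destructor: the responder became idle and closed its connection."""
        self.services.pop(name, None)             # service count shrinks

    def load_service_config(self, name):
        # placeholder for the real per-responder configuration
        return {"responder_idle_timeout": 300}


monitor = Monitor()
conn = FakeSbusConnection()
monitor.on_service_registered("nss", conn)
conn.close()  # simulates the responder going idle
```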

And what do the unit files look like?

SSSD has 7 services: autofs, ifp, nss, pac, pam, ssh and sudo. Four of those seven share pretty much the same unit files:

AutoFS, PAC, SSH and Sudo unit files:


sssd-$responder.service:
[Unit]
Description=SSSD $(responder) Service responder
Documentation=man:sssd.conf(5)
After=sssd.service
BindsTo=sssd.service

[Install]
Also=sssd-$responder.socket

[Service]
ExecStartPre=-/bin/chown $sssd_user:$sssd_user /var/log/sssd/sssd_$responder.log
ExecStart=/usr/libexec/sssd/sssd_$responder --debug-to-files --socket-activated
Restart=on-failure
User=$sssd_user
Group=$sssd_user
PermissionsStartOnly=true

sssd-$responder.socket:


[Unit]
Description=SSSD $(responder) Service responder socket
Documentation=man:sssd.conf(5)
BindsTo=sssd.service

[Socket]
ListenStream=/var/lib/sss/pipes/$responder
SocketUser=$sssd_user
SocketGroup=$sssd_user

[Install]
WantedBy=sssd.service
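Filling in the template above for one concrete responder may make it easier to read. Assuming the SSSD user is named sssd and the default paths (both are distribution-dependent), the rendered units for, say, the sudo responder would look roughly like:

```ini
# sssd-sudo.service (illustrative rendering of the template)
[Unit]
Description=SSSD Sudo Service responder
Documentation=man:sssd.conf(5)
After=sssd.service
BindsTo=sssd.service

[Install]
Also=sssd-sudo.socket

[Service]
ExecStartPre=-/bin/chown sssd:sssd /var/log/sssd/sssd_sudo.log
ExecStart=/usr/libexec/sssd/sssd_sudo --debug-to-files --socket-activated
Restart=on-failure
User=sssd
Group=sssd
PermissionsStartOnly=true

# sssd-sudo.socket
[Unit]
Description=SSSD Sudo Service responder socket
Documentation=man:sssd.conf(5)
BindsTo=sssd.service

[Socket]
ListenStream=/var/lib/sss/pipes/sudo
SocketUser=sssd
SocketGroup=sssd

[Install]
WantedBy=sssd.service
```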


And what about the different ones? We will get there ... and also explain why they are different.

The infopipe (ifp) unit file:

As the infopipe won't be socket-activated, it doesn't have a respective .socket unit.
Also, unlike the other responders, the infopipe responder can currently only be run as root.
In the end, its .service unit looks like:

sssd-ifp.service:
[Unit]
Description=SSSD IFP Service responder
Documentation=man:sssd-ifp(5)
After=sssd.service
BindsTo=sssd.service

[Service]
Type=dbus
BusName=org.freedesktop.sssd.infopipe
ExecStart=/usr/libexec/sssd/sssd_ifp --uid 0 --gid 0 --debug-to-files --dbus-activated
Restart=on-failure

The PAM unit files:

The main difference between the PAM responder and the others is that PAM has two sockets that can end up socket-activating its service. Also, these sockets have special permissions.
In the end, its unit files look like:

sssd-pam.service:
[Unit]
Description=SSSD PAM Service responder
Documentation=man:sssd.conf(5)
After=sssd.service
BindsTo=sssd.service

[Install]
Also=sssd-pam.socket sssd-pam-priv.socket

[Service]
ExecStartPre=-/bin/chown $sssd_user:$sssd_user @logpath@/sssd_pam.log
ExecStart=@libexecdir@/sssd/sssd_pam --debug-to-files --socket-activated
Restart=on-failure
User=$sssd_user
Group=$sssd_user
PermissionsStartOnly=true

sssd-pam.socket:
[Unit]
Description=SSSD PAM Service responder socket
Documentation=man:sssd.conf(5)
BindsTo=sssd.service
BindsTo=sssd-pam-priv.socket

[Socket]
ListenStream=@pipepath@/pam
SocketUser=root
SocketGroup=root

[Install]
WantedBy=sssd.service

sssd-pam-priv.socket:
[Unit]
Description=SSSD PAM Service responder private socket
Documentation=man:sssd.conf(5)
BindsTo=sssd.service
BindsTo=sssd-pam.socket

[Socket]
Service=sssd-pam.service
ListenStream=@pipepath@/private/pam
SocketUser=root
SocketGroup=root
SocketMode=0600

[Install]
WantedBy=sssd.service

The NSS unit files:

The NSS responder was the trickiest one to get working properly, mainly because when socket-activated it has to run as root.
The reason is that systemd calls getpwnam() and getgrnam() when "User="/"Group=" are set to something other than root. When it does, libc ends up querying for $sssd_user, trying to talk to the NSS responder, which is not up yet, and the clients would end up hanging for a few minutes (due to our default_client_timeout), which is something we really want to avoid.
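The lookup systemd performs can be reproduced from Python, whose pwd module wraps the same libc getpwnam() call. The point of the snippet is just to show where the query goes, not to reproduce the hang:

```python
import pwd

# getpwnam() walks the NSS modules listed in /etc/nsswitch.conf
# (files, sss, ...). Looking up "root" is answered from /etc/passwd,
# but a user served by SSSD would make libc talk to the NSS responder's
# socket -- exactly the chicken-and-egg situation described above when
# systemd resolves "User=" before the responder is running.
entry = pwd.getpwnam("root")
print(entry.pw_uid)  # root is always uid 0
```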

In the end, its unit files look like:

sssd-nss.service:
[Unit]
Description=SSSD NSS Service responder
Documentation=man:sssd.conf(5)
After=sssd.service
BindsTo=sssd.service

[Install]
Also=sssd-nss.socket

[Service]
ExecStartPre=-/bin/chown root:root @logpath@/sssd_nss.log
ExecStart=@libexecdir@/sssd/sssd_nss --debug-to-files --socket-activated
Restart=on-failure

sssd-nss.socket:
[Unit]
Description=SSSD NSS Service responder socket
Documentation=man:sssd.conf(5)
BindsTo=sssd.service

[Socket]
ListenStream=@pipepath@/nss
SocketUser=$sssd_user
SocketGroup=$sssd_user

All the services' units have a "BindsTo=sssd.service" in order to ensure that the responder will be stopped when sssd.service is stopped, so that an SSSD shutdown/restart is propagated to the responders as well.

Similarly to "BindsTo=sssd.service", there's "WantedBy=sssd.service" in every socket unit; it's there to ensure that, once the socket is enabled, it will be automatically started when SSSD is started.

And that's pretty much all the changes covered by this work.

I really have to say a big thank you to ...

  • Lukas Nykryn and Michal Sekletar, who patiently reviewed the unit files we're using and gave me a lot of good tips while doing this work;
  • Sumit Bose who helped me to find out the issue with the NSS responder when trying to run it as a non-privileged user;
  • Jakub Hrozek, Lukas Slebodnik and Pavel Brezina for reviewing and helping me to find bugs, crashes, regressions that fortunately were avoided.

And what's next?

There's already a patch making the {dbus,socket}-activated responders automatically enabled when SSSD starts, which changes our approach: instead of having to explicitly enable the sockets in order to take advantage of this work, admins will have to explicitly disable (actually, mask) the sockets of the responders that shouldn't be {dbus,socket}-activated.

Also, a bigger work for the future is to also have the providers being socket-activated, but this is material for a different blog post. ;-)

Nice, nice. But I'm having issues with what you've described!

In case it happens to you, please keep in mind that the recommended way to diagnose any issue is:

  • Inspecting sssd.conf in order to check which responders are explicitly listed in the services' line;
  • `systemctl status sssd.service`;
  • `systemctl status sssd-$responder.service` (for the {dbus,socket}-activated ones);
  • `journalctl -u sssd.service`;
  • `journalctl -u sssd-$responder.service` (for the {dbus,socket}-activated ones);
  • `journalctl -br`;
  • Checking the SSSD debug logs in order to see whether the SSSD sockets were communicated with.