Tales From a Core File - A Tale of Two LEDs

It was the brightest of LEDs, it was the darkest of LEDs, it was the age of data links, it was the age of AHCI enclosure services, …

Today, I’d like to talk about two aspects of a project that I worked on a little while back under the aegis of RFD 89 Project Tiresias. This project covered improving the infrastructure for how we gathered and used information about various components in the system. So, let’s talk about LEDs for a moment.

LEDs are strewn across systems and show up on disks, networking cards, or just to tell us the system is powered on. In many cases, we rely on the blinking lights of a NIC, switch, hard drive, or another component to see that data is flowing. The activity LED is a mainstay of many devices. However, there’s another reason that we want to be able to control the LEDs: for identification purposes. If you have a rack of servers and you’re trying to make sure you pull the right networking cable, it can be helpful to be able to turn a LED on, off, or blink it with a regular pattern. So without further ado, let’s talk about how we control LEDs for network interface cards (NICs or data links) and a class of SATA hard drives.

NIC LEDs

The first class of devices that I’d like to talk about are networking cards. Most every NIC that you can plug an Ethernet cable or a copper/fiber-optic transceiver in has at least two LEDs for each port. One that indicates that traffic is flowing over the link, an activity LED, and one that indicates the fact that the link is actually up. Sometimes different colors are used to indicate different speeds. For example, a 25 GbE capable device may use an orange color to indicate that the link is operating at 10 GbE and a green one to indicate that it is operating at 25 GbE.

The first challenge we have in the operating system is to figure out how to actually tell the device to cause its LEDs to behave in a specific way. Unfortunately, practically every NIC (or family of NICs) has its own unique way of doing this and well, just about everything else. This is the operating system’s curse and one of its primary responsibilities — designing abstractions for device drivers that make it easy to enable new hardware and take advantage of unique features that different devices offer while minimizing the amount of work that is required to do so.

In illumos, there is a framework for networking device drivers called mac. The mac framework is named after the device driver that implements it, mac(9E). Sometimes you’ll hear it called by an alternative name: 'Generic LAN Device version 3' (GLDv3). The mac framework uses the concept of capabilities, which represent different features that hardware offers. For example, this includes such things as checksum offload, TCP segmentation offload (TSO), and more. Each capability can have arbitrary data associated with it that describes more information about the capability itself.

For controlling the LEDs, there’s a new capability called MAC_CAPAB_LED. With the capability, a driver has to fill in three pieces of information:

A set of flags to indicate optional behavior. Currently none are defined, but this is here for future expansion.
A set of supported modes that the LED can be set to. This includes:
- MAC_LED_ON which indicates that the LED should be turned on solidly.
- MAC_LED_OFF which indicates that the LED should be turned off.
- MAC_LED_IDENT which indicates that the LED should be set in a way to identify the device. Generally this means that it should blink. When this can use a different color, that’s even better.
A function that the operating system should call to set the LED state.

The structure looks something like:

typedef struct mac_capab_led {
	uint_t mcl_flags;
	mac_led_mode_t mcl_modes;
	int (*mcl_set)(void *driver, mac_led_mode_t mode, uint_t flags);
} mac_capab_led_t;

Basically, when the operating system wants to change the LED state, it’ll end up calling the mcl_set function to set the new mode. When the LED state should be changed back to normal, a fourth state is passed: MAC_LED_DEFAULT. The operating system guarantees that it will never call this function in parallel for the driver to try to simplify the programming model, though the driver will likely have I/O ongoing.

Some devices, such as older Intel client parts based on the e1000g driver don’t actually have blink functionality. As such, the driver today emulates that, though it would be good to move that into the broader mac framework when we need to do that for another driver.

The following devices and drivers currently have support for this functionality:

Chelsio T4, T5, and T6 10, 25, 40, and 100 GbE parts based on the cxgbe driver
Intel 1 GbE Client and server parts based on the e1000g and igb drivers
Intel 10 GbE SFP/Copper devices in the X520, X540, and X550 families based on the ixgbe driver
Intel 10, 25, and 40 GbE devices in the X710 and X722 family based on the i40e driver

Plumbing it up in user land

With all of the above hardware support, there’s currently a private utility in illumos to control these called dlled that can be found at /usr/lib/dl/dlled. Now, you might ask: why is the program hiding in /usr/lib/dl? The main reason is that we’re still not sure what the right interface for controlling this should be. We should likely integrate it into the dladm(1M) command and allow the LEDs to be controlled through the Fault Management Architecture like we do with other LEDs. However, until we figure out exactly what we want, this gives us something to experiment with.

If you run it on a system, you’ll see something like:

# /usr/lib/dl/dlled
LINK                 ACTIVE       SUPPORTED
igb0                 default      default,off,on,ident
igb1                 default      default,off,on,ident
igb3                 default      default,off,on,ident
igb2                 default      default,off,on,ident

From here, we can use the -s option to then control and change the state. If you set the state, it should persist across anyone pulling or removing a cable in the device, but nothing that’s set will persist across a reboot of the system. If we set igb0 to ident mode, then we’ll see that the current state is updated:

# /usr/lib/dl/dlled -s ident igb0
# /usr/lib/dl/dlled
LINK                 ACTIVE       SUPPORTED
igb0                 ident        default,off,on,ident
igb1                 default      default,off,on,ident
igb3                 default      default,off,on,ident
igb2                 default      default,off,on,ident

While we have a video to demo this in action, for the moment let’s instead talk about another class of LEDs.

AHCI LEDs

Many systems have an AHCI controller built into their primary chipset. The AHCI Controller is used to attach SATA hard disk drives (HDDs) and solid state drives (SSDs). AHCI stands for 'Advanced Host Controller Interface'. It is a specification that describes how to discover attached devices and send commands to HDDs and SSDs.

Now, you may be saying to yourself, I’ve seen a hard drive or an SSD, but I don’t recall seeing any LEDs on them. And that’s right. Unlike NICs, which have the LEDs built in, the LEDs aren’t built into the drives, but into enclosures — the bays on the system that house drives.

LED Modes

When dealing with hard drives, there have historically been four different things that the system wants to communicate:

That the drive in the bay has a fault — it is no longer working or it is OK.
That the drive in the bay is 'OK to remove'.
To identify a specific bay by blinking the LED.
To show that there is activity to the device in the bay.

Typically, the first three items share a single LED, while a second LED is used to drive activity. A side effect of this is that somewhere in either hardware or firmware there is a hierarchy for the first three tasks. For example, if nothing else is going on then the drive’s LED may be a solid green color. If the drive has been faulted, it may instead turn to an amber color. Blinking the LED may be in the amber color or it may be something else entirely.

In the majority of cases, the activity LED isn’t controllable by software; however, the other LED can be. This means that when say ZFS decides a drive is bad, it can eventually lead to the operating system turning on an amber LED to indicate where a problem is.

While we have mentioned the 'OK to remove' LED, that has become less and less commonly used and implemented. It’s listed here for completeness, but may not actually be something that you can control or even see on some systems.

Enclosure Management

As we mentioned earlier, the LEDs are part of the bays and not part of the drives. This has lead to a suite of different standards and means for enclosure management. The one that applies will actually vary depending on the wiring of the system. For example, while the same 2.5 inch drive bay can be used for SATA, SAS, and NVMe devices, the way that the enclosure is controlled varies a lot based on the disk controller and the system’s broader design.

In systems that are using SAS controllers (regardless of whether one uses SATA or SAS drives), there is a specification that describes how to enumerate the different bays that devices can show up in, figure out which drives are in which bays, and control the LEDs and get other information. In a SAS world, this is called SCSI enclosure services or SES. When using an all NVMe system, SES won’t exist. Similarly, when you’re using SATA devices and an AHCI controller, then something different ends up happening. In fact, you’re not even guaranteed that there will be a SES device even in a SAS world!

The AHCI specification provides optional enclosure management features. If you want to follow along in the AHCI specification, open up to chapter 12 — Enclosure Management. There are three primary items the related to enclosure management:

A capability bit to indicate whether or not enclosure services are supported.
A register that describes the capabilities and controls sending messages.
A region of memory for sending and receiving messages.

This region of memory that can be used to send and receive messages is used because the standard allows the system to participate in one of four different messaging schemes. Messages specific to the underlying scheme are placed in a region of memory and if the scheme supports replies, it’ll place replies in there.

Now, the specification supports capability bits for up to four different schemes:

SAF-TE protocol
SES-2 protocol
SGPIO
The AHCI specification’s LED format

Despite all of the different forms mentioned above, we’ve yet to discover a system that indicates support for something other than the LED format. Though we have found one or two systems that incorrectly say they support the LED format and don’t. As a result, the LED format is the only one that we currently support.

The message format is pretty straightforward. You indicate which port on the controller you care about and then indicate whether you want to enable the identification LED or the fault LED.

Ultimately though, whether or not any of this works at all depends on the manufacturer of the system and what they wire up. Hopefully, if they advertise the LED format support, then we’ll be able to properly drive it.

Wiring it up

To make all of this work, the first thing we had to do was to add support to the ahci(7D) driver to make it detect whether or not the controller had enclosure services and provide a means to control it. The state and capabilities are all managed by a private ioctl(2). These ioctls allow one to get information about what’s supported, the number of ports, and the current state of each port. A second ioctl exists to set the state of a specific port.

Hardware has the constraint that only a single message can be sent at any given time. Therefore the driver serializes these requests and centralizes all the activity in a task queue. In that taskq, we’ll then attempt to send and receive the messages that hardware expects. You can see the details of creating the message and writing it to hardware in the ahci_em_set_led() function.

Don’t worry. You don’t have to try and write ioctls yourself. Similar to the dlled private command, there’s a private command to manipulate this. Let’s take a look at it:

# /usr/lib/ahci/ahciem
PORT                 ACTIVE       SUPPORTED
ahci0/0              default      ident,fault,default
ahci0/1              default      ident,fault,default
ahci0/2              default      ident,fault,default
ahci0/3              default      ident,fault,default
ahci0/4              default      ident,fault,default
ahci0/5              default      ident,fault,default

You can set the state of an individual port here in the same way as with the dlled command. With the 'default' state being its normal behavior, with neither the ident or fault LED turned on.

Mapping LEDs to bays and disks

Now, there’s still a problem. We haven’t actually solved the problem of mapping up the ports described above to actual disks. Because this is being driven by the AHCI controller, it only provides us the means of toggling this on its logical ports. While we can know what disk is attached to what port, it doesn’t actually tell us where that disk physically is.

In illumos, we have the ability to construct a per-platform map that defines additional information about the topology of the system. It helps us answer the question of what is where. My colleague Jordan Hendricks was the one who solved this problem for us. She added support for us to relate specific bays that we declare in a per-platform topology map to the corresponding port. This allows us to answer the mapping question as well as control the LEDs through the topology like we do for other parts of the system. It’s great work that takes what was a building block and actually makes it useful in the broader system. You can find a lot of examples of it in the illumos bug tracker and you should read the code itself!

See it in action

When I had a working demo of the ability to control NIC LEDs, I ended up recording an impromptu demo with the help of my occasional partner in crime, Alex Wilson.

What’s Next?

If you’re interested in adding support for NIC LEDs to another device that you have, get in touch with the illumos community on IRC in #illumos on Freenode or a mailing list and I or someone else will help you out and see what we can do. Ultimately, both of these changes are building blocks that can be built on top of. Jordan’s work is an example of that in the AHCI case. There’s more that can be built on top of the basic NIC functionality as well.

Next time, we’ll touch on another piece of related work: understanding the state of copper and fiber-optic transceivers for different NICs and reading their EEPROMs.

Previous Entry: CPU and PCH Temperature Sensors in illumos| All Entries | Next Entry: Transceivers: The Device Between the NIC and the Network