It was the brightest of LEDs, it was the darkest of LEDs, it was the age of data links, it was the age of AHCI enclosure services, …
Today, I’d like to talk about two aspects of a project that I worked on a little while back under the aegis of RFD 89 Project Tiresias. This project covered improving the infrastructure for how we gathered and used information about various components in the system. So, let’s talk about LEDs for a moment.
LEDs are strewn across systems and show up on disks, networking cards, or just to tell us the system is powered on. In many cases, we rely on the blinking lights of a NIC, switch, hard drive, or another component to see that data is flowing. The activity LED is a mainstay of many devices. However, there’s another reason that we want to be able to control the LEDs: for identification purposes. If you have a rack of servers and you’re trying to make sure you pull the right networking cable, it can be helpful to be able to turn an LED on, turn it off, or blink it in a regular pattern. So without further ado, let’s talk about how we control LEDs for network interface cards (NICs or data links) and a class of SATA hard drives.
NIC LEDs
The first class of devices that I’d like to talk about is networking cards. Almost every NIC that you can plug an Ethernet cable or a copper/fiber-optic transceiver into has at least two LEDs for each port: an activity LED, which indicates that traffic is flowing over the link, and one that indicates that the link is actually up. Sometimes different colors are used to indicate different speeds. For example, a 25 GbE capable device may use an orange color to indicate that the link is operating at 10 GbE and a green one to indicate that it is operating at 25 GbE.
The first challenge we have in the operating system is to figure out how to actually tell the device to make its LEDs behave in a specific way. Unfortunately, practically every NIC (or family of NICs) has its own unique way of doing this and, well, just about everything else. This is the operating system’s curse and one of its primary responsibilities: designing abstractions for device drivers that make it easy to enable new hardware and take advantage of the unique features that different devices offer, while minimizing the amount of work required to do so.
In illumos, there is a framework for networking device drivers called mac. The mac framework is documented in the mac(9E) manual page. Sometimes you’ll hear it called by an alternative name: 'Generic LAN Device version 3' (GLDv3). The mac framework uses the concept of capabilities, which represent different features that hardware offers. Examples include checksum offload, TCP segmentation offload (TSO), and more. Each capability can have arbitrary data associated with it that describes more information about the capability itself.
For controlling the LEDs, there’s a new capability called MAC_CAPAB_LED. With the capability, a driver has to fill in three pieces of information:
- A set of flags to indicate optional behavior. Currently none are defined, but this is here for future expansion.
- A set of supported modes that the LED can be set to. This includes:
  - MAC_LED_ON, which indicates that the LED should be turned on solidly.
  - MAC_LED_OFF, which indicates that the LED should be turned off.
  - MAC_LED_IDENT, which indicates that the LED should be set in a way to identify the device. Generally this means that it should blink. When this can use a different color, that’s even better.
- A function that the operating system should call to set the LED state.
The structure looks something like:
typedef struct mac_capab_led {
	uint_t mcl_flags;
	mac_led_mode_t mcl_modes;
	int (*mcl_set)(void *driver, mac_led_mode_t mode, uint_t flags);
} mac_capab_led_t;
Basically, when the operating system wants to change the LED state, it’ll end up calling the mcl_set function to set the new mode. When the LED state should be changed back to normal, a fourth state is passed: MAC_LED_DEFAULT. To simplify the programming model, the operating system guarantees that it will never call this function in parallel for the driver, though the driver will likely have I/O ongoing.
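To make the driver side of this concrete, here is a minimal sketch of filling in the capability and implementing the set function. It is written to compile outside the kernel, so the types are stand-ins: the numeric values of the modes, the example_nic_t structure, and the example_* function names are all assumptions made for illustration, not the framework’s actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Stand-in definitions so this sketch compiles outside the kernel. The
 * real types come from the mac framework headers; the numeric values
 * below are assumptions made for illustration.
 */
typedef unsigned int uint_t;

typedef enum mac_led_mode {
	MAC_LED_DEFAULT	= 1 << 0,
	MAC_LED_OFF	= 1 << 1,
	MAC_LED_IDENT	= 1 << 2,
	MAC_LED_ON	= 1 << 3
} mac_led_mode_t;

typedef struct mac_capab_led {
	uint_t mcl_flags;
	mac_led_mode_t mcl_modes;
	int (*mcl_set)(void *driver, mac_led_mode_t mode, uint_t flags);
} mac_capab_led_t;

/* Hypothetical per-instance driver state. */
typedef struct example_nic {
	mac_led_mode_t en_led_mode;
} example_nic_t;

/* Called by the framework, never in parallel, to change the LED mode. */
int
example_led_set(void *driver, mac_led_mode_t mode, uint_t flags)
{
	example_nic_t *nic = driver;

	/* No flags are defined yet, so reject anything non-zero. */
	if (flags != 0)
		return (EINVAL);

	switch (mode) {
	case MAC_LED_DEFAULT:
	case MAC_LED_OFF:
	case MAC_LED_IDENT:
	case MAC_LED_ON:
		/* A real driver would program device registers here. */
		nic->en_led_mode = mode;
		return (0);
	default:
		return (ENOTSUP);
	}
}

/* Fill in the three pieces of information the capability asks for. */
void
example_fill_led_capab(mac_capab_led_t *led)
{
	assert(led != NULL);
	led->mcl_flags = 0;
	led->mcl_modes = MAC_LED_DEFAULT | MAC_LED_OFF | MAC_LED_IDENT |
	    MAC_LED_ON;
	led->mcl_set = example_led_set;
}
```

Because the framework serializes calls to the set function, the sketch does not take a lock around the mode update; a real driver would still need to coordinate with any of its own code paths that touch the same registers.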
Some devices, such as older Intel client parts based on the e1000g driver, don’t actually have blink functionality. As such, the driver today emulates that, though it would be good to move that into the broader mac framework when we need to do that for another driver.
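One way to picture that kind of emulation is a periodic callback that flips the LED between on and off while ident mode is active. The sketch below is only an illustration of the idea, not the e1000g driver’s code; the blink_state_t structure and both function names are hypothetical, and the timer itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical modes mirroring the ones described above. */
typedef enum led_mode {
	LED_DEFAULT,
	LED_OFF,
	LED_ON,
	LED_IDENT
} led_mode_t;

/* Hypothetical driver state for a device that can only do on and off. */
typedef struct blink_state {
	led_mode_t bs_mode;	/* mode requested by the framework */
	bool bs_lit;		/* what the LED was last told to do */
	bool bs_timer_active;	/* whether the periodic callback is armed */
} blink_state_t;

/* Apply a requested mode; ident arms the timer rather than the hardware. */
void
blink_set_mode(blink_state_t *bs, led_mode_t mode)
{
	assert(bs != NULL);
	bs->bs_mode = mode;
	bs->bs_timer_active = (mode == LED_IDENT);
	/*
	 * Ident starts lit and on stays lit. For simplicity this sketch
	 * treats default like off; real hardware would resume showing
	 * link state and activity.
	 */
	bs->bs_lit = (mode == LED_ON || mode == LED_IDENT);
}

/* Periodic callback: flip the LED, but only while identifying. */
void
blink_tick(blink_state_t *bs)
{
	assert(bs != NULL);
	if (!bs->bs_timer_active)
		return;
	bs->bs_lit = !bs->bs_lit;
}
```

Calling blink_tick() at a fixed interval gives a regular blink, and switching to any other mode stops the toggling because the timer flag is cleared.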
The following devices and drivers currently have support for this functionality:
- Chelsio T4, T5, and T6 10, 25, 40, and 100 GbE parts based on the cxgbe driver
- Intel 1 GbE client and server parts based on the e1000g and igb drivers
- Intel 10 GbE SFP/Copper devices in the X520, X540, and X550 families based on the ixgbe driver
- Intel 10, 25, and 40 GbE devices in the X710 and X722 family based on the i40e driver
Plumbing it up in user land
With all of the above hardware support, there’s currently a private utility in illumos to control these called dlled that can be found at /usr/lib/dl/dlled. Now, you might ask: why is the program hiding in /usr/lib/dl? The main reason is that we’re still not sure what the right interface for controlling this should be. We should likely integrate it into the dladm(1M) command and allow the LEDs to be controlled through the Fault Management Architecture like we do with other LEDs. However, until we figure out exactly what we want, this gives us something to experiment with.
If you run it on a system, you’ll see something like:
# /usr/lib/dl/dlled
LINK         ACTIVE       SUPPORTED
igb0         default      default,off,on,ident
igb1         default      default,off,on,ident
igb3         default      default,off,on,ident
igb2         default      default,off,on,ident
From here, we can use the -s option to change the state. Once set, the state should persist even as cables are pulled from or plugged into the device, but nothing that’s set will persist across a reboot of the system. If we set igb0 to ident mode, then we’ll see that the current state is updated:
# /usr/lib/dl/dlled -s ident igb0
# /usr/lib/dl/dlled
LINK         ACTIVE       SUPPORTED
igb0         ident        default,off,on,ident
igb1         default      default,off,on,ident
igb3         default      default,off,on,ident
igb2         default      default,off,on,ident
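The SUPPORTED column is just the driver’s mode bitmask rendered as names. As a rough sketch of how a tool could produce that string (this is not dlled’s actual source; the bit values and the led_modes_to_str name are assumptions):

```c
#include <assert.h>
#include <string.h>

/* Assumed bit values; the real definitions live in the mac headers. */
#define	LED_DEFAULT	(1U << 0)
#define	LED_OFF		(1U << 1)
#define	LED_IDENT	(1U << 2)
#define	LED_ON		(1U << 3)

/* Render a mode bitmask as a comma-separated list, in display order. */
void
led_modes_to_str(unsigned int modes, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} names[] = {
		{ LED_DEFAULT, "default" },
		{ LED_OFF, "off" },
		{ LED_ON, "on" },
		{ LED_IDENT, "ident" }
	};
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof (names) / sizeof (names[0]); i++) {
		if ((modes & names[i].bit) == 0)
			continue;
		/* Separate entries with a comma, never overrunning buf. */
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}
```

A driver that only advertised some of the modes would simply show a shorter list in that column.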
While we have a video to demo this in action, for the moment let’s instead talk about another class of LEDs.
AHCI LEDs
Many systems have an AHCI controller built into their primary chipset. The AHCI Controller is used to attach SATA hard disk drives (HDDs) and solid state drives (SSDs). AHCI stands for 'Advanced Host Controller Interface'. It is a specification that describes how to discover attached devices and send commands to HDDs and SSDs.
Now, you may be saying to yourself, I’ve seen a hard drive or an SSD, but I don’t recall seeing any LEDs on them. And that’s right. Unlike NICs, which have the LEDs built in, the LEDs aren’t built into the drives, but into enclosures — the bays on the system that house drives.
LED Modes
When dealing with hard drives, there have historically been four different things that the system wants to communicate:
- Whether the drive in the bay has a fault: either it is no longer working or it is OK.
- That the drive in the bay is 'OK to remove'.
- To identify a specific bay by blinking the LED.
- To show that there is activity to the device in the bay.
Typically, the first three items share a single LED, while a second LED is used to show activity. A side effect of this is that somewhere in either hardware or firmware there is a hierarchy for the first three tasks. For example, if nothing else is going on then the drive’s LED may be a solid green color. If the drive has been faulted, it may instead turn to an amber color. Blinking the LED may be in the amber color or it may be something else entirely.
In the majority of cases, the activity LED isn’t controllable by software; however, the other LED can be. This means that when, say, ZFS decides a drive is bad, it can eventually lead to the operating system turning on an amber LED to indicate where the problem is.
While we have mentioned the 'OK to remove' LED, that has become less and less commonly used and implemented. It’s listed here for completeness, but may not actually be something that you can control or even see on some systems.
Enclosure Management
As we mentioned earlier, the LEDs are part of the bays and not part of the drives. This has led to a suite of different standards and means for enclosure management. The one that applies will actually vary depending on the wiring of the system. For example, while the same 2.5 inch drive bay can be used for SATA, SAS, and NVMe devices, the way that the enclosure is controlled varies a lot based on the disk controller and the system’s broader design.
In systems that are using SAS controllers (regardless of whether one uses SATA or SAS drives), there is a specification that describes how to enumerate the different bays that devices can show up in, figure out which drives are in which bays, and control the LEDs and get other information. In a SAS world, this is called SCSI enclosure services or SES. When using an all NVMe system, SES won’t exist. Similarly, when you’re using SATA devices and an AHCI controller, then something different ends up happening. In fact, you’re not even guaranteed that there will be a SES device even in a SAS world!
The AHCI specification provides optional enclosure management features. If you want to follow along in the AHCI specification, open up to chapter 12, Enclosure Management. There are three primary items related to enclosure management:
- A capability bit to indicate whether or not enclosure services are supported.
- A register that describes the capabilities and controls sending messages.
- A region of memory for sending and receiving messages.
A region of memory is used for messages because the standard allows the system to participate in one of four different messaging schemes. Messages specific to the underlying scheme are placed in that region, and if the scheme supports replies, the replies are placed there as well.
Now, the specification supports capability bits for up to four different schemes:
- The AHCI specification’s LED format
- SAF-TE
- SES-2
- SGPIO
Despite all of the different formats mentioned above, we’ve yet to discover a system that indicates support for something other than the LED format, though we have found one or two systems that incorrectly claim to support the LED format and don’t. As a result, the LED format is the only one that we currently support.
The message format is pretty straightforward. You indicate which port on the controller you care about and then indicate whether you want to enable the identification LED or the fault LED.
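As a rough sketch of what building such a message might look like: the bit positions below follow my reading of the LED message type in the specification’s enclosure management chapter and should be checked against it before being relied on. The em_led_msg_t structure and the em_led_msg_build name are hypothetical and are not the driver’s own definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit positions within the 16-bit value field of an LED message. These
 * are assumptions based on a reading of the AHCI specification and
 * should be verified against it.
 */
#define	EM_LED_ACTIVITY_SHIFT	0
#define	EM_LED_IDENT_SHIFT	3
#define	EM_LED_FAULT_SHIFT	6
#define	EM_LED_OFF		0x0
#define	EM_LED_ON		0x1

/* Hypothetical in-memory form of one LED message. */
typedef struct em_led_msg {
	uint8_t elm_hba_port;	/* controller port the message targets */
	uint8_t elm_pm_port;	/* port multiplier port, 0 if unused */
	uint16_t elm_value;	/* three bits of state per LED */
} em_led_msg_t;

/* Build a message that sets the ident and fault LEDs for one port. */
em_led_msg_t
em_led_msg_build(uint8_t port, int ident, int fault)
{
	em_led_msg_t msg;

	assert(port < 32);	/* an AHCI controller has at most 32 ports */
	msg.elm_hba_port = port;
	msg.elm_pm_port = 0;
	/* The activity bits are left at zero in this sketch. */
	msg.elm_value = 0;
	if (ident)
		msg.elm_value |= EM_LED_ON << EM_LED_IDENT_SHIFT;
	if (fault)
		msg.elm_value |= EM_LED_ON << EM_LED_FAULT_SHIFT;
	return (msg);
}
```

Returning to the default state is then a message for the same port with both LEDs off.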
Ultimately though, whether or not any of this works at all depends on the manufacturer of the system and what they wire up. Hopefully, if they advertise the LED format support, then we’ll be able to properly drive it.
Wiring it up
To make all of this work, the first thing we had to do was add support to the ahci(7D) driver so that it detects whether or not the controller has enclosure services and provides a means to control them. The state and capabilities are all managed through private ioctl(2) commands. One ioctl allows one to get information about what’s supported, the number of ports, and the current state of each port. A second ioctl exists to set the state of a specific port.
Hardware has the constraint that only a single message can be sent at any given time. Therefore the driver serializes these requests and centralizes all the activity in a task queue. In that taskq, we’ll then attempt to send and receive the messages that hardware expects. You can see the details of creating the message and writing it to hardware in the ahci_em_set_led() function.
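The serialization idea can be sketched with a small first-in, first-out queue that a single worker drains one message at a time. This is only an illustration of the pattern, not the driver’s taskq code: the structures and function names are hypothetical, and a real implementation would add locking around the queue and wait for hardware to consume each message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	EM_QUEUE_MAX	8

/* Hypothetical request: which port, and the LED bits to send. */
typedef struct em_req {
	unsigned int er_port;
	unsigned int er_leds;
} em_req_t;

/* Hypothetical FIFO standing in for the driver's task queue. */
typedef struct em_queue {
	em_req_t eq_reqs[EM_QUEUE_MAX];
	size_t eq_head;		/* index of the next request to send */
	size_t eq_count;	/* number of queued requests */
	size_t eq_sent;		/* messages handed to hardware so far */
} em_queue_t;

/* Queue a request; returns false when the queue is full. */
bool
em_queue_submit(em_queue_t *q, em_req_t req)
{
	assert(q != NULL);
	if (q->eq_count == EM_QUEUE_MAX)
		return (false);
	q->eq_reqs[(q->eq_head + q->eq_count) % EM_QUEUE_MAX] = req;
	q->eq_count++;
	return (true);
}

/*
 * Worker body: send exactly one message, oldest first. A real driver
 * would write the message to hardware here and wait for it to be
 * consumed before returning.
 */
bool
em_queue_run_one(em_queue_t *q, em_req_t *sent)
{
	assert(q != NULL && sent != NULL);
	if (q->eq_count == 0)
		return (false);
	*sent = q->eq_reqs[q->eq_head];
	q->eq_head = (q->eq_head + 1) % EM_QUEUE_MAX;
	q->eq_count--;
	q->eq_sent++;
	return (true);
}
```

Because only the worker ever calls em_queue_run_one(), at most one message is in flight no matter how many callers submit requests.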
Don’t worry. You don’t have to try and write ioctls yourself. Similar to the dlled private command, there’s a private command to manipulate this. Let’s take a look at it:
# /usr/lib/ahci/ahciem
PORT         ACTIVE       SUPPORTED
ahci0/0      default      ident,fault,default
ahci0/1      default      ident,fault,default
ahci0/2      default      ident,fault,default
ahci0/3      default      ident,fault,default
ahci0/4      default      ident,fault,default
ahci0/5      default      ident,fault,default
You can set the state of an individual port here in the same way as with the dlled command. The 'default' state is the port’s normal behavior, with neither the ident nor the fault LED turned on.
Mapping LEDs to bays and disks
Now, there’s still a problem. We haven’t actually solved the problem of mapping up the ports described above to actual disks. Because this is being driven by the AHCI controller, it only provides us the means of toggling this on its logical ports. While we can know what disk is attached to what port, it doesn’t actually tell us where that disk physically is.
In illumos, we have the ability to construct a per-platform map that defines additional information about the topology of the system. It helps us answer the question of what is where. My colleague Jordan Hendricks was the one who solved this problem for us. She added support for us to relate specific bays that we declare in a per-platform topology map to the corresponding port. This allows us to answer the mapping question as well as control the LEDs through the topology like we do for other parts of the system. It’s great work that takes what was a building block and actually makes it useful in the broader system. You can find a lot of examples of it in the illumos bug tracker and you should read the code itself!
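Conceptually, the map boils down to relating a bay’s label to a controller and port. The sketch below shows that relationship as a plain lookup table; it is not the topology map’s real format, and the bay labels, port numbers, and all names in it are made up for an imaginary four-bay system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry relating a labeled bay to an AHCI port. */
typedef struct bay_map {
	const char *bm_label;	/* label printed on the chassis */
	unsigned int bm_ctrl;	/* AHCI controller instance */
	unsigned int bm_port;	/* logical port on that controller */
} bay_map_t;

/* Made-up layout for an imaginary four-bay system. */
static const bay_map_t example_bays[] = {
	{ "Bay 0", 0, 4 },
	{ "Bay 1", 0, 5 },
	{ "Bay 2", 0, 0 },
	{ "Bay 3", 0, 1 }
};

/* Find the map entry for a bay label, or NULL if there is none. */
const bay_map_t *
bay_lookup(const char *label)
{
	size_t nbays = sizeof (example_bays) / sizeof (example_bays[0]);
	size_t i;

	assert(label != NULL);
	for (i = 0; i < nbays; i++) {
		if (strcmp(example_bays[i].bm_label, label) == 0)
			return (&example_bays[i]);
	}
	return (NULL);
}
```

The point of the table is that physical order and logical port order need not match, which is why the mapping has to be declared per platform rather than guessed.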
See it in action
When I had a working demo of the ability to control NIC LEDs, I ended up recording an impromptu demo with the help of my occasional partner in crime, Alex Wilson.
What’s Next?
If you’re interested in adding support for NIC LEDs to another device that you have, get in touch with the illumos community on IRC in #illumos on Freenode or a mailing list and I or someone else will help you out and see what we can do. Ultimately, both of these changes are building blocks that can be built on top of. Jordan’s work is an example of that in the AHCI case. There’s more that can be built on top of the basic NIC functionality as well.
Next time, we’ll touch on another piece of related work: understanding the state of copper and fiber-optic transceivers for different NICs and reading their EEPROMs.