What is a Directory Store? tldr: it's like a yellow pages

  • directorystore
  • forgerock
  • ldap
  • business
  • architecture

Directory stores are ancient software that fit a particular use case that’s not commonly found across the rest of software development: storing a directory of entities that are accessible by grouping and hierarchy. That’s a more mature way of saying that directory stores are digital phone-books. Well-known examples include Microsoft’s Active Directory, Azure Active Directory, Jumpcloud’s LDAP, ForgeRock’s OpenDS and most other services that manage hierarchical data such as people. Directory Stores were originally designed for organising, categorising and grouping people. However, the way they store and manage data could be used for general data management for any number of applications. Let’s have a look.

When computing was taking off in the 1980s, enhancing the ability to store, control and manage vast amounts of data, the International Telecommunication Union needed to categorise the hierarchy and format of directory data. This is essentially the data you found in phone-books such as name, address, telephone number, location, etc. They formalised the X.500 standard, which described the format common data should be held in, alongside the related technologies for handling directories such as X.509 certificates, the Directory Access Protocol communication standard, various cryptography protocols and other foundational stuff that most of humanity still unknowingly relies on.

What matters to you, the non-technical stakeholder/googling techie reading this article, is what directory stores can do and why you would want to use them. For illustration I’m going to use ForgeRock’s OpenDS as it’s the Directory Store offering that I know best. Microsoft’s Active Directory is going to be the most common directory store in use today. However, it’s been Microsoft-ed, meaning that it has lots of non-standard elements that are specific to Active Directory. It’s also often managed via graphical user interfaces, which tend to abstract away what’s actually happening behind the scenes.

What sort of data goes into a directory store?

Today, I am your example. You generally know who I am as my name is all over this site. Without getting conspiratorial I can confirm I’m a unique entity and I have recently eaten some wine gums. I’m the perfect target audience for a confectioner who wants to have a record of people that eat wine gums. Maybe I’m a customer or maybe I filled out a competition form on the confectioner’s website, either way, it’s useful for the confectioner to have my data stored somewhere.

Here’s what info they have from the last order I placed:

Name: Andrew Southall
Location: United Kingdom
Wants to be contacted: No
Wants to be contacted about wine gums: Yes please
Last time they ordered wine gums: Jan 4th 2023

Great. Based on the X.500 standard my standardised data would look like this:

cn: Andrew
sn: Southall
c: GB

There’s a couple of issues here though. The first is that the UK has other people in it called ‘Andrew Southall’ that I haven’t sued for infringement yet, hence the real Andrew Southall is not unique on his name alone. We need to make sure that each entry in the directory store is unique and has what we can call a Distinguished Name, i.e. a name that distinguishes this Andrew Southall from all the UK-based impostors. I’m going to be lazy and generate a UUID, which is basically a random number out of trillions and trillions of possible numbers. That’ll be the distinguishing ID for the real Andrew Southall. Also, if he changes name or flees the country we can still link his historic wine gum purchases to his account.

uid: befbc58b-321c-43be-9eed-b8b68cad9a49
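As a rough sketch of the lazy approach, this is how such an ID could be generated in Python. The function name new_customer_uid is my own invention for illustration; only the uuid module is real, and any UUID generator would do the same job.

```python
import uuid

def new_customer_uid() -> str:
    """Return a random (version 4) UUID as a string, e.g. for use as a uid."""
    return str(uuid.uuid4())

uid = new_customer_uid()
# The matching LDAP search filter for exactly this entry.
search_filter = f"(uid={uid})"
print(search_filter)
```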

With a unique ID like this I can simply search for (uid=befbc58b-321c-43be-9eed-b8b68cad9a49) and get exactly this record. However, I could choose to set the directory up as a full tree where I’m part of the Humans organisation, which is part of planet Earth:

dn: uid=befbc58b-321c-43be-9eed-b8b68cad9a49,o=Humans,dc=Earth

I could also separate by country, which is how the subject names in TLS certificates work, as X.509 certificates use X.500 Distinguished Names. However, if I were to change country my Distinguished Name would change and therefore I’d technically be a different entity. For example, if I moved to France and we had Country be part of the Directory Tree then my DN would change:

# From
dn: uid=befbc58b-321c-43be-9eed-b8b68cad9a49,c=GB,o=Humans,dc=Earth
# To
dn: uid=befbc58b-321c-43be-9eed-b8b68cad9a49,c=FR,o=Humans,dc=Earth
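To make the point concrete, here is a minimal Python sketch that splits a DN into its parts and swaps the country. The helper names parse_dn and move_country are hypothetical, and the naive comma split assumes the DN has no escaped commas or multi-valued parts, which real directory tooling has to handle.

```python
def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, leaf first.

    Assumption: no escaped commas and no multi-valued RDNs.
    """
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr, value))
    return pairs

def move_country(dn: str, new_country: str) -> str:
    """Return the DN with its country (c) component replaced."""
    return ",".join(
        f"{attr}={new_country if attr.lower() == 'c' else value}"
        for attr, value in parse_dn(dn)
    )

before = "uid=befbc58b-321c-43be-9eed-b8b68cad9a49,c=GB,o=Humans,dc=Earth"
after = move_country(before, "FR")

print(after)
# The uid part is identical, yet the DN as a whole no longer matches.
print(parse_dn(before)[0] == parse_dn(after)[0], before == after)
```

The uid component survives the move untouched, but anything that looked the entry up by its full DN would now miss it.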

For some things this is desired, such as tax residency. I wouldn’t want to turn up in a search by the UK tax man as I’d be based in France. For wine gum retailing the worst that would happen is that I’d be charged more for delivery. Hence we can drop the country from our database.

It’s also perfectly viable to drop the ‘Humans’ Organisation and the ‘Earth’ Domain Component, seeing as most animals don’t have jobs and the rest of the galaxy seems pretty lifeless. We could just operate with unique IDs. It’s up to the admin to structure the directory tree as needed for appropriate logical separation. For this scenario I’m just going to search on UUIDs - they’re already ultra-unique identifiers and directory stores can handle many thousands of entities pretty quickly. If I get to the point selling wine gums that we need to start tuning the directory store, I’m dropping an IPO and retiring. I’m happy with Distinguished Names only being UUIDs.

One thing that I can’t escape - it’s up to the admin to add in any properties that don’t exist in the standard. Shockingly, there’s no directory standard for whether or not I want to be contacted about wine gums.

Making up your own data

Directory stores are databases filled with entities. These entities are usually people but they could be anything, so we have the ability to add properties and classifications to the entities added to the store. In fact, we could classify our entities as a wineGumCustomer so that they’re different from regular persons or non-human entities like laptops, cigars or cats that I may also add to the directory store.

Before making classes, we should add in the attributes that these entities need to have. So our wineGumCustomers need to have when they last purchased wine gums, if they want to be contacted about wine gums and if they want to be contacted at all. We can’t say that wineGumCustomers need all these details if we don’t tell the database what those details are first.

So we’ll add this to the database schema - the database model - which is an entity whose common name is ‘schema’. More precisely, it is the entity whose common name is schema and is not part of any Domain Components, Organisations, etc. If we stick with the tree description of a directory store, the schema is a leaf attached directly to the trunk with just the name schema. It doesn’t matter how big or complex the tree gets, it’s always in that same place. Let’s modify it to add three new attributes all about wineGumCustomers:

dn: cn=schema
changetype: modify
add: attributeTypes
attributeTypes: ( 2.5.4.200 NAME 'lastTimeBoughtWineGums' DESC 'The last time a customer bought wine gums' EQUALITY generalizedTimeMatch ORDERING generalizedTimeOrderingMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.24 SINGLE-VALUE USAGE directoryOperation X-ORIGIN 'Confectioner' )
attributeTypes: ( 2.5.4.201 NAME 'communicationsAll' DESC 'Whether a customer is OK for us to contact them about anything' EQUALITY booleanMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.7 SINGLE-VALUE )
attributeTypes: ( 2.5.4.202 NAME 'communicationsWineGums' DESC 'Whether a customer is OK for us to contact them only about wine gums' EQUALITY booleanMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.7 SINGLE-VALUE )

Now we can also add to the wineGumCustomer classification which relies on these values:

dn: cn=schema
changetype: modify
add: objectClasses
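# AUXILIARY rather than STRUCTURAL so it can sit alongside the structural person classes on one entry
objectClasses: ( 2.5.6.50 NAME 'wineGumCustomer' SUP top AUXILIARY MUST ( 2.5.4.200 $ 2.5.4.201 $ 2.5.4.202 ) )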

When those files are applied to the directory server it now knows about a wineGumCustomer who MUST have the properties lastTimeBoughtWineGums, communicationsAll and communicationsWineGums. So anything in the directory store that we add the wineGumCustomer classification to must have those properties added to it.
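The directory server enforces that MUST rule itself, but the idea can be sketched in a few lines of Python. This is only an illustration of the rule, not how OpenDS implements it; the missing_required helper and the dictionary layout for an entry are my own assumptions.

```python
# Attribute names taken from the schema above; the helper itself is hypothetical.
WINE_GUM_CUSTOMER_MUST = (
    "lastTimeBoughtWineGums",
    "communicationsAll",
    "communicationsWineGums",
)

def missing_required(entry: dict) -> list:
    """Return the MUST attributes a wineGumCustomer entry lacks."""
    classes = {c.lower() for c in entry.get("objectClass", [])}
    if "winegumcustomer" not in classes:
        return []  # the rule only applies to entries carrying the class
    present = {name.lower() for name in entry}
    return [a for a in WINE_GUM_CUSTOMER_MUST if a.lower() not in present]

# An entry that claims to be a wineGumCustomer but skips two properties.
incomplete = {
    "objectClass": ["top", "person", "wineGumCustomer"],
    "cn": ["Andrew"],
    "sn": ["Southall"],
    "communicationsAll": ["FALSE"],
}
print(missing_required(incomplete))
# prints ['lastTimeBoughtWineGums', 'communicationsWineGums']
```

A real server would reject the add or modify outright rather than hand back a list, but the check is the same in spirit.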

As I’m using OpenDS I’m also going to define the real Andrew Southall as a person. A person and a few of the other classes added below are freebie definitions in the ForgeRock OpenDS directory store that I can utilise rather than having to go through the tedium of defining a person directly. The format of most directory stores is standardised so a lot of effort has already been spent on categorising common entities.

Hence the definition of the real Andrew Southall is:

dn: uid=befbc58b-321c-43be-9eed-b8b68cad9a49
cn: Andrew
sn: Southall
c: GB
uid: befbc58b-321c-43be-9eed-b8b68cad9a49
objectClass: person
objectClass: inetOrgPerson
objectClass: top
objectClass: wineGumCustomer
communicationsAll: FALSE
communicationsWineGums: TRUE
lastTimeBoughtWineGums: 20230104000000Z
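The lastTimeBoughtWineGums value uses the Generalized Time syntax named in the schema, which packs a UTC timestamp into digits followed by a Z. A small Python sketch of converting to and from that format is below. The function names are hypothetical, and it only covers the plain whole-second UTC form shown here, not fractions of a second or time zone offsets.

```python
from datetime import datetime, timezone

# Whole-second UTC form only, e.g. 20230104000000Z.
GENERALIZED_TIME = "%Y%m%d%H%M%SZ"

def to_generalized_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC Generalized Time string."""
    return moment.astimezone(timezone.utc).strftime(GENERALIZED_TIME)

def from_generalized_time(value: str) -> datetime:
    """Parse a UTC Generalized Time string into an aware datetime."""
    return datetime.strptime(value, GENERALIZED_TIME).replace(tzinfo=timezone.utc)

last_order = datetime(2023, 1, 4, tzinfo=timezone.utc)
print(to_generalized_time(last_order))
# prints 20230104000000Z
```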

The cool part

That’s great but why would you do this as a business and what do you get out of it?

Well, with the classification and standardisation you have quantified your users and put them into a data storage system that is built around selecting subsections of your data. Let’s show some examples:

Find all the wineGumCustomers

ldapsearch '(objectClass=wineGumCustomer)'

Find all the wineGumCustomers who are ok with all communications

ldapsearch '(&(objectClass=wineGumCustomer)(communicationsAll=TRUE))'

Find all the wineGumCustomers who are not ok with all communications but are fine with wine gum specific communications

ldapsearch '(&(objectClass=wineGumCustomer)(communicationsAll=FALSE)(communicationsWineGums=TRUE))'

Find all the wineGumCustomers who are fine with wine gum specific communications who bought wine gums some time in 2023

ldapsearch '(&(objectClass=wineGumCustomer)(communicationsWineGums=TRUE)(lastTimeBoughtWineGums:1.3.6.1.4.1.26027.1.4.7:=2023))'

Find a single guy called Frank

ldapsearch -z 1 '(cn=Frank)'

The best part about this is that it’s all really fast - because the data has been classified and segregated it’s a lot easier to store and sift through. You can start searching in particular branches of the tree or through the whole directory.
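If the filter syntax looks cryptic, here is a toy Python evaluator for the simple AND / OR / NOT / equality filters used in the searches above. It is a sketch of what the filters mean, not of how a directory server runs them: a real server uses indexes rather than scanning every entry, and this version ignores extensible matches (like the 2023 search), wildcards, escaping and case-insensitive attribute names. The function names and the Frank entry are made up for illustration.

```python
def parse_filter(text: str):
    """Parse a small subset of LDAP filter syntax into nested tuples."""
    node, rest = _parse(text.strip())
    if rest:
        raise ValueError(f"unexpected trailing text: {rest!r}")
    return node

def _parse(text: str):
    if not text.startswith("("):
        raise ValueError(f"expected '(' at {text!r}")
    op = text[1]
    if op in "&|!":
        children, rest = [], text[2:]
        while rest.startswith("("):
            child, rest = _parse(rest)
            children.append(child)
        if not rest.startswith(")"):
            raise ValueError("missing ')'")
        return (op, children), rest[1:]
    end = text.index(")")
    attr, _, value = text[1:end].partition("=")
    return ("=", attr, value), text[end + 1:]

def matches(node, entry: dict) -> bool:
    """Check one entry (attribute name -> list of values) against a filter."""
    op = node[0]
    if op == "&":
        return all(matches(child, entry) for child in node[1])
    if op == "|":
        return any(matches(child, entry) for child in node[1])
    if op == "!":
        return not matches(node[1][0], entry)
    _, attr, value = node
    return any(v.lower() == value.lower() for v in entry.get(attr, []))

entries = [
    {"cn": ["Andrew"], "objectClass": ["person", "wineGumCustomer"],
     "communicationsAll": ["FALSE"], "communicationsWineGums": ["TRUE"]},
    {"cn": ["Frank"], "objectClass": ["person", "wineGumCustomer"],
     "communicationsAll": ["TRUE"], "communicationsWineGums": ["TRUE"]},
]
query = parse_filter(
    "(&(objectClass=wineGumCustomer)(communicationsAll=FALSE)(communicationsWineGums=TRUE))"
)
print([e["cn"][0] for e in entries if matches(query, e)])
# prints ['Andrew']
```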

Most databases are table based, meaning that they’re essentially a spreadsheet like Excel. With some data this can get one-dimensional and it ends up getting split into multiple tables and databases, then cut and joined back together again. There’s nothing wrong with that, but if the data is hierarchical then a Directory Store is potentially a better fit.

Directory Stores operate in the same way your files are ordered into folders / directories on your computer. Do you organise the files on your computer into different hierarchical directories or do you dump everything onto the Desktop? Same concept.
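Sticking with the folders comparison, searching one branch of the tree is a bit like listing everything under one folder. Here is a minimal Python sketch of that scoping, assuming DNs without escaped commas. The in_subtree helper and the uid=frank entry are hypothetical, and real servers resolve the search base far more efficiently than comparing strings.

```python
def in_subtree(dn: str, base_dn: str) -> bool:
    """True if dn sits at or below base_dn in the directory tree."""
    dn_parts = [part.strip().lower() for part in dn.split(",")]
    base_parts = [part.strip().lower() for part in base_dn.split(",")]
    # A DN is under a base when the base is the tail end of the DN.
    return (
        len(dn_parts) >= len(base_parts)
        and dn_parts[-len(base_parts):] == base_parts
    )

dns = [
    "uid=befbc58b-321c-43be-9eed-b8b68cad9a49,c=GB,o=Humans,dc=Earth",
    "uid=frank,c=FR,o=Humans,dc=Earth",  # made-up entry for illustration
]

uk_only = [dn for dn in dns if in_subtree(dn, "c=GB,o=Humans,dc=Earth")]
everyone = [dn for dn in dns if in_subtree(dn, "o=Humans,dc=Earth")]
print(len(uk_only), len(everyone))
# prints 1 2
```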

I’d say that directory stores are underutilised. However, there is a learning curve compared to the traditional database, as they come with the baggage of LDAP, some historical gunk and the need to learn something that isn’t seen as much in the marketplace as regular old structured data.

Lessons

Directory stores are an option for managing, storing and retrieving things. As there are extra dimensions to the way data is stored and indexed in a directory store, they’re great at representing complex things that can be many things at once, such as people and wineGumCustomers. There’s a learning curve, but they can be used to get practical and actionable data from the store in a performant and accessible way.

I personally assume that the biggest obstacle to directory store adoption is committing to defining a data schema and committing to managing data so that it fits into that schema. I reckon NoSQL databases got popular as they have no schema at all, so there’s no up front thinking about hard questions. Just dump stuff in a key/value store and chuck your application out the door. MoVe FaSt AnD BrEaK ThInGs.

An important element of Directory Stores is that they often come provisioned with a schema that classifies entities into categories. Having data classified and quantified helps computers work appropriately. Having basic classifications good to go means you’re not wrestling with tedious design decisions such as defining what a person is, adding all 190+ country definitions, deciding whether apostrophes are acceptable in last names, etc. Also, you get indexing and replication capabilities built in. If you scale heavily and start handling even a few hundred thousand customers it’s a huge relief to have indexing and replication ready to go.

Overall, enforcing data classification is what I think is important here. In this article I’ve only classified the wineGumCustomer, their communication preferences and when they last bought wine gums. The directory store schema provides a strong framework to classify and manage data that enforces good data management patterns that scale well.

One last quick note…

The samples above may not be complete configuration as different directory stores may have their own provisioned categories. The number sequences above (such as 1.3.6.1.4.1.26027.1.4.7) are locations that need to already exist and be suitable in whatever directory store you’re running. Other numbers like 2.5.6.50 (the wineGumCustomer classification) I literally made up, although made up with my years of technological warfare and intergalactic brain power. Your mileage may vary.
