Logging Etiquette

Logging etiquette is something that should naturally pop-up in any codebase that grows complex enough (hopefully before it is actually needed the most 😅🤞). When you start having too many simultaneously moving parts, crash reports and stack traces won’t always be enough to understand the underlying cause of a bug, and this is where logs will shine the brightest. If implemented correctly, they might just provide you with the missing puzzle pieces 🧩.

# How to log?

First things first, please don’t just print your “logs” to the console or manually store them in an arbitrary file. Those basic approaches lack standardization and thus are harder to manage (retrieve, upload, parse, etc.). Especially since some amazing people have poured countless hours into the amazing tools and standards we’re about to discuss. Use he right tool for the job.

Most systems and languages will provide an OOTB logging mechanism in some shape or form, and it will most likely be called Logger or simply Log. The easiest way to start logging is to just use it. Don’t reinvent a logging mechanism unless you absolutely have to. At most, wrap/extend the existing logging mechanism and add the things you think are missing.

Producing logs is the easy part. The real issues are knowing what and when to log. Record too many logs and you will clutter your logfiles. Record too few and you won’t get meaningful insights. It’s a delicate balancing act with no correct answer ⚖️ but there are some industry guidelines to steer you in the right direction.

# What to log?

The first step should be to pick a suitable logging level for whatever it is you want to log. This is because most log readers allow you to filter logs by level 🥅.

Most systems provide the following levels (in descending degree of importance):

💥 Critical
🟥 Error
🟨 Warning
🟦 Notice
🔍 Info
🐛 Debug

These levels should encompass all of your logging needs, and once you’ve picked one, the second step should be to write a meaningful log message:

Provide enough context - the log should act as a form of breadcrumb for a chain of events
Describe what has happened / is happening / will happen - the action that succeeded/failed, the decision that was made, the process that will be started/continued/paused/finished, a progress update, a state change, etc.
Write it in human-readable english - for better or worse, it’s the industry standard. The stack traces and crash reports will handle all of the technical non-human-readable aspects for you.

These guidelines help you future-proof your logs so that 6 months from now they are still helpful, instead of a meaningless assortment of characters.

# When to log?

The low hanging fruits of the logging world are errors and warnings. If the program was going to crash and you avoided it, record an error log for it and the conditions that led to it. If something unexpected happens (a failed initialisation, a missing file, etc.) that would otherwise have led to a crash or error state, record a warning log for it. Similarly, notice logs can help bridge the gaps between error and warning logs by identifying the happy-paths/conditions up until that point (user actions, navigations, etc.).

After some time writing logs you get some intuition, and if you find yourself asking “Should I log this?” then it’s probably a good idea to log it (with a suitable level, of course).

# When to log thrown errors?

I have rule of thumb when deciding “Should I log this error?”

If I call a throwing function and perform error handling, then I also have to log the error.

This “error handling” can be anything from returning a default value (nil, empty array, etc.), to updating the UI (showing an error alert to the user, changing loading status, etc.). Put in other terms, if the error changes the code path it’s probably worth logging it.

This rule of thumb also nudges us towards not catching, logging, and (re)throwing the same error. This is important, as duplicate logs can cause confusion when troubleshooting.

Let’s look at some examples in a Swift MVVM codebase. For sample purposes, assume that the error logger we are using just outputs whatever we feed it, prepended by the class’ name it is declared in.

enum CustomError: Error {
    case boom
}

enum SomeOtherError: Error {
    case anotherBoom
}

func someUnderlyingMethodThatThrows() async throws -> [Result] {
    throw CustomError.boom
}

class Repository {
  private let logger // ...

  // ✅ doesn't error handle, don't log
  func methodThatThrows() async throws -> [Result] {
    try await someUnderlyingMethodThatThrows()
  }

  // ✅ error handles, log
  func methodThatErrorHandlesButStillThrows() async throws -> [Result] {
    do {
      try await someUnderlyingMethodThatThrows()
    } catch {
      logger.error(error)
      throw SomeOtherError.anotherBoom // ⚠️ notice how it throws a different error from the one we caught
    }
  }

  // ❌ avoid this pattern
  func methodThatJustRethrows() async throws -> [Result] {
    do {
      try await someUnderlyingMethodThatThrows()
    } catch {
      logger.error(error)
      throw error // ⚠️ notice how it throws the same error we just caught
    }
  }
}

class VM {
  private let repository = Repository()
  private let logger // ...

  // ✅ error handles, logs
  func methodWithErrorHandling() async -> [Result] {
    do {
      try await repository.methodThatThrows()
    } catch {
      logger.error(error)
      return []
    }
  }

  // ✅ error handles, logs
  func anotherMethodWithErrorHandling() async {
    do {
      try await repository.methodThatThrows()
    } catch {
      logger.error(error)
      updateTheUI() // show an alert, stop the loading icon, etc.
    }
  }

  // ✅ doesn't error handle, don't log
  func methodWithoutErrorHandling() async throws -> [Result] {
    try await repository.methodThatThrows()
  }

  // ✅ error handles partially, logs, throws what it doesn't handle
  func methodWithPartialErrorHandling() async throws -> [Result] {
    do {
      try await repository.methodThatThrows()
    } catch CustomError.boom { // ⚠️ notice how it only catches a specific error, it will throw all others without logging
      logger.error(error)
      return []
    }
  }

  // ❓ error handles, but... why is this "broken"?
  func inadvertentlyBrokenLoggingMethod() async -> [Result] {
      try await repository.methodThatJustRethrows()
    } catch {
      logger.error(error)
      return []
    }
  }
}

Why should you avoid that pattern marked with an ❌ ? Let’s look at them in action. If we call VM.inadvertentlyBrokenLoggingMethod() it produces a paper trail of duplicate error logs

Repository - CustomError.boom
VM - CustomError.boom

This is because the underlying repository.methodThatJustRethrows() will itself also log the error before throwing it up 🤕

Thankfully, we can fix this quite easily, we just need to remove problematic catch-log-throw snippets, and leave everything else as is. Starting from leaf/end call-sites, we repeat this process while going up the call-path, leaving the logging to a later place/time where we actually handle the error.

// ❌ replace this pattern
func someMethod() async throws -> [Result] {
    do {
      try await someUnderlyingMethodThatThrows()
    } catch {
      logger.error(error)
      throw error
    }
}

// ✅ with this pattern
func someMethod() async throws -> [Result] {
    try await someUnderlyingMethodThatThrows()
}

In this case, this means we apply this patch to Repository.methodThatJustRethrows(), and leave VM.inadvertentlyBrokenLoggingMethod() as is. VM.inadvertentlyBrokenLoggingMethod() is no longer “broken” at all, and now produces a single

VM - CustomError.boom

🥳

# Why is this important?

What’s so bad about a paper trail of duplicate error logs? A couple of reasons:

It clutters the logs - this is a simple case, we only duplicate it once, but imagine a chain with 4 or 5 log&rethrows along the way 😨
We aren’t actually getting any new information with this paper trail - when inspecting the call-site that logged the error, we can just follow the called functions until we arrive at the throwing point of the error 🔍 (this ties in with filling the errors with relevant information!)
There is no real guarantee that this paper trail represents a chain of events - other logs can occur in other places of our system, and it no longer becomes a paper trail. It can even have the opposite effect e.g.
```
Repository - CustomError.boom
AnotherPlace - anotherThrownError
VM - CustomError.boom
```
There is no way of knowing if the Repository CustomError.boom log is related to the next VM CustomError.boom log. We can assume it is, but cannot be 100% sure just from looking at the logs. This is particularly troublesome if our Repository is reused in other places; did the VM’s repository instance throw the error, or was it thrown from another (possibly) unrelated call-site? We might never know for sure.

# When to log warnings?

I personally like to sprinkle warnings whenever a non-fatal unhappy-path arises, just before carrying on as I normally would. In Swift, I find that most warnings stem from (but are not limited to) casting and handling optionals:

casting & handling optionals - e.g. dequeue cells for TableViews/CollectionViews

override func tableView(_ tableView: UITableView,
                      cellForRowAt indexPath: IndexPath) -> UITableViewCell {

    guard let cell = tableView.dequeueReusableCell(withIdentifier: cellReuseIdentifier) as? Cell else {
        logger.warning("Failed to dequeue cell with identifier \"\(cellReuseIdentifier)\"")
        return dequeueFallBackCell()
    }

    // ...
}

handling optionals - e.g. loading optional content

guard let someContent = await loadContent(with: someId) else {
    logger.warning("Failed to load content with ID \"\(someId)\"")
    return nil
}

return someContent

handling optionals - e.g. failable initialisers

guard let url = URL(someInvalidUrlString) else {
    logger.warning("Failed to create URL from \"\(someInvalidUrlString)\"")
    return
}

# When to log notices?

I like to signal the beginning and end of the happy path of any asynchronous or long-lived function, and I try to attribute the logs as best as I can (usually with a relevant ID to the task at hand).

func fetchSomeDataFromDatabase(for id: String) async -> SomeData {
    logger.notice("Fetching SomeData for \"\(id)\"...")

    // perform said computation...

    logger.notice("Fetched SomeData for \"\(id)\".")
}

Some would consider this to be overkill, and I can’t really blame them. Remember, if we start logging too much we risk polluting the logs with useless information, at which point it might become a footgun 🦶🔫. This is why I advocate only adding them only to long-running or asynchronous tasks, as their logs are more likely to be broken up by other unrelated logs, but also because they might help debug performance issues: the function is taking too long; the function is not finishing for undocumented reasons (early exits, out-of-control branching, etc.)

# Putting it all together 🧩

Whether you agree with me or not, I find that it helps me get the full picture:

when the function started (notice 🟦)
when the inner-workings would have caused a crash (error handling 🟥)
when something unlikely happened but we handled it gracefully (warning handling 🟨)
when the function successfully finished (notice 🟦)

and here’s a complete example to show just that

func fetchSomeDataFromDatabase(for id: String) async -> SomeData? {
    logger.notice("Fetching SomeData for \"\(id)\"...")

    guard let databaseConnection = database.connect() else {
        logger.warning("Failed to fetch SomeData with ID \"\(id)\", no database connection.")
        return nil
    }

    do {
        let data = try await databaseConnection.load(with: id)
    } catch {
        logger.error(error)
        return nil
    }

    logger.notice("Fetched SomeData for \"\(id)\".")
    return data
}

for which we should only ever have the following recorded outcomes:

handled an issue with the database connection

🟦 Fetching SomeData for "<id>"...
// potentially other logs...
🟨 Failed to fetch SomeData with ID "<id>", no database connection.

encountered an error

🟦 Fetching SomeData for "<id>"...
// potentially other logs...
🟥 <Error description>

everything according to plan

🟦 Fetching SomeData for "<id>"...
// potentially other logs...
🟦 Fetched SomeData for "<id>".

# Structured Logging

After publishing this article, a fellow redditor (thanks u/Freeky !) showed me a complementary form of logging which I wasn’t aware of: structured logging.

It consists of breaking log messages into parseable metadata which we can then manipulate, search, inspect, and even action upon. Here’s a traditional log following all of the guidelines I showed you earlier

logger.info("Accepted connection \(connection.id) from \(connection.peer), total: \(connections.count)")

and here’s the same log following structured logging semantics

logger.info(
  "Accepted connection",
  metadata: [
    "connection.id": "\(connection.id)",
    "connection.peer": "\(connection.peer)",
    "connections.total": "\(connections.count)"
  ]
)

They both convey the same information, but the latter is much easier to parse. Keeping this example in mind, this is particularly useful if we want to create a service to monitor these logs and send us a notification if we get more than a certain amount of connections.total for over 5 minutes. This is just a very basic usecase, but it exemplifies the world of possibilities that structured logging opens up.

Furthermore, since we haven’t lost any information, we can still easily print-out this log in the former human-readable format

"Accepted connection \(metadata["connection.id"]) from \(metadata["connection.peer"]), total: \(metadata["connections.total"])"

Do keep in mind that performing this kind of semantic segmentation if you have no intentions of analyzing or making your logs actionable is probably counter productive. In iOS this is evidenced by the fact that you don’t have access to a Structured Logging API, rather it incentivizes you to use interpolated String messages, instead you require an entirely different swift-log package. However, being aware of this pattern 🔧 is always another fine addition to our engineering toolbox 🧰, and we could probably extend/adapt the existing API to add structured logging to it as well.

# Final thoughts

Do note that there is no “correct” way of deciding if/when/how we should log something, but these suggestions seem to agree with other accepted industry practices. I’d also argue that keeping logs tidy and relevant is second only to recording them 🧹 Hopefully this post has convinced you to give more thought to your logs. Until next time 👋.

aclima

Where I write down and catalogue my thoughts