Handling Failed Tasks

When grouping tasks, I can’t find the right logic for an if/else on a subtask’s result inside my “parent” or “grouping” task. In the example below, I never seem to reach the if results_object.failed branch when debugging:

def parent_task(task, payload, headers):

    payload["ip"] = task.host.hostname

    # First try with one payload
    results_object = task.run(
        task=http_method, method="post", url="https://myfancyapi.com", json=payload, headers=headers
    )

    if results_object.failed:
        # If we failed, try again with a modified payload...
        payload["auth"] = "new_value"
        results_object = task.run(
            task=http_method,
            on_good=False,
            on_failed=True,
            method="post",
            url="https://myfancyapi.com",
            json=payload,
            headers=headers,
        )

All I really want to do is attempt multiple connection methods to a device as I may not know ahead of time which one will succeed. Any suggestions?

I did find a way to make this logic sort of work, but I don’t know if it is the best way to handle this:

def parent_task(task, payload, headers):

    payload["ip"] = task.host.hostname

    # First try with one payload
    results_object = task.run(
        task=http_method,
        raise_for_status=False,
        method="post",
        url="https://myfancyapi.com",
        json=payload,
        headers=headers,
    )

    if not results_object[0].response.ok:
        # If we failed, try again with a modified payload...
        payload["auth"] = "new_value"
        results_object = task.run(
            task=http_method,
            method="post",
            url="https://myfancyapi.com",
            json=payload,
            headers=headers,
        )

I did a lot of poking about on reset_hosts/recover_host/on_failed etc last week and that’s all good but as you pointed out here… doesn’t do a lot of good within a subtask it seems!

Looked at this a bit and the run method within a task raises an exception if it fails, so there doesn’t seem to be an immediately obvious way to handle it, but hacking a bit I came up with this:

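from nornir.core.exceptions import NornirSubTaskError
from nornir.plugins.tasks import commands  # import path assumes Nornir 2.x

# real_password is assumed to be defined elsewhere (the known-good password)
def demo_fail(task):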
    task.host["failed"] = False
    try:
        task.run(task=commands.remote_command, command=f"touch /Users/carl/Desktop/DEMO-{task.host}")
    except NornirSubTaskError:
        task.host["failed"] = True
    if task.host["failed"]:
        try:
            task.host.close_connections()
        except ValueError:
            pass
        task.host.password = real_password
        task.run(task=commands.remote_command, command=f"touch /Users/carl/Desktop/DEMO-{task.host}")

In my hosts I have two entries – one w/ a good password and one w/ a bad that I reset there in the “if” block to the appropriate one. This ends up w/ my two files touched on my desktop nicely.

Your solution for the particular case where you have a return code you can handle things based on (not an exception being raised in the task run method) is nicer IMO, but if there was an exception raised that wouldn’t handle it I don’t think.

Note: the try/except in the if failed block is there because nornir opened the connection w/ the “bad” password and left it open. I think that probably should be changed so the new password can be set and connection reopened without having to manually close it (and catch the exception) but for demoing this it works well enough :)


There are indeed some issues present with handling failed tasks and custom logic on failure.
On the surface there are a couple of things:

  1. NornirSubTaskError does not bring any value; we could have raised the original exception instead
  2. There is a problem with closing connections when they weren’t successfully opened
  3. The simpler case of retrying on certain exceptions

#2 is a bug and will be fixed very soon.
#3 is possible with a retry decorator/wrapper function.
#1 and everything else related to this topic are debatable, and most attempts to change them are likely to break backwards compatibility. We need use cases and suggestions from the community on how it should work, so that if we do need to break backwards compatibility, we do it once in 3.0.
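
For #3, a retry decorator could look roughly like the sketch below. This is plain Python and not part of Nornir; the names retry, attempts, delay and flaky are made up for illustration. Inside a grouped task you would presumably decorate a small function that wraps the task.run(...) call and pass NornirSubTaskError as the exception to retry on.

```python
import functools
import time


def retry(exceptions, attempts=3, delay=0.0):
    """Re-run the wrapped function when it raises one of `exceptions`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)

        return wrapper

    return decorator


# Hypothetical demo: fails twice, then succeeds on the third call.
calls = {"count": 0}


@retry(ConnectionError, attempts=3)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"
```

Exceptions that are not listed in exceptions are not retried and propagate on the first failure, which keeps unexpected errors visible.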


The way I handle this is usually like this:

from nornir.core.exceptions import NornirSubTaskError

def my_grouped_task(task):
    try:
        res = task.run(something_that_might_fail)
    except NornirSubTaskError as e:
        # The subtask's original exception is available on e.result.exception
        if isinstance(e.result.exception, SomeException):
            pass  # handle exception
        elif isinstance(e.result.exception, SomeOtherException):
            pass  # handle exception
        else:
            raise  # I don't know how to handle this
    # do other stuff
    ...

In the original example I’d just set raise_for_status and check the exception.

I understand it’s not very pythonic and looking at nornir 3.0 I’d love to remove the task.run wrapper so subtasks can be called directly and get the raw response/exception.


Here is an example where I implemented something similar to what @dbarrosop has. I thought having this example had some additional value. The example here is a Netmiko SSH device where the password is intentionally incorrect. The password is then fixed in the exception handler.

import os

from netmiko import NetmikoAuthenticationException
from nornir.core.exceptions import NornirSubTaskError
from nornir.plugins.tasks.networking import netmiko_send_command  # Nornir 2.x path

def reset_connection(host):
    """Remove host from the Nornir connection table."""
    try:
        host.close_connections()
    except ValueError:
        pass

def my_task(task):
    cmd = "show ip int brief"
    try:
        result = task.run(task=netmiko_send_command, command_string=cmd)
    except NornirSubTaskError as e:
        # Check type of exception
        if isinstance(e.result.exception, NetmikoAuthenticationException):
            # Remove the failed result
            task.results.pop()
            reset_connection(task.host)
            # Try again
            task.host.password = os.getenv("NORNIR_PASSWORD")
            task.run(task=netmiko_send_command, command_string=cmd)
        else:
            # Not an authentication problem, so don't swallow it
            raise
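
If there are more than two candidate passwords (or connection methods), the same try/except idea can be put in a loop. The helper below is only a sketch in plain Python; first_success and fake_login are hypothetical names, not Nornir or Netmiko functions. In a real task, attempt would presumably set task.host.password, reset the connection, and call task.run(...), with NornirSubTaskError passed as exceptions.

```python
def first_success(attempt, candidates, exceptions=(Exception,)):
    """Call attempt(candidate) for each candidate in order.

    Returns (candidate, result) for the first call that does not raise.
    If every candidate fails, the last error is re-raised.
    """
    last_error = None
    for candidate in candidates:
        try:
            return candidate, attempt(candidate)
        except exceptions as err:
            last_error = err  # remember the failure and try the next one
    if last_error is None:
        raise ValueError("no candidates supplied")
    raise last_error


# Hypothetical stand-in for "connect with this password".
def fake_login(password):
    if password != "s3cret":
        raise PermissionError("bad password")
    return "connected"


winner, outcome = first_success(fake_login, ["wrong", "s3cret"], PermissionError)
```

Only the exception types passed in exceptions trigger the next attempt, so an unrelated error (a timeout, say) still surfaces immediately instead of being hidden behind a password retry.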

How should I handle this?
I got this error, and after that my code stopped working.

2020-03-12 02:17:40,706 - nornir.core.task -    ERROR -      start() - Host '10.10.10.10': task 'collect_config' failed with traceback:
Traceback (most recent call last):
  File "/home/cacti/miniconda3/envs/ipam/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
    r = self.task(self, **self.params)
  File "/home/cacti/lutfi/pyScript/getBackup.py", line 29, in collect_config
    command_string=commands[task.host.platform]
  File "/home/cacti/miniconda3/envs/ipam/lib/python3.7/site-packages/nornir/core/task.py", line 147, in run
    raise NornirSubTaskError(task=task, result=r)
nornir.core.exceptions.NornirSubTaskError: Subtask: <function netmiko_send_command at 0x7f56b7a45c20> (failed)


2020-03-12 02:17:40,736 -  nornir.core -     INFO -        run() - Running task 'close_connections_task' with args {} on 7706 hosts

This is my code:

import datetime as dt
import pathlib

# Import paths assume Nornir 2.x
from nornir.plugins.tasks.files import write_file
from nornir.plugins.tasks.networking import netmiko_send_command

def collect_config(task):
    # Create configs/ and a per-day subdirectory such as configs/20200312
    config_dir = "configs"
    pathlib.Path(config_dir).mkdir(exist_ok=True)
    config_dir = config_dir + "/" + str(dt.datetime.today().strftime('%Y%m%d'))
    pathlib.Path(config_dir).mkdir(exist_ok=True)
    commands = {
        "junos": "show configuration",
        "ios": "show running-config",
        "iosxr": "show running-config",
        "huawei": "display current-configuration",
        "ipos": "show configuration",
    }
    out = task.run(
        task=netmiko_send_command,
        command_string=commands[task.host.platform],
    )

    filename = str(config_dir) + "/" + str(dt.datetime.today().strftime('%Y%m%d-')) + str(task.host.name) + ".txt"
    backup = task.run(task=write_file, content=out.result, filename=filename)

    print("changed: ", task.host.name, ' ', backup.changed)
    # The host object is reached through the task, not a bare "host" name
    task.host.close_connections()