Thursday, January 17, 2019

AWS Systems Manager (SSM) Run Command Troubleshooting

I have been working with AWS SSM for a couple of months, and I found that the troubleshooting document on their website lacks straightforward answers. So here are the problems I encountered and the solutions that worked for me.

Problem #1: The instance is not visible in the AWS Systems Manager console, although the documentation says the agent is installed by default.

Problem #2: The instance is visible, but "Run Command" takes too long or even times out.

Solution:
  1. The first thing I would check is whether the instance has an IAM role attached to it.
  2. If so, make sure the role has the AmazonEC2RoleforSSM policy attached, since the agent needs that permission to perform its health check.
  3. Once the above is confirmed, check that the latest SSM agent is installed and running.
  4. If the SSM agent is up to date and running, check whether it is hibernating. The hibernate logic uses exponential backoff, so the agent might not respond for a long time.
  5. If it is hibernating, we can simply restart the agent.
    • On Windows, we can run the Restart-Service AmazonSSMAgent PowerShell command.
    • On Linux, we can run sudo restart amazon-ssm-agent on Upstart-based distributions, or sudo systemctl restart amazon-ssm-agent on systemd-based ones.
  6. If all the above fails, it is time to get into the log files.
    • On Windows:
      • %PROGRAMDATA%\Amazon\SSM\Logs\amazon-ssm-agent.log
      • %PROGRAMDATA%\Amazon\SSM\Logs\errors.log
    • On Linux:
      • /var/log/amazon/ssm/amazon-ssm-agent.log
      • /var/log/amazon/ssm/errors.log
  7. If the log files don't give enough information, we can enable debug logging to get more detail. This takes quite a number of steps, so refer to the reference link below.
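
The Linux side of steps 1, 5 and 6 can be sketched as a few shell helpers. This is only a rough sketch, not an official tool: the function names are made up for illustration, and the search for the word "hibernat" assumes the agent mentions hibernation in its log, so adjust the pattern to whatever your log actually shows. The first helper also assumes the AWS CLI is installed and configured with credentials allowed to describe instances.

```shell
#!/bin/sh
# Sketch only: helper names and the 'hibernat' pattern are assumptions.

# Default Linux log directory, taken from step 6.
SSM_LOG_DIR="${SSM_LOG_DIR:-/var/log/amazon/ssm}"

# Step 1: print the instance profile ARN attached to an instance.
# Requires the AWS CLI; prints nothing or "None" if no role is attached.
ssm_instance_profile() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[].Instances[].IamInstanceProfile.Arn' \
        --output text
}

# Step 5: print (without running) the restart command for this host.
ssm_restart_command() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "sudo systemctl restart amazon-ssm-agent"   # systemd hosts
    else
        echo "sudo restart amazon-ssm-agent"             # Upstart hosts
    fi
}

# Step 4: count log lines that mention hibernation (case-insensitive).
# Prints 0 if the file is missing or unreadable.
ssm_hibernate_hits() {
    [ -r "$1" ] || { echo 0; return 0; }
    grep -ci 'hibernat' "$1" || true
}

# Step 6: show the most recent lines containing "error" (default: last 20).
ssm_recent_errors() {
    [ -r "$1" ] || return 0
    { grep -i 'error' "$1" || true; } | tail -n "${2:-20}"
}
```

After sourcing the file on the instance, ssm_hibernate_hits "$SSM_LOG_DIR/amazon-ssm-agent.log" gives a quick hint whether the agent went into hibernation, ssm_recent_errors "$SSM_LOG_DIR/errors.log" 10 shows the last ten error lines, and ssm_restart_command prints the restart command to run. Printing the command instead of running it keeps the helpers free of side effects.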
Reference
