Thursday, October 1, 2015

ESXTOP Capture scripts with various sampling rates and intervals



There are many sources on the web for performing ESXTOP captures. In my experience, however, I often had to run captures on several ESX hosts at once, or captures that ran for hours at a time. CRON can be used to perform scheduled captures, but if you are experiencing a problem and want to kick off an ESXTOP capture on the spot, it's helpful to have something you can just copy and paste. It's also helpful to be able to tell at a glance which host a capture ran on and when it started and stopped.

Here are some ESXTOP capture scripts that can be run on multiple hosts at the same time, logging their output to the same location. Each log is saved with the host name in the file name, so every file clearly identifies (1) the host it ran on, (2) the start time of the capture, and (3) the sampling interval and length of the capture.

Example 1 - Short ESXTOP captures across a few hours, higher detail
Sampling interval - 10 seconds
Capture length per file - 15 mins
# of Captures - 48
Total # of samples = 90*48 = 4,320 (90 samples per file = 15 min at 10-second intervals)
Total esxtop capture interval across all logs - 12 hours (48 logs at 15 min each)

for i in $(seq 1 48);do esxtop -a -b -d 10 -n 90 > /vmfs/volumes/VMFS-T2-LUN10/ESXTOP/esxtop-$(date +%m-%d-%H%M%S)-$(hostname)-15min-10s.csv;done

Example 2 - Longer ESXTOP captures across a few hours, moderate detail
Sampling interval - 30 seconds
Capture length per file - 1 hour
# of Captures - 12
Total # of samples = 120*12 = 1,440 (120 samples per file = 1 hour at 30-second intervals)
Total esxtop capture interval across all logs - 12 hours (12 logs at 1 hour each)

for i in $(seq 1 12);do esxtop -a -b -d 30 -n 120 > /vmfs/volumes/VMFS-T2-LUN10/ESXTOP/esxtop-$(date +%m-%d-%H%M%S)-$(hostname)-1Hour-30s.csv;done

Of course, you can create endless variations on this to suit your needs.
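To make such variations easier, the two one-liners above can be folded into a small parameterized helper. This is only a sketch: the function names `samples_per_file` and `run_captures` and the `OUTDIR` variable are hypothetical, and the output path is simply the one used in the examples above. Note that it labels every file in minutes (for example `60min` rather than `1Hour`).

```shell
#!/bin/sh
# Sketch only: OUTDIR and both function names are assumptions, not part of esxtop.
OUTDIR="${OUTDIR:-/vmfs/volumes/VMFS-T2-LUN10/ESXTOP}"

# samples_per_file INTERVAL_SECONDS MINUTES_PER_FILE
# Prints the value to pass to "esxtop -n" so one file covers the requested time.
samples_per_file() {
    echo $(( $2 * 60 / $1 ))
}

# run_captures INTERVAL_SECONDS MINUTES_PER_FILE NUMBER_OF_FILES
# Runs the same loop as the examples above, one CSV file per iteration.
run_captures() {
    n=$(samples_per_file "$1" "$2")
    for i in $(seq 1 "$3"); do
        esxtop -a -b -d "$1" -n "$n" \
            > "$OUTDIR/esxtop-$(date +%m-%d-%H%M%S)-$(hostname)-${2}min-${1}s.csv"
    done
}
```

With these definitions pasted into the shell, `run_captures 10 15 48` corresponds to Example 1 and `run_captures 30 60 12` to Example 2.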

Sample output format

The file names will look similar to the example below. You can see how the name clearly shows the details of the capture.

esxtop-04-07-004631-ESX01.dir.svc.accenture.com-15min-10s.csv

This is quite useful when you have a series of capture logs, such as the commands above will create.

Usage

It is often useful to capture at short intervals at the start of a problem, but it is sometimes also necessary to run ESXTOP captures for longer periods of hours or days. With a few of these commands on hand for different levels of sampling detail, you can run both in parallel: for example, a detailed capture at 10-second intervals for 2 hours at the start of a problem, and a longer, less detailed capture at 30-second intervals for the next 24 hours. I will often start a short-interval capture in one SSH session on a host and, in parallel, a longer-interval capture in another SSH session. The short one captures the immediate issue, and the longer one captures any longer-term issue that may show trends over a period of several hours.

Size of the ESXTOP capture logs

ESXTOP capture logs vary widely in size depending on the number of objects on the host. The only sure way to estimate the potential size of the logs is to run an esxtop capture as above on a host, record the average size per sample, and extrapolate from there. In a large environment, a single ESXTOP sample may be on the order of 4.8 MB. In that case, each log file in example #1 would be 4.8 MB * 90 = 432 MB, and the total space used by the entire script would be 4.8 MB * 4,320 = 20,736 MB, or about 20.7 GB. These files are easily analyzed in Windows perfmon, where a manageable size is in the range of 200 - 300 MB per log file, so some adjustments may be needed in that case. If desired, the output can be piped to gzip and compressed on the fly, but this will create some CPU overhead on the host.
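The measure-and-extrapolate step can be sketched in a few lines of shell. The function names `avg_sample_mb` and `estimate_mb` are hypothetical, awk is assumed to be available on the host, and MB here means 1,048,576 bytes. The CSV header row is counted in the file size, so the per-sample figure errs on the high side.

```shell
#!/bin/sh
# Sketch only: function names are assumptions; awk does the decimal arithmetic.

# avg_sample_mb CAPTURE_FILE SAMPLES_IN_FILE
# Prints the average size per sample in MB (1 MB = 1048576 bytes here).
avg_sample_mb() {
    awk -v b="$(wc -c < "$1")" -v n="$2" \
        'BEGIN { printf "%.1f\n", b / n / 1048576 }'
}

# estimate_mb MB_PER_SAMPLE NUMBER_OF_SAMPLES
# Prints the extrapolated size in MB, rounded to a whole number.
estimate_mb() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.0f\n", s * n }'
}
```

For the figures above, `estimate_mb 4.8 90` gives the size of one log file in example #1 and `estimate_mb 4.8 4320` the total for the whole run. If the result is too large, compressing on the fly would look something like `esxtop -a -b -d 10 -n 90 | gzip > capture.csv.gz`, where `capture.csv.gz` is a placeholder file name.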

Other considerations

If the ESXTOP captures are initiated from an SSH session, they are susceptible to any timeouts or interruptions in that session, so it may be desirable to run the scripts from a console session on the host. In this case it is still possible to run scripts in parallel by using the "&" operator to run two commands at once, or even by creating a shell script file, but this needs careful consideration since the console will be tied up running these commands. The commands can be halted at any time with Break or CTRL+C.
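As a rough sketch of the "&" approach, the script below starts a detailed 2-hour series and a less detailed 24-hour series side by side and then waits for both. The names `capture_series`, `start_parallel`, and `ESXTOP_CMD` are hypothetical; `ESXTOP_CMD` exists only so the script can be dry-run on a machine without esxtop.

```shell
#!/bin/sh
# Sketch only: ESXTOP_CMD, capture_series and start_parallel are assumptions.
ESXTOP_CMD="${ESXTOP_CMD:-esxtop}"

# capture_series INTERVAL_SECONDS SAMPLES_PER_FILE NUMBER_OF_FILES LABEL OUTPUT_DIR
capture_series() {
    for i in $(seq 1 "$3"); do
        "$ESXTOP_CMD" -a -b -d "$1" -n "$2" \
            > "$5/esxtop-$(date +%m-%d-%H%M%S)-$(hostname)-$4.csv"
    done
}

# start_parallel OUTPUT_DIR
# Runs both series in the background, then blocks until both have finished.
start_parallel() {
    capture_series 10 90 8 15min-10s "$1" &     # 8 files x 15 min = 2 hours
    capture_series 30 120 24 1Hour-30s "$1" &   # 24 files x 1 hour = 24 hours
    wait
}
```

Calling `start_parallel /vmfs/volumes/VMFS-T2-LUN10/ESXTOP` would write both series to the same directory used in the examples above. Because of the `wait`, the console stays occupied until the longer series ends, which is the trade-off described above.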

CRON Jobs

These scripts can be combined with CRON jobs on ESX to run on a schedule. There are VMware KB articles on how this can be done.

