Laying Down The Hammer
One of the exciting things about my day job is that I get to work in an environment that is rather large by most standards. I consult for a Fortune 500 company that has around 5500 Cisco UCCE and CVP agents in a dozen countries and another 10,000 regular Cisco IP Phone users in another 24 locations around the globe. On a daily basis this presents us with interesting and often never-before-seen problems to tackle.
Last week’s problem involved load testing our new UCS B-Series virtualized platform, set to go live in April. Now, 5500 agents can produce one hell of a load, well beyond what your typical call center sees. We have four (4) voice DS3s between our 2 main facilities. That’s 672 channels each, the equivalent of 28 T1s. We do nearly 1 million minutes a month across these circuits, and almost 2 million a month on a global scale. Pretty awesome.
So when we talk about this much traffic, it’s important to performance test any new hardware before going live. The performance test tool used to do this is generally called a ‘hammer’. We typically hire a company for this. There are a few out there that I will leave nameless, but in general it can cost as much as $50-100k for 8 hours of testing at a decent load for what we need. That’s not something you get a lot of shots at, so the goal is to get it to work the first time. The main points of focus for our first performance test are CVP 9.x, UCCE 9.x, AS5400 ingress gateways (DS3 terminators), and new 3945E VXML gateways (~800 VXML sessions each).
Two weekends ago we got on our first call with the vendor. They fired up the hammer, pumped in 200 calls to start… and it died after about 130. Our 3945E locked up in a datacenter in the frigid north at around 50 VXML sessions. To make this more awesome, no one was on site, it didn’t crash, it just hung, and there were no logs. Finally, after 90 minutes, someone got on site and rebooted it. We considered that it may have been our debugs, so we fixed that. We ran it again, and it died again. 2 hours and $15k later, we had literally accomplished nothing, and we followed up with TAC.
TAC couldn’t find anything because there were no logs, and, well, you need to get lucky to land a TAC engineer who knows their stuff when it comes to call center problems. They wanted us to run the test again with them on the horn, which would cost us a mini fortune. We were running 15.2(4)M5 on the 3945E, and during the test we realized our lab 2921 had (accidentally) taken a bunch of these calls and hit 100 VXML sessions on 15.2(4)M3. We had a hunch we were running a bad IOS, but TAC couldn’t confirm it. We needed a cheap retest to figure all this out.
In comes the interesting solution of the day. We don’t want to pay another $15k, but we need a ton of calls. These are the problems I live for. There are a few options for generating a ton of traffic that I came across, and a few ideas I had myself. First off, SIPp is a SIP traffic generator. The problem for us is that we need to test the DS3, so this guy is out. That puts us almost immediately into building our own custom software, so that’s what we will do.
Solution
Our first option is TCL on the router, used to generate a ton of calls. I am pretty good at writing TCL, but it’s hard to debug, and I am pretty sure this would have me crashing more routers, so it’s the last choice here. Next, we could build a small CTI app and have it control some ports on CUCM like the outbound dialer does, but that is tons of work and not going to happen in a few hours. Next we consider using the Cisco Outbound Dialer we already have installed in production. However, we only have 50 ports and need hundreds of calls. Furthermore, it’s an awful product, and I would literally do anything to avoid using it, so I am back to TCL or CTI.
My last option is something I have never used before. It’s called Twilio (www.twilio.com). Twilio is an API-driven, completely programmable ‘phone system’ in the cloud. They provide a host of REST APIs with which you can build outbound campaigns, custom IVRs, voicemail systems, and SMS platforms. I take a quick look at Twilio, their pricing (2 cents per minute), and their features, and decide this is the obvious option. Twilio provides libraries for Ruby, C#, Java, PHP, and many other languages. I am a total Ruby buff, so this is an easy call for me.
This is the part where I preface: if you don’t know what you’re doing, make sure you DO NOT pump 350 calls into your system. There is a good chance you will destroy something. START WITH A LAB SYSTEM.
Before we build, what do we need?
- An application that can put a ton of calls into an IVR (unlimited?!?)
- The call is answered by the IVR and put into an infinite MOH loop, since no agents are online.
- The hammer notices the call is up and generates some sound to keep the call up as well as test quality.
- It needs to get done quickly.
Alright, well, good news: once we sign up, Twilio gives us some Ruby code to easily generate a single call. https://www.twilio.com/user/account/developer-tools/api-explorer/call-create
    require 'rubygems' # not necessary with ruby 1.9 but included for completeness
    require 'twilio-ruby'

    # put your own credentials here
    account_sid = 'Ibnsn034gnvuh9HierubviubIUGIUHH'
    auth_token = '[AuthToken]'

    # set up a client to talk to the Twilio REST API
    @client = Twilio::REST::Client.new account_sid, auth_token

    @client.account.calls.create({
      :from => '+11234567890',
      :method => 'GET',
      :fallback_method => 'GET',
      :status_callback_method => 'GET',
      :record => 'false'
    })
This is pretty simple: load the library, create a Twilio client with the associated security tokens, and generate a call. One crucial thing is missing here, though: you need a few more fields to make this work.
One is the URL of the TwiML file. TwiML is basically a custom XML format made by Twilio, and it is very close to VXML. We are going to build a small XML file and toss it on Dropbox. https://dl.dropboxusercontent.com/u/56846391/playRecording.xml
The file contains the content below, which is pretty simple. It says to play this audio file X times. In our case cowbell.mp3 is 52 seconds long, and we are playing it 6 times, so the call stays up about 5 minutes. This is important because Twilio is not made to be a hammer, so it takes a while for hundreds or thousands of calls to start. Sometimes you need to keep calls up for a while to get all the calls you want into the system at once. We got about 350 into our system using the configuration I am showing you below.
    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Play loop="6">https://api.twilio.com/cowbell.mp3</Play>
    </Response>
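If you would rather generate that file than hand-edit it, something like the rough sketch below would do. The helper names (`xml_escape`, `build_twiml`, `call_seconds`) are my own inventions for illustration, not part of Twilio’s library, and the 52-second clip length is just the figure quoted above.

```ruby
# Hypothetical helpers (not part of twilio-ruby): build the TwiML by hand
# and estimate how long each call will stay up.

# Escape the characters that would break an XML text node.
def xml_escape(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Return a TwiML document that plays one audio file `loops` times.
def build_twiml(audio_url, loops)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Play loop="#{Integer(loops)}">#{xml_escape(audio_url)}</Play>
    </Response>
  XML
end

# Total playback time in seconds: clip length times number of loops.
def call_seconds(clip_seconds, loops)
  clip_seconds * loops
end

puts build_twiml('https://api.twilio.com/cowbell.mp3', 6)
puts "Each call stays up about #{(call_seconds(52, 6) / 60.0).round(1)} minutes"
```

Bumping the loop count is the cheapest way to keep early calls alive while the later ones are still being placed.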
The next parameter is :to, which is the number you want the call delivered to. The last important step is to generate more than one call, which we will do by tossing the call into a simple for loop. The final code for generating 350 calls is shown below.
    require 'rubygems' # not necessary with ruby 1.9 but included for completeness
    require 'twilio-ruby'

    # put your own credentials here
    account_sid = 'IUG8b8kG9ig87g1&GibIUHiuGIvb'
    auth_token = '9ubi8ybuHBuyguVugfytfJg7h8i89'

    # set up a client to talk to the Twilio REST API
    @client = Twilio::REST::Client.new account_sid, auth_token

    # 1..350 places exactly 350 calls (0..350 would place 351)
    for i in 1..350
      @client.account.calls.create({
        :from => '+11234567890',
        :to => '+19876543210',
        :url => 'https://dl.dropboxusercontent.com/u/56846391/playRecording.xml',
        :method => 'GET',
        :fallback_method => 'GET',
        :status_callback_method => 'GET',
        :record => 'false'
      })
    end
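One thing the bare loop does not do is remember what it started. A small variation, sketched below, keeps the SID of every call placed and notes which attempts raised an error. The `place_calls` helper and the `FakeCallsApi` stand-in are hypothetical names of mine. The only assumption about the real client is the one already visible in the code above: an object with a `create` method whose return value exposes a `sid`.

```ruby
# Hypothetical helper: place `count` calls through any object that responds
# to `create(params)` and returns something with a `sid`. Collects the SIDs
# of successful calls and records the attempts that raised an error.
def place_calls(calls_api, count, params, pause: 0)
  placed = []
  failed = []
  count.times do |i|
    begin
      call = calls_api.create(params)
      placed << call.sid
    rescue StandardError => e
      failed << { index: i, error: e.message }
    end
    sleep(pause) if pause > 0 # optional pacing between requests
  end
  { placed: placed, failed: failed }
end

# Stand-in for the real client so the sketch runs without credentials.
# Every fourth request fails, to exercise the error path.
FakeCall = Struct.new(:sid)

class FakeCallsApi
  def initialize
    @requests = 0
  end

  def create(_params)
    @requests += 1
    raise 'simulated API error' if (@requests % 4).zero?
    FakeCall.new(format('CA%04d', @requests))
  end
end

result = place_calls(FakeCallsApi.new, 8, { from: '+11234567890', to: '+19876543210' })
puts "placed #{result[:placed].size}, failed #{result[:failed].size}"
```

With the real library you would pass `@client.account.calls` in place of the fake, along with the same parameter hash used in the loop above.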
Verification
In this case we are testing the VXML gateway, since it was the component that failed. We log into the 3945E and see what it looks like with no calls using the ‘show voip rtp conn’ command.
    VXML3945RTR#sh voip rtp conn
    VoIP RTP Port Usage Information:
    Max Ports Available: 8091, Ports Reserved: 101, Ports in Use: 0
    Port range not configured, Min: 16384, Max: 32767
                              Ports       Ports      Ports
    Media-Address Range       Available   Reserved   In-use
    Default Address-Range     8091        101        0
    No active connections found
Now we will take a look at it with our Hammer generating 4 calls.
    VXML3945RTR#sh voip rtp conn
    VoIP RTP Port Usage Information:
    Max Ports Available: 8091, Ports Reserved: 101, Ports in Use: 4
    Port range not configured, Min: 16384, Max: 32767
                              Ports       Ports      Ports
    Media-Address Range       Available   Reserved   In-use
    Default Address-Range     8091        101        4
    VoIP RTP active connections :
    No.  CallId  dstCallId  LocalRTP  RmtRTP  LocalIP       RemoteIP
    1    2956    2962       17566     21308   10.180.153.3  10.180.153.1
    2    2959    2964       17568     22254   10.180.153.3  10.180.153.1
    3    2966    2969       17570     18582   10.180.153.3  10.180.153.1
    4    2971    -1         17572     20810   10.180.153.3  10.180.153.1
    Found 4 active RTP connections
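If you are capturing that output during a run and want the numbers rather than the text, a regular expression over the summary line is enough. This is only a sketch: `rtp_port_usage` is a hypothetical helper of mine, and how you get the text off the router (SSH session, console log) is left to you.

```ruby
# Hypothetical helper: pull the port counters out of the text printed by
# 'show voip rtp conn'. Returns nil if the summary line is not present.
def rtp_port_usage(output)
  pattern = /Max Ports Available:\s*(\d+),\s*Ports Reserved:\s*(\d+),\s*Ports in Use:\s*(\d+)/
  match = output.match(pattern)
  return nil unless match

  { available: match[1].to_i, reserved: match[2].to_i, in_use: match[3].to_i }
end

# Sample text copied from the output above.
sample = <<~OUT
  VoIP RTP Port Usage Information:
  Max Ports Available: 8091, Ports Reserved: 101, Ports in Use: 4
  Port range not configured, Min: 16384, Max: 32767
  Found 4 active RTP connections
OUT

usage = rtp_port_usage(sample)
puts "#{usage[:in_use]} of #{usage[:available]} RTP ports in use"
```

Polling this every few seconds while the hammer runs gives you a rough timeline of how many calls actually made it to the gateway.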
Conclusion
Twilio made for a great quick-and-dirty hammer for generating some traffic. The 350 calls for 5 minutes cost us about 6 dollars, as opposed to the $7,500 per hour a traditional vendor would have charged. It took us about 3 hours from start to finish, not knowing anything beforehand and just taking shots in the dark.
It also proved that 15.2(4)M5 is the culprit, as the gateway did crash again with this hammer. We loaded up 15.2(4)M3 and it ran like a champion up to 350 calls. Thanks for all the help, TAC.
There are a ton of improvements to consider before using this tool on a regular basis or in any sort of intense situation. Here are some of the things anyone should consider long-term when building a hammer.
- Analytics – people want to know how many calls failed, were busy, which numbers these were, call durations, etc. All this is super easy with Twilio’s API, but you have to build all the rest after you get the data.
- Call Termination – There is seriously no worse situation with Twilio than having generated hundreds of never-ending calls and having no way to kill them. Twilio has an API call to do this per call, but you need a serious tracking system for this, and for analytics, on a large scale. I built this before I decided to potentially destroy my client’s datacenter routers.
- IVR navigation – Most clients like us have deep and complex IVRs and also want to test self-service etc. Twilio does have functionality for IVR navigation, but it is its own ballgame.
- Test Cases – Most clients don’t want you to call one number 350 times, they want you to call a smattering of numbers randomly with random input.
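To make the tracking and test-case ideas a little more concrete, here is a rough sketch. It is not the tool I built. `CallTracker` and `random_test_case` are hypothetical names, and the hang-up action is passed in as a callable so the sketch does not assume any particular client API. The status strings are the ones Twilio reports for calls (queued, ringing, in-progress, completed, busy, failed, no-answer, canceled).

```ruby
# Hypothetical tracker: remembers the last known status of every call SID,
# tallies statuses for reporting, and can hang up everything still live.
class CallTracker
  LIVE = %w[queued ringing in-progress].freeze

  def initialize
    @status = {}
  end

  # Store or update the status for one call.
  def record(sid, status)
    @status[sid] = status
  end

  # Count of calls per status, e.g. for a pass/fail summary.
  def tally
    @status.values.each_with_object(Hash.new(0)) { |s, counts| counts[s] += 1 }
  end

  # SIDs of calls that have not reached a final status yet.
  def live_sids
    @status.select { |_sid, s| LIVE.include?(s) }.keys
  end

  # Invoke the injected `hangup` callable for every live call.
  def hangup_all(hangup)
    live_sids.each do |sid|
      hangup.call(sid)
      record(sid, 'completed')
    end
  end
end

# Hypothetical test-case picker: a random destination plus random digits.
def random_test_case(numbers, rng: Random.new)
  { to: numbers.sample(random: rng), digits: Array.new(4) { rng.rand(10) }.join }
end

tracker = CallTracker.new
tracker.record('CA0001', 'in-progress')
tracker.record('CA0002', 'busy')
tracker.record('CA0003', 'ringing')

killed = []
tracker.hangup_all(->(sid) { killed << sid }) # the real hangup would call the API
puts "hung up: #{killed.join(', ')}"
puts "tally: #{tracker.tally}"
puts random_test_case(['+19876543210', '+19876543211']).inspect
```

In a real run the lambda would wrap whatever your Twilio library version uses to end a call, and `record` would be fed from status callbacks or from polling.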
Thanks to Josh Kittle (@ciscovoicedude on Twitter) for spurring me to write this. I hope you enjoyed it.
Chad Stachowicz
cstachowicz@cloverhound.com
Twitter: https://www.twitter.com/chadstachowicz
LinkedIn: http://www.linkedin.com/pub/chad-stachowicz/1/981/a6
7 responses to “Laying Down The Hammer”
We are running the exact same test with the exact same setup and failing like a champ also and don’t want to pay the crazy money to these hammer “experts”. We are running 15.2(4)M4 code on our 3945E VXMLs…. why did you go back two code revisions from M5 to M3 (is M4 bad)? Thanks and awesome article!
Hi Tom,
As I said in the article, we had another gateway taking calls (2921) and it took like 175 successfully on M3, way more than our 3945E did with M5. I am pretty sure both M4 and M5 are bad for VXML GW. M3 will fix you up! (I am also happy to help you run the test with my advanced tools, very affordably, if you need.)
Cheers,
Chad
Awesome Write-up!!!