Telcom Pro's
 
CPU Usage
Oct 15th, 2008 at 6:58pm

George   Offline
Junior Member
Translations Technician
I Love Telcom!!

Posts: 6
**
 
I have a DMS whose CPU sometimes goes up to 98% for no apparent reason.  Some days, with the same call volume, it only goes up to 60%, so I wonder: is there a way for me to know what's eating up the CPU?
 
 
Reply #1 - Oct 15th, 2008 at 8:19pm

Paul   Offline
Site Administrator
Switch Technician
Southern Maine

Posts: 238
*****
 
In the past, killer trunk loops caused that on one of my switches.  The switch was outpulsing a number that was ported to that very switch, which caused a round-robin trunk issue.

That's the first thing that comes to mind.
 
 
Reply #2 - Oct 16th, 2008 at 10:58am

mountain   Offline
Full Member
NOC Technician
Denver, CO USA

Posts: 170
***
 
What are you using to determine that the CPU is going to 98%?  The HIGHCAP utility in DMSMON provides a high water mark on an hour-by-hour basis.  Because it is a high water mark, it will display millisecond spikes, which may not be relevant.  While HIGHCAP is useful for a quick glance, its output can be misleading.  CPU usage is a function not only of call volume but also of call complexity.  For instance, a Centrex line originating a call takes more CPU power than a POTS/CLASS line.  If your call volume remains the same during those hours of high CPU usage, then it is just a spike.
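To illustrate why a high water mark can mislead, here is a minimal Python sketch. The sample data is made up (one sample per second, a steady 55% load with a single brief spike), and `summarize_hour` is a hypothetical helper, not anything DMSMON provides.

```python
from statistics import mean

def summarize_hour(samples):
    """Return (high_water_mark, average) for one hour of CPU occupancy samples (percent)."""
    return max(samples), mean(samples)

# Hypothetical data: one sample per second, steady 55% load, one brief spike to 98%.
samples = [55] * 3600
samples[1200] = 98

peak, avg = summarize_hour(samples)
print(f"high water mark: {peak}%  average: {avg:.1f}%")
```

The hour reports a peak of 98% even though the load sat at about 55% for all but one sample, which is the kind of gap between a peak reading and an average reading described above.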

The problem could be an LNP issue.  We have those periodically, and they can create a looping scenario that lasts until the SS7 hop counter reaches its maximum.  If a number is ported out of your switch, but the local LEC has not updated their LNP database (or has it wrong) and someone in your DMS dials that number, then the DMS will send the call out to the local LEC, who sends it back to you.  You in turn ship it back to the LEC and they send it back, etc.  On and on until the hop counter expires.  Of course, when someone cannot reach a number, they try again and the loop happens again.
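A rough Python sketch of that ping-pong, to show how one dialed call multiplies into many trunk legs. It assumes each switch decrements the counter by one before forwarding and releases the call at zero. The starting value of 20 is arbitrary and `simulate_lnp_loop` is a hypothetical name; check your own office for the real hop counter setting.

```python
def simulate_lnp_loop(initial_hop_count):
    """List the trunk legs one call attempt uses before the hop counter expires.

    Assumption: every forwarding switch decrements the counter by one and the
    call is released when the counter reaches zero.
    """
    switches = ("DMS", "LEC")
    hops_left = initial_hop_count
    legs = []
    sender = 0  # the call starts at the DMS
    while hops_left > 0:
        receiver = 1 - sender
        legs.append((switches[sender], switches[receiver]))
        hops_left -= 1
        sender = receiver
    return legs

legs = simulate_lnp_loop(initial_hop_count=20)  # 20 is an illustrative value only
print(f"one dialed call used {len(legs)} trunk legs before release")
print(f"three redials use {3 * len(legs)} trunk legs")
```

Each leg is a full call setup for the switch that handles it, so a handful of callers redialing a badly ported number can add a noticeable amount of call processing.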

If it is an LNP looping problem, LNP303 logs might give you an indication of the problem DN being dialed.  Otherwise, if you have an external SS7 call tracking device (e.g. INET) that captures calls over your SS7 network, you might be able to see calls to one DN being passed from your DMS to the local LEC and back.

It may not be just one DN, but a pool of DNs which have been ported out and are being looped.
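If you can export call records from an SS7 monitoring tool, a small script can pick out the DNs that keep bouncing. This is only a sketch: the record layout (called DN, origin, destination) and the names `find_looping_dns` and `min_round_trips` are assumptions, not the export format of any particular product, and the DNs are made up.

```python
from collections import Counter

def find_looping_dns(records, min_round_trips=3):
    """Return DNs sent DMS -> LEC and returned LEC -> DMS at least min_round_trips times."""
    outbound = Counter()
    inbound = Counter()
    for called_dn, origin, destination in records:
        if (origin, destination) == ("DMS", "LEC"):
            outbound[called_dn] += 1
        elif (origin, destination) == ("LEC", "DMS"):
            inbound[called_dn] += 1
    # A round trip needs both an outbound and an inbound leg for the same DN.
    return sorted(
        dn for dn in outbound
        if min(outbound[dn], inbound[dn]) >= min_round_trips
    )

# Hypothetical records: two DNs bounce repeatedly, one is a normal outbound call.
records = (
    [("3035550100", "DMS", "LEC"), ("3035550100", "LEC", "DMS")] * 5
    + [("3035550111", "DMS", "LEC")]
    + [("3035550122", "DMS", "LEC"), ("3035550122", "LEC", "DMS")] * 4
)
print(find_looping_dns(records))
```

Because the result is a list, a pool of ported-out DNs shows up together. In practice you would also want to restrict the records to a short time window so that ordinary repeat calls are not counted as loops.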
 
 
Reply #3 - Oct 17th, 2008 at 8:04pm

George   Offline
Junior Member
Translations Technician
I Love Telcom!!

Posts: 6
**
 
We are using HIGHCAP in DMSMON and, as you said, it may be just a millisecond spike.  So, thinking about it, what would be the best way to measure the CPU usage?
I've been checking for looping and haven't found any as of yet.
 
 
Reply #4 - Oct 21st, 2008 at 10:48am

mountain   Offline
Full Member
NOC Technician
Denver, CO USA

Posts: 170
***
 
The OM BRSTAT provides output for CPU utilization (CPOCC).  The register BRCAP shows an AVERAGE CPU usage over the time collection period.  While an average CPU usage doesn't sound useful, you should be able to see if the high CPOCC shown in HIGHCAP from DMSMON is realistic.

You could send BRSTAT to an SSR log.  Using SSR, you do not have to involve the OM collection people, and you don't interfere with any OM classes they may be using.

SSR logs are set up through tables SSRDEF and SSRFORM.  SSR600 is already defined in both tables, and you can make your own output by defining SSR601, SSR602, etc.  I would use an interval of T30 (every thirty minutes) for the log, since an interval of an hour, a day, etc. may result in the BRCAP values being added together.  If the office parameter OMXFR is set to 30 minutes (which it usually is), then an hour collection time would result in two 30-minute values being added together.  Thus, if one 30-minute BRCAP was 50% and the second was 40%, the hour collection would show 90, since it added the 50 and 40 together.
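If you are stuck with a longer collection interval, the arithmetic above can be undone by dividing by the number of transfer periods. A minimal Python sketch, assuming the register really is a plain sum of the per-period values as described; `average_brcap` and its parameter names are made up for illustration.

```python
def average_brcap(summed_brcap, collection_minutes, transfer_minutes=30):
    """Recover the average percentage from a BRCAP value summed over several transfer periods."""
    periods, remainder = divmod(collection_minutes, transfer_minutes)
    if periods == 0 or remainder != 0:
        raise ValueError("collection interval must be a whole multiple of the transfer period")
    return summed_brcap / periods

# One hour of collection with a 30-minute transfer period: 50 + 40 was reported as 90.
print(average_brcap(90, collection_minutes=60))
```

With a T30 log the division is by one, so the reported value can be read directly, which is the reason for preferring that interval.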

If you need help deciphering table SSRFORM, let me know and I can give you the datafill.
 
 
Reply #5 - Aug 18th, 2009 at 7:04am

mountain   Offline
Full Member
NOC Technician
Denver, CO USA

Posts: 170
***
 
Something from the Nortel Web site regarding spikes in CPU usage.  I was looking for something else and found it.

CPU Occupancy spiking to 120% intermittently.

--------------------------------------------------------------------------------


Problem Description
The site observed that the CPU occupancy was unexpectedly spiking to 120% at various times throughout the day.

>highcap
*************************************
*       HIGH WATER CAPACITY         *
*************************************
TIME
DATE  |  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 00
-------------------------------------------------------------------------------
03/07 | 27 15 12  8  8 10 23 32 120 48 54 79 122
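A row like the one above can be scanned for the hours that went over a threshold. This is a small Python sketch that assumes the row layout shown here (date, a `|` separator, then one value per hour); `parse_highcap_row` and `hours_over` are hypothetical helpers, not part of DMSMON.

```python
def parse_highcap_row(row):
    """Split a HIGHCAP data row into (date, list of hourly high-water values)."""
    date, _, values = row.partition("|")
    return date.strip(), [int(v) for v in values.split()]

def hours_over(values, threshold):
    """Return (column, value) pairs above threshold; columns count from 1 as in the header."""
    return [(i, v) for i, v in enumerate(values, start=1) if v > threshold]

row = "03/07 | 27 15 12  8  8 10 23 32 120 48 54 79 122"
date, values = parse_highcap_row(row)
print(date, hours_over(values, 100))
```

For this row the flagged columns are 9 and 13, which can then be lined up against log timestamps for the same hours.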

Additionally, the site was seeing the following logs generated during the timeframes when the CPU spiked:

CM         *  DDM102 MAR07 08:41:38 3293 FLT
       Update Distributed Data Failed
       Failed To Update Table XLIST ROUTER DATA    (Table ID = 131)
       Reason = no reply from pm                  
       LIU7      47

CM            CCS701 MAR07 12:58:57 4882 INFO MTP Static Audit
       LIU7 47 - CCS7
       PROBLEM: Data Mismatch in PM                   
       TABLE: MTS_DMSX
       TUPLE: XPM_DS30_ROUTE_INFO                   
       FIELD:    34                                 
       ACTION: Problem Corrected by Audit

Problem Resolution
Per DDM102 and CCS701 logs,  LIU7 47 was common to all occurrences.  The customer changed hardware for LIU7 47
using NTP 411-3001-547 CDMA Base/Telecom Trouble Locating and Clearing Procedures (all cards - front and back).

Explanation:
=========
LIU7 47 experienced a hardware failure resulting in false congestion indications being presented on STP link 6 and the link was removed from service at varying intervals.   This link is part of a combined linkset for STP messaging.    As such, each time the link went out of service, or indicated congestion, it impacted multiple Routesets on the MSC.

DTCs and SPMs keep track of routeset management information via an internal table.  This internal table is updated by the core, so each time the link experienced "an event" (triggered by hardware failure), all PMs had to be updated for each of the 95 routesets.  This is one of the contributing factors to the CPU spike seen on site.

The other factor is packet retransmission.  As this link was faulty, the message sequences that used it would in some cases have to be completely repeated.  This lasted only for a short time frame, but was enough to trigger high CPU utilization values.
 
 