At work I am responsible for an ageing Broadworks VoIP telephony platform that serves two of our offices, our home-based and on-call engineers, and a small group of customers. The platform runs on Sun Solaris. The software is now very old, but we have no plans to upgrade it because the entire platform is due to be replaced: we will move onto the corporate telephony system and the customers will move over to a new Broadworks platform.
This morning a number of our internal users reported a problem trying to make updates via the GUI with the following pertinent error message buried within a long list of errors:
Data store space exhausted
Initially I checked all the Broadworks servers for disk space issues but found none, which matched the fact that none of our monitoring servers had reported a disk space problem. The next step was to SSH into the application servers (as1 & as2), where the following error was immediately reported by Broadworks:
TimesTen temporary memory area is at 95% of total temporary size. (Currently in use size is at 95%. Allocated size is 16384, high water mark is 16062 and in use size is 15667.)
Increase your datastore temporary size area (using the resizeDSN tool)
This gave some more useful information so it was time to use some Google-fu and find some answers. Thankfully someone had experienced the very same problem very recently and had included a guide on his blog. So hats off to Mark Holloway for posting his entry Resizing the Broadworks Datastore (DSN). The rest of my guide is based on the article published by Mark along with some of the issues we experienced on the way.
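As an aside, the figures in the TimesTen warning quoted above can be sanity-checked with a few lines of shell. This is only a rough sketch: the function names `field_after` and `temp_usage_pct` are made up for illustration, and it assumes the warning is worded exactly as it was on our servers.

```shell
#!/bin/bash
# Sketch only: recompute temp-area usage from the TimesTen warning text.
# Assumes the wording matches the message quoted in this post.

# Read a message on stdin and print the number that follows the given label.
field_after() {
    sed -n "s/.*$1 \([0-9][0-9]*\).*/\1/p"
}

# Print the in-use share of the allocated temp area as a whole percentage.
temp_usage_pct() {
    local msg=$1 allocated in_use
    allocated=$(printf '%s\n' "$msg" | field_after 'Allocated size is')
    in_use=$(printf '%s\n' "$msg" | field_after 'in use size is')
    # Bail out if either figure is missing, rather than divide by nothing.
    if [ -z "$allocated" ] || [ -z "$in_use" ] || [ "$allocated" -eq 0 ]; then
        return 1
    fi
    echo $(( in_use * 100 / allocated ))
}
```

Called with the warning as its single argument, `temp_usage_pct` works from 15667 in use against 16384 allocated, which rounds down to 95 and so agrees with the percentage Broadworks printed.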
The first step is to check the amount of available memory on the as1 & as2 servers. Our servers run Solaris so the following command was suitable for us:
bash-2.05$ prtconf | grep Mem
Memory size: 2048 Megabytes
The guidelines seem to suggest that the perm size should equal approximately 25% of the physical memory and the temp size approximately 25% of the perm size allocation. Nothing up to this point had revealed what our current allocations were, but we proceeded anyway and found at step 7 that the system shows you the current settings before asking for the new values. In our case, after reviewing the current memory allocation, we decided to leave the perm size alone and just slightly increase the temp size.
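To make the 25% rule of thumb concrete, here is a small sketch that turns the prtconf output into guideline figures. The function names `mem_mb` and `guideline_sizes` are hypothetical, and the arithmetic is just my reading of the guideline, not anything taken from the Broadworks tooling.

```shell
#!/bin/bash
# Sketch only: apply the "25% of memory, then 25% of perm" rule of thumb.

# Extract the megabyte count from a "Memory size: N Megabytes" line on stdin.
mem_mb() {
    sed -n 's/^Memory size: \([0-9][0-9]*\) Megabytes.*/\1/p'
}

# Print guideline perm and temp sizes (MB) for the given physical memory (MB).
guideline_sizes() {
    local mem=$1
    local perm=$(( mem / 4 ))    # perm is roughly 25% of physical memory
    local temp=$(( perm / 4 ))   # temp is roughly 25% of the perm size
    echo "perm=${perm} temp=${temp}"
}

# Example on a live Solaris box:
#   guideline_sizes "$(prtconf | mem_mb)"
```

For a 2048 MB server like ours this prints `perm=512 temp=128`. Treat that as a ceiling to think about rather than a target; as described, we only nudged the temp size up.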
This is the type of output you will see at step 7 to give you an idea of how to check the current allocation and what the change request will look like:
--------------------
Current Date:
Current Perm Size: 128
Current Temp Size: 16
Current Total Size: 144
--------------------
Select the new database Perm size….
Available Perm sizes in MB (64 128 256 512 1024 1536 2048) [144
Current Date:
Current Perm Size: 128
Current Temp Size: 16
Current Total Size: 144
Target Perm Size: 128
Target Temp Size: 32
Target Total Size: 176
--------------------
Do you wish to proceed (y/n) [y]?
Below are the steps required:
1. SSH to as1 as bwadmin
2. stopbw
3. repctl stop
4. su as root
5. cd /usr/local/broadworks/bw_base/bin
6. ./timesten.pl unload
7. ./resizeDSN
8. exit (return to bwadmin)
9. repctl start
10. startbw
We found that Broadworks would not start properly straight away afterwards and reported 2 issues. The first related to the ‘Execution Server’:
--------------------------------
System Health Report Page
BroadWorks Server Name: as1
Date and time : Wed Oct 28 10:22:25 GMT 2009
Report severity : CRITICAL
Server type : AppServer
Server state : Unlock
--------------------------------
BroadWorks AppServer processes in trouble:
Execution Server not running
--------------------------------
Recommendations
---------------
The Application Server needs to be restarted
--------------------------------
However, while we were looking around the system, Broadworks generated a broadcast message to state that a start had been initiated:
bwadmin@as1$ Broadcast Message from bworks (console) on as1 Wed Oct 28 10:45:33…
===== BROADWORKS CONTROL -- START INITIATED =====
The error message then changed to:
--------------------------------
System Health Report Page
BroadWorks Server Name: as1
Date and time : Wed Oct 28 11:00:33 GMT 2009
Report severity : CRITICAL
Server type : AppServer
Server state : Unlock
--------------------------------
Replication is not running for DSN AppServer. Databases may be out-of-synch.
File replication is not running.
--------------------------------
Recommendations
---------------
Replication must be started (repctl start). If databases are out-of-synch they must be re-synchronized first (with the importdb.pl tool). Please refer to the BroadWorks Maintenance Guide for detailed procedures.
Perform a file replication restart (repctl restart)
--------------------------------
We tried to restart replication as stated, which appeared to work, but then the same error would appear again. At this point we started to raise a support ticket with Broadsoft but by magic the error vanished and the system began to respond without any errors. It seems we had rushed through the changes so quickly that we had not allowed all the systems to start correctly and it was just a case of learning some patience. If in doubt just slow down and use the healthmon command to check on the status:
healthmon -l
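If you would rather not keep re-running that by hand, the waiting can be sketched as a polling loop. This is an assumption-heavy illustration: `report_severity`, `wait_until_healthy` and the `HEALTH_CMD` variable are names invented here, the loop assumes healthmon -l prints a "Report severity" line like the reports above, and it treats anything other than CRITICAL (including no severity line at all) as recovered, which you should check against what your own release prints.

```shell
#!/bin/bash
# Sketch only: poll the health report until it stops saying CRITICAL.

# Command that prints the health report; overridable for a dry run elsewhere.
HEALTH_CMD=${HEALTH_CMD:-healthmon -l}

# Print the severity word from a health report on stdin, e.g. CRITICAL.
report_severity() {
    sed -n 's/^Report severity *: *\([A-Za-z][A-Za-z]*\).*/\1/p' | head -n 1
}

# Usage: wait_until_healthy [tries] [delay_seconds]
# Returns 0 once the severity is no longer CRITICAL, 1 if tries run out.
wait_until_healthy() {
    local tries=${1:-20} delay=${2:-30} sev
    while [ "$tries" -gt 0 ]; do
        sev=$($HEALTH_CMD | report_severity)
        if [ "$sev" != "CRITICAL" ]; then
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep "$delay"
    done
    return 1
}
```

With the defaults shown (20 tries, 30 seconds apart) the loop gives the server about ten minutes to settle before giving up; both numbers are arbitrary starting points.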
The blog article then advises waiting 10 minutes before moving on to as2, so we popped down to the vending machine to shoot the breeze and catch up on the gossip.
Here are the steps we then used on as2:
1. SSH to as2 as bwadmin
2. stopbw
3. repctl stop
4. su as root
5. cd /usr/local/broadworks/bw_base/bin
6. ./timesten.pl unload
7. ./resizeDSN
8. exit (return to bwadmin)
9. importdb.pl AppServer as1 AppServer (replace as1 with your primary AS hostname or IP)
10. repctl start
11. startbw
We found that step 9 came with a big bag of fail attached, so we had to back up the database on as1, copy it across to as2, and then manually import it on as2:
1. On as1: bwBackup.pl AppServer dbBackup.db
2. scp the file to as2: scp dbBackup.db bwadmin@as2:dbBackup.db
3. On as2: stopbw
4. repctl stop
5. bwRestore.pl AppServer dbBackup.db
6. repctl start
7. startbw
That dealt with our problem and our 2 servers were once again fully operational with the Helpdesk busy dealing with requests to make changes on the system.
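For next time, the manual backup-and-restore workaround above could be wrapped in a couple of shell functions so that a failed step stops the sequence instead of being missed in the rush. This is a sketch rather than a tested procedure: the function names and the `DRY_RUN` switch are my own invention, while the commands themselves are exactly the ones listed in the steps.

```shell
#!/bin/bash
# Sketch only: the manual backup/restore workaround as two functions.

# Run a command, or just print it when DRY_RUN=1 (handy for a rehearsal).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On as1: back up the AppServer datastore and copy it to the peer server.
backup_and_copy() {
    local file=${1:-dbBackup.db} peer=${2:-as2}
    run bwBackup.pl AppServer "$file" &&
    run scp "$file" "bwadmin@${peer}:${file}"
}

# On as2: stop Broadworks and replication, restore, then start both again.
restore_on_secondary() {
    local file=${1:-dbBackup.db}
    run stopbw &&
    run repctl stop &&
    run bwRestore.pl AppServer "$file" &&
    run repctl start &&
    run startbw
}
```

Setting `DRY_RUN=1` before calling either function prints each command with a leading `+` instead of running it, which is a cheap way to read the sequence back before doing it for real.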
I realise that I am just standing on the shoulders of giants, and without the original posting I would probably still be busy dealing with the Broadsoft support team (who are generally excellent, btw).