Friday, January 30, 2009

Upgrade from IDS 9.4.FC6X3 to 11.10.FC2

John,

A couple of questions:
· How much space is available on the new server? Could you do a restore and then an in-place reorg (move a table from the current dbspaces into a new, large dbspace)? Repeat this for most (if not all) tables, one at a time; it could be scripted so that it runs unattended.
· Using the current method, did you remove the index definitions on the new database and then recreate them once the load is complete? This speeds up loading considerably.
· Is transaction logging turned on on the new database server? If yes, issue a "lock table" statement just before the load and an "unlock table" after the load (a rough sketch of both the reorg and the locked load follows this list).
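
To make the first and third points concrete, here is roughly what I have in mind. The database, table, dbspace, index, column and file names below are made up for illustration, so treat this as an untested sketch and adjust it to your schema:

    # Sketch only: mydb, mytab, dbs_big, ix_mytab_1, col1 and /data/mytab.unl are invented names.

    # In-place reorg: move one table into the new large dbspace (repeat per table, or script it).
    dbaccess mydb - <<'EOF'
    ALTER FRAGMENT ON TABLE mytab INIT IN dbs_big;
    EOF

    # Load with logging on: drop the indexes first, hold an exclusive lock for the load,
    # then recreate the indexes.  With logging, the lock has to sit inside a transaction
    # and the COMMIT is what releases it; watch the logical logs on the very big tables.
    dbaccess mydb - <<'EOF'
    DROP INDEX ix_mytab_1;
    BEGIN WORK;
    LOCK TABLE mytab IN EXCLUSIVE MODE;
    LOAD FROM '/data/mytab.unl' INSERT INTO mytab;
    COMMIT WORK;
    CREATE INDEX ix_mytab_1 ON mytab (col1);
    EOF
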
The only other method I have used before on a database this large was to use FIFO queues and transfer the data over the network in parallel. It does require an extremely fast network (gigabit preferred). I don't know whether it will help or give better results in your case, but let me know if you want to try it. The reasoning behind it is that a standard "load" or "dbimport" runs serially; if the machine still has plenty of spare resources while the import is running, you can double the throughput (and roughly halve the time) by running two jobs simultaneously. On the previous run I did, I found the systems could handle four simultaneous jobs with the best results. Perhaps you can split your "load" job into four parts as well and kick off all four at the same time? A rough sketch of the FIFO approach follows. If you need to discuss this with me, please give me a call.
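
In case it helps to picture the FIFO approach, the outline below is roughly what I ran last time. The host, database, table and pipe names are invented, it assumes ssh between the two boxes, and it is a sketch rather than a tested script:

    # On the old server: unload straight into a named pipe and push it across the wire.
    # (mydb, big_tab, newhost and the pipe paths are invented names.)
    mkfifo /tmp/big_tab.pipe
    dbaccess mydb - <<'EOF' &
    UNLOAD TO '/tmp/big_tab.pipe' SELECT * FROM big_tab;
    EOF
    cat /tmp/big_tab.pipe | ssh newhost 'cat > /tmp/big_tab.pipe' &

    # On the new server (create its pipe with mkfifo first): read the pipe into the table.
    dbaccess mydb - <<'EOF'
    LOAD FROM '/tmp/big_tab.pipe' INSERT INTO big_tab;
    EOF

    # To run 4 jobs in parallel, split the unload on ranges of the primary key
    # (one WHERE clause per job) and start one pipe pair per range.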

Regards,
JV
IBM Information Management Lab Services - Service Excellence Award Winner - 2006
-------------------------------------------------------------------------------------------------
Hello all. I am on-site and we have been attempting to upgrade the client from IDS 9.4.FC6X3 to 11.10.FC2. The reasoning for 11.10 is that when we started this project over a year ago it was the most stable of the 11.x series, and it is what was tested with the client's apps. I realize now that 11.5x is the latest and probably the version to go to. That aside, let me explain how we are doing things and see if anyone has suggestions for improving the process.

We have set up an HDR server for the 9.4 instance, called the swing server. The purpose of this is to be able to create replicates for the large tables, suspend HDR, start and then suspend the replicate, unload the table from the HDR server, resume HDR, load the table into the 11.10 instance, resume the replicate and perform a level 0 backup on the 11 instance (the per-table cycle is sketched below, after the list of issues). This lets us get a majority of the large tables (by large, I am talking several tables over 100 million rows, with some over 500 million rows) migrated before the actual weekend of the complete migration. Even with this pre-migration work being done, it is still taking 40 hours to complete the migration because of several issues, which include:

Some tables do not stay in sync, so they have to be redone or held off until the migration weekend.
Previously unknown HPL issues in 9.4.FC6 were causing data to be unloaded into the wrong columns.
Some of the tables take several hours to unload, copy the flat files to the new server, reload and create indexes (one large table has 44 indexes!!!).
The lack of horsepower on the old server does not allow us to run many HPL jobs at once.
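
For anyone following along, one pass of the per-table cycle looks roughly like the outline below. The replicate, group, table and file names are invented, the HPL unload is simplified to a plain UNLOAD, and the cdr/ontape arguments are typed from memory, so check them against the manuals before relying on any of it:

    # One pass for one large table (names invented; verify the cdr syntax before use).
    cdr define replicate -C ignore -S row repl_big_tab \
        "mydb@g_swing:informix.big_tab" "select * from big_tab" \
        "mydb@g_new:informix.big_tab"   "select * from big_tab"
    cdr start replicate   repl_big_tab
    cdr suspend replicate repl_big_tab      # queue changes while the bulk data moves

    # (suspend HDR here, unload the table from the swing server, then resume HDR)
    dbaccess mydb - <<'EOF'
    UNLOAD TO '/stage/big_tab.unl' SELECT * FROM big_tab;
    EOF

    # (copy the flat file to the new box, then load it into the 11.10 instance)
    dbaccess mydb - <<'EOF'
    LOAD FROM '/stage/big_tab.unl' INSERT INTO big_tab;
    EOF

    cdr resume replicate repl_big_tab       # queued changes catch the table up
    ontape -s -L 0                          # level 0 backup on the 11.10 instance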


The reasoning for our method is that the old instance now has over 650 chunks spread out over many disks, with some tables having over 100 extents. We don't want to snap this configuration over to the new server and do an in-place upgrade, as that would defeat our goal of stabilizing the database. This db is nearly 2TB in size.
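
For reference, extent counts per table can be pulled from sysmaster with something like the query below (the sysextents column names are from memory, so double-check them on your version, and 'mydb' is a placeholder):

    # Lists the most fragmented tables in one database.
    dbaccess sysmaster - <<'EOF'
    SELECT dbsname, tabname, COUNT(*) AS n_extents
      FROM sysextents
     WHERE dbsname = 'mydb'
     GROUP BY dbsname, tabname
     HAVING COUNT(*) > 50
     ORDER BY 3 DESC;
    EOF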

I have come up with a couple of alternative methods but have not put them down on paper yet. I am asking this group if there is any other way you can think of to do this re-org. One of the methods I was considering is to do an in-place upgrade to 10.x on the current box, install 11.5x on the new box, and set up the replicates with the syncing mechanism that the two newer engines provide (a rough outline is below). They may not go for this because their acceptance-testing policies are very strict. They are basically asking me to come up with any approach that will not require over 40 hours of downtime.
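
If we went that route, my understanding is that the newer engines can reconcile the two copies of a table directly through Enterprise Replication, along the lines of the command below. I have not tested this, the flags are from memory and the names are invented, so treat it purely as a sketch and verify against the 11.x ER documentation:

    # Check a replicate for differences between the two servers and repair the target
    # (g_old, g_new and repl_big_tab are invented names; confirm the exact options).
    cdr check replicate --master=g_old --repl=repl_big_tab g_new --repair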

For the record, our third attempt at this was successful, but around 18 hours after the new box went into production, the engine crashed due to an unpublished bug associated with DD_HASHSIZE and DD_HASHMAX being set at their defaults. This caused unrecoverable table and dbspace corruption that would have required more downtime to fix than the customer would allow, requiring us to roll back yet again.

Thanks in advance for any time and thought you can put into this.

JOHN
