Question to those of you in large data centers


fildien
08-24-2005, 09:41 AM
We have begun discussions here as to how to better implement our backup and DR strategy and I'm curious how some of you do it. Here is a brief synopsis of how we do things.....

Roughly 250 servers (NT, UNIX, VMS). Backups begin at 5pm; we use a scheduler and an enterprise backup application. As backups finish and the Tape Library has free drives, we begin doing what is called a Tapecopy. In other words, we copy all of the backup data to as few tapes as possible. Once all tapecopies are done we vault those tapes and ship them offsite for (x) number of days.

Obviously one can see flaws in this system, but our biggest one is the fact that we are a hospital and we don't PURGE anything. That's right: if you visit us for an MRI today and then die 7, 10, or 15 years from now, we will still have your record. We are 2 hospitals and 84 remote sites, with a catchment area of roughly a couple million people, so our data is growing exponentially, which makes our backups grow, which leaves us less and less room in our backup window. We are only doing about 1TB of data a night across 4 non-routed gig networks, so all backups are finished no later than 4am. Some finish sooner, but due to processing some can't start until later, thus the late stop time. Our tapecopies run simultaneously with the backups and we are generally done before 10am, which is when our offsite storage guy comes, but some days we are pushing noon or later.

So...we have consultants in b/c the buzzword is DR! And, well, we don't really have much of a DR plan, so lo and behold my group got a rather large budget increase to implement one. Here is one plan we are considering:

Backups will go to a 3.5TB ATA disk array and stay there for no more than one day. When all backups are done they will be copied to our new DR site (which is really the bldg where the rest of IT is located across town). The data will travel across fiber (I think we're dedicating 2 pairs) and the tape library there will be LTO3; we are currently using an LTO1 TL.
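
For anyone sizing a staging array like this, here is a rough sketch in Python of the headroom math. The 3.5TB capacity and the ~1TB/night figure come from this post; the 5% monthly growth rate is a made-up placeholder, not a measured number, and the function names are only for illustration.

```python
import math

def nights_that_fit(capacity_tb: float, nightly_tb: float) -> float:
    """How many nights of backups the staging array could hold at once."""
    return capacity_tb / nightly_tb

def months_until_outgrown(capacity_tb: float, nightly_tb: float,
                          monthly_growth: float) -> int:
    """Months of compound growth until a single night's backups reach the
    array's capacity. Expects monthly_growth > 0."""
    if nightly_tb >= capacity_tb:
        return 0
    ratio = capacity_tb / nightly_tb
    return math.ceil(math.log(ratio) / math.log(1 + monthly_growth))

ARRAY_TB = 3.5     # from the post
NIGHTLY_TB = 1.0   # from the post ("about 1TB of data a night")
GROWTH = 0.05      # ASSUMPTION: hypothetical 5% growth per month

print(nights_that_fit(ARRAY_TB, NIGHTLY_TB))
print(months_until_outgrown(ARRAY_TB, NIGHTLY_TB, GROWTH))
```

With a one-day retention only one night has to fit, so the extra room mostly matters when a DR copy runs late and overlaps the next night's backups.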

So how do the rest of you do it?

Disk ---- Disk ----- Tape

Disk ----- Disk ----- Disk ---- Tape

Disk ----- Tape ---- offsite

What backup software do you use? What challenges do you face, if any, with scheduling, cost, etc.?

Sadly, being the worker bee that I am, I get to listen to the meetings and ask questions, but I ultimately do not get to say yes or no. I just have to make the shit work.

Revellie
08-24-2005, 10:03 AM
Here in the land of the internet we do it like this.

The primary server runs the backups. Nightly, a SQL backup is created using SQL LiteSpeed. That is spun to tape along with the backups from the 2 nights before.

In addition, we have 2 warm standbys: 1 at our primary hosting center and 1 at our DR site.

The data is also replicated to our home office, which then backs the data up again and spins that to tape nightly. Those tapes are stored at a separate offsite storage facility from the one that holds the primary's backups.

On average we back up 300 gig of compressed data a night.

Rev

fildien
08-24-2005, 10:29 AM
Forgive my ignorance as I'm not an NT person or SQL person. But what is SQL LiteSpeed, and is it used only for SQL databases? I work in a mixed environment (NT/VMS/UNIX) and only a few of my servers have SQL on them. B/c our backup manager is a Unix box we don't even use SQL agents; instead we do things like Flashcopies/Snaps after we quiesce the database and then back that up from disk to tape.

Malse
08-24-2005, 11:07 AM
You're backing up the entirety of everything every night? The easy solution is to make a complete copy only once every period and take incrementals of only the new or modified data until the next week/month/year. Divide the systems into logical groups and stagger the day of the period on which each group's full backup is done.
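
To make the staggering idea concrete, here is a minimal Python sketch. The group names and the 7-day period are hypothetical examples, not anyone's real schedule: each group gets one fixed day per period for its full backup and takes incrementals on every other day.

```python
from datetime import date

# ASSUMPTION: hypothetical group names, purely for illustration.
GROUPS = ["fileservers", "exchange", "oracle", "citrix", "web", "vms", "unix"]

def backup_type(group_index: int, day: date, period_days: int = 7) -> str:
    """Return 'full' on the group's assigned day of the period, else 'incremental'."""
    # Position of this calendar day within the repeating period.
    day_in_period = day.toordinal() % period_days
    assigned_day = group_index % period_days
    return "full" if day_in_period == assigned_day else "incremental"

def plan(groups, day: date, period_days: int = 7) -> dict:
    """Map each group name to the kind of backup it should run on the given day."""
    return {name: backup_type(i, day, period_days) for i, name in enumerate(groups)}

if __name__ == "__main__":
    for name, kind in plan(GROUPS, date(2005, 8, 24)).items():
        print(f"{name}: {kind}")
```

With as many groups as days in the period, exactly one group runs its full on any given night, so the nightly volume stays roughly level instead of spiking.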


Our offsite disaster recovery system is fairly simple. We have a Network Appliance filer with a couple TB of disk and bimonthly snapshots that everything gets stored on. In theory we could lose our entire building and have all our customer project development up to the night before only a slow NFS mount away.

fildien
08-24-2005, 01:14 PM
Aye Malse, your DR solution is similar to our long term goal. When I first came on board here 3.5 years ago this place was doing full backups of EVERY node nightly to a different tape and sending them offsite daily. It was......insane.

The TL solution was huge in terms of Business Impact but it's not cutting enough mustard. In fact we do have several nodes that have differential backups done on them, and some only run weekly and some only run monthly....some of the dailies are simply database dumps that happened earlier in the night. We have some systems that, b/c of their nature, require us to have multiple copies of the backed up data...damn HIPAA. So it's really not as cut and dry as my post indicates, but I was curious if anyone employs a "disk staging" type solution where you go D2D2T (disk to disk to tape). I'm thinking this may only apply to healthcare organizations (that's all I've ever worked at aside from the Army and that doesn't count).

So while we have 250 nodes, with about 180 being Windows boxes, only a handful of those servers are backed up nightly (Fileservers, Exchange, etc). The rest are just Citrix servers or web servers that we back up weekly. Our "weekly" backups span Monday to Friday, giving us Saturday and Sunday for our monthlies and downtime.

Thanks for the feedback from everyone so far, if nothing else I see how overly complicated the company I work for likes to make things =\

mirdorr
08-24-2005, 04:55 PM
Big backups mean big cash. People don't go to disk anymore. They buy faster and faster tape drives (databases being an exception).

disk--tape--offsite, or in the case of a database disk--disk--tape--offsite, would be the norm. NetBackup is popular in large scale environments.

Malse
08-24-2005, 07:55 PM
Big backups mean big cash.

Amanda and Bacula are both free, and debatably work better than NetBackup and Veritas within their operational domains. SDLT320 drives are only a few grand. You can get a fairly decent backup system for under $10,000 unless you have truly huge volumes of data.


So it's really not as cut and dry as my post indicates, but I was curious if anyone employs a "disk staging" type solution where you go D2D2T (disk to disk to tape). I'm thinking this may only apply to healthcare organizations (that's all I've ever worked at aside from the Army and that doesn't count).

By D2D2T you mean a full copy of live data is made to a second system and that is copied to tape but not replaced? Using disk as a cache prior to tape storage isn't too uncommon. We do that for our SDLT units so we can finish the local backups in less time than it takes to write it all to single tapes.

fildien
08-25-2005, 06:54 AM
Well, money isn't necessarily the only denominator here for us....it's speed and reliability. Disk is insanely cheap; we can get that 3.5TB ATA array for under 20k, and that's chump change compared to the IBM3585 LTO3 with new tapes, new drives, and an expansion slot to bump up to 693 slots. Well, actually we could probably buy a few of those ATA arrays before it equaled one of our servers or that TL setup ;) We have beasts here, and I swear I see the VMS guys feed the hamsters in the alphas daily.

We do currently run backups from disk to tape and then offsite, and yes, even some DBs dump to disk first and then we go to tape.... it's still god awful slow for us. We do have some of our NT systems going to SDLT drives....things that aren't supported or cannot have agents loaded on them, like FDA proclaimed "medical" devices. They work well, but we need a central location for all backups, with minimal human interaction. The great thing about the huge fridge TL is that tapes are only touched by human hands once a day: to collect them from the mailslot, hand them to the offsite vendor, and then import the tapes returning.

Our single largest backup is close to 300GB and that gets backed up nightly. It's a dumped area of a hot Oracle backup, and on a non-routed gig network I can shove it to tape in just about 4hrs. I know with an LTO3 I could easily cut that time in half, but going to disk would take even less time.
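
For reference, the arithmetic behind those numbers as a small Python sketch. The 300GB size and 4 hour duration are from this post; the only outside figure is the raw gigabit line rate (1 Gbit/s = 125 MB/s before protocol overhead), and the helper names are just for illustration.

```python
def throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average rate in MB/s needed to move size_gb in the given hours (decimal units)."""
    return size_gb * 1000 / (hours * 3600)

def hours_needed(size_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate in MB/s."""
    return size_gb * 1000 / rate_mb_s / 3600

current = throughput_mb_s(300, 4)
print(f"current rate: {current:.1f} MB/s")                 # about 20.8 MB/s
print(f"rate to halve the time: {2 * current:.1f} MB/s")   # about 41.7 MB/s

GIGABIT_RAW_MB_S = 125  # 1 Gbit/s line rate, ignoring protocol overhead
print(f"best case on the wire: {hours_needed(300, GIGABIT_RAW_MB_S) * 60:.0f} min")
```

So the job currently averages about a sixth of what a gigabit link could carry in theory, which suggests the network is not the limiting factor.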

I think the ops manager and my manager see the disk area as a staging area of sorts...all backups would be done by (x) time and then the DR copies would be done throughout the day. Since the TL would be offsite there wouldn't be a need for human interaction with vaulting tapes; it'd all just go to and stay in the TL in media pools.