Archive for Datacenter

Lamson

I’ll let you know!

Comments (1)

It’s magic! (rman knows the LIKE command)

Well, maybe not exactly magic, but pretty nice nonetheless. In the previous entry I discussed managing the standby archivelogs (in the flash recovery area). If you are running everything with a catalog database and have a primary and a standby, the two are pretty much identical except in the flash recovery area:

/path/to/flashback/archivelog/$DB_UNIQUE_NAME/data/files

So the primary and standby, which have different $DB_UNIQUE_NAME values, each have a directory unique to that machine. If you ever need to clear out files on one or the other and don’t use RMAN to do it, this will make sure the catalog is accurate:

- use rman to connect to the primary and catalog
- RMAN> crosscheck archivelog all like '%$DB_UNIQUE_NAME%'; (type the actual DB_UNIQUE_NAME value in place of the variable; RMAN will not expand it for you)
Do the same on the standby…
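
Since the pattern differs between the primary and the standby, it can help to generate the command rather than retype it. The following is only a minimal sketch: the function name crosscheck_cmd and the value PRIMDB are made up for illustration, and it does nothing more than print the RMAN command shown above with the name filled in.

```shell
# Print an RMAN crosscheck command limited to one database's archivelogs.
# $1 is the DB_UNIQUE_NAME to match (any value used here is hypothetical).
crosscheck_cmd() {
    # %% prints a literal percent sign, so the output is like '%NAME%'.
    printf "crosscheck archivelog all like '%%%s%%';\n" "$1"
}
```

One possible use is to save the output to a command file, for example crosscheck_cmd PRIMDB > crosscheck.rman, and run it with rman's cmdfile option using whatever target and catalog connect strings you normally use.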

This only updates the catalog for the online archivelog files. To update the backup files as well:

run {
allocate channel c1 device type disk;
crosscheck backup;
}

Comments (1)

Using RMAN to restore archivelog gaps on Oracle 10g with physical standby

One of the databases I support is an Oracle 10g instance on Windows 2003 Server running in a physical standby configuration. One thing I still need to tweak is how archive logs get deleted on the standby machine. I can’t see how to get RMAN to address the archivelogs on the other machine. Well, the other day the archivelog destination drive filled up on the standby, so the primary stopped shipping logfiles. The sysadmin noticed this and cleared out some older logfiles, but we had an archivelog gap – the archivelogs that were older than the retention period on the primary were no longer online in the flash recovery area.

Fortunately we keep disk-based RMAN backups online as long as possible so I knew they were still embedded in the backupset somewhere.

Here’s the secret recipe for getting them back out:
Look in OEM and figure out which logfile sequence numbers are in the gap.
Log in to RMAN and do this:
RMAN> list backupset;

— snip snip —

List of Archived Logs in backup set 40885
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
1    12870   132214214  31-OCT-05 132240966  31-OCT-05
1    12871   132240966  31-OCT-05 132291702  01-NOV-05
1    12872   132291702  01-NOV-05 132400078  01-NOV-05
1    12873   132400078  01-NOV-05 132504581  01-NOV-05
1    12874   132504581  01-NOV-05 132594119  01-NOV-05
1    12875   132594119  01-NOV-05 132683505  01-NOV-05

— snip snip —

I am just showing a few of the sequence numbers – the output is quite long.
Let’s say I needed 12872-12874.

run {
allocate channel c1 device type disk;
restore archivelog sequence 12872;
restore archivelog sequence 12873;
restore archivelog sequence 12874;
}

You’ll see something like this:
allocated channel: c1
channel c1: sid=2054 devtype=DISK

Starting restore at 04-NOV-05

channel c1: starting archive log restore to default destination
channel c1: restoring archive log
archive log thread=1 sequence=12872
channel c1: restored backup piece 1
piece handle=G:\BACKUP\ARCHIVE\ARC_DORAMYSID_T573243355_S1283_P1 tag=TAG20051101T181555
channel c1: restore complete
Finished restore at 04-NOV-05

Repeated for each recovered sequence …
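
For a longer gap, typing one restore line per sequence gets tedious. Here is a rough sketch of generating the same run block from the first and last sequence numbers. The function name restore_gap_cmds is invented for this example, and it assumes a single-thread database and a disk channel, as in the session above.

```shell
# Print an RMAN run block that restores every archivelog sequence from
# $1 through $2 inclusive. Assumes a single-thread database and disk backups.
restore_gap_cmds() {
    seq_no=$1
    last=$2
    echo "run {"
    echo "  allocate channel c1 device type disk;"
    while [ "$seq_no" -le "$last" ]; do
        echo "  restore archivelog sequence $seq_no;"
        seq_no=$((seq_no + 1))
    done
    echo "}"
}
```

Running restore_gap_cmds 12872 12874 prints the same block as above, ready to paste into RMAN. RMAN also accepts a range form, restore archivelog from sequence 12872 until sequence 12874; which should do the same thing in one command, though I have not tried it on this instance.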

What’s really slick is the second they are recovered, the primary will ship the archivelogs and the standby will apply them. Recovery from being 3-4 days behind on a very busy instance only took about 10 minutes at which point the standby was fully in sync again.

If they weren’t in the online backups, I would have had to look in RMAN to figure out which tapeset they were in, restore the backups, and run a CROSSCHECK to let Oracle know they were back online.

Someday I’ll configure RMAN to use the tape farm directly.

Comments

It’s always something …

Just ran across a little thingamabob that caused some head scratching. I was testing my new postfix / smtp-auth setup and discovered that if I telnet SERVERNAME 25 from within my network, I get all the nice server responses after an EHLO myserver.mydomain.com. It turns out that if I do the same from outside, I get ‘550 command not implemented’. After a little digging and testing various combinations it occurred to me that the firewall was the likely culprit, and sure enough the Debian-ISP list had a thread about exactly the same problem. It’s odd that a bug identified with ESMTP on Cisco PIX firewalls in January 2003 still exists in late 2005. Anyway, the simple fix was to issue the following command to the PIX:
no fixup protocol smtp 25
and it worked.

Comments

Managing Logs

I’ve been doing some research on the best way to consolidate server logs so they actually get looked at (gasp!)

Since I’m far more Linux than Windows, it makes sense to use syslog. I understand that in Windows you can consolidate event logs, but that’s not useful to me here. What’s nice is that all the devices (routers, firewalls, switches) will also log to syslog, so it’s a pretty obvious thing to do.

For now I’ll just list the tools and approach I’m considering:
To enable any Windows machines to dump their event logs to syslog: Snare Eventlog Agent for Windows.
As mentioned above, all my other devices already have syslog enabled.

So now that I can log everything to one place, what do I use for threshold / event monitoring?

The leaders in the race appear to be:

  • LogSurfer or LogSurfer+ (same site) – which grew out of the Swatch project.
  • Simple Event Correlator (SEC), which is described quite thoroughly in this SEC tutorial/article and in a very thorough paper describing the operations at John Rouillard’s site. SEC seems very robust, but maybe overkill to configure, since I generally know if the machines are misbehaving. I’ll definitely at least play with it to see what it’s capable of.
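
To make "threshold monitoring" concrete, here is a toy sketch of the idea, not a substitute for either tool. It counts lines matching a pattern per host in a consolidated log and reports the hosts at or over a limit. The function name hosts_over_threshold is made up, and it assumes the traditional syslog line layout (month, day, time, host, then the message).

```shell
# Read syslog lines on stdin and print "host count" for every host that has
# at least $2 lines matching the pattern $1.
# Assumes the traditional layout: "Mon DD HH:MM:SS host tag: message".
hosts_over_threshold() {
    awk -v pat="$1" -v min="$2" '
        $0 ~ pat { hits[$4]++ }          # field 4 is the originating host
        END {
            for (h in hits)
                if (hits[h] >= min)
                    print h, hits[h]
        }
    ' | sort
}
```

Something like hosts_over_threshold 'Failed password' 5 < /var/log/messages would list hosts with five or more failed logins in the file; the log path and the pattern are only examples.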

Comments

Basic cacti from script to graph

Setting up cacti is a bit confusing at first. Once snmp is configured the way I did it, it’s easy to see how to configure scripts to return data at particular OIDs (MIB numbers).

The main thing to be clear on when setting up cacti is the sequence you go through to get a graph up.

  1. Create a device
  2. Create a Data Input Method
  3. Create a Data Source
  4. Create a Graph
  5. Add Graph to Graph View

For details, there is a good walkthrough at Simplest Method of Going from Script to Graph.
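
For step 2, the script behind a Data Input Method just has to print its values in a form cacti can parse. As far as I know that is either a single number or space-separated name:value pairs. A minimal sketch, assuming a Linux host with /proc/loadavg; the function name and the field names load1, load5 and load15 are invented for this example and would have to match the output fields you define in cacti.

```shell
# Turn the contents of /proc/loadavg (read from stdin) into the
# "name:value name:value" line that a cacti Data Input Method can parse.
# The field names load1/load5/load15 are made up for this example.
format_loadavg() {
    awk '{ printf "load1:%s load5:%s load15:%s\n", $1, $2, $3 }'
}
```

In a real script the last line would be something like format_loadavg < /proc/loadavg.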

Comments

Enabling SNMP on Centos / RHEL

Quick method, immediately after install:

yum install net-snmp
yum install net-snmp-utils

snmpconf -g basic_setup

Pick reasonable values. I enable one of each kind of monitor so I have some examples, and only enable snmp 1 / 2c READONLY communities with a community string specific to my setup. I enable one rocommunity for localhost testing and then create another with the same community name reachable from my cacti server.
This ends up looking like this in /etc/snmp/snmpd.conf (use your settings)
rocommunity MYCOMMUNITY1234 cactiserver.localdomain.com
rocommunity MYCOMMUNITY1234 localhost

I use localhost rather than hostname so the config is portable across machines.

For those like me who are too lazy to walk to the console and use the GUI, open snmp/UDP from the command line as shown here; otherwise enable snmp/UDP in the GUI.

iptables --insert RH-Firewall-1-INPUT 9 --protocol udp --dport 161 -j ACCEPT

Note – rule 9 in the chain is about right if you have ssh and http enabled. You just want it to be before the final rule.
Do an iptables --list (adding --line-numbers will number the rules for you) and count down to right before the final REJECT rule – use that number (one less than reject) instead of 9.
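
If you would rather not count by hand, the position can be pulled out of the numbered listing. This is a sketch only: the function name reject_rule_number is made up, and it assumes the final rule in the chain has the target REJECT, as it does on a stock CentOS / RHEL firewall.

```shell
# Read "iptables -L CHAIN -n --line-numbers" output on stdin and print the
# rule number of the last REJECT rule in that chain (nothing if there is none).
reject_rule_number() {
    # Column 1 is the rule number, column 2 is the target.
    awk '$2 == "REJECT" { n = $1 } END { if (n != "") print n }'
}

# Usage, as root (not run here):
#   pos=$(iptables -L RH-Firewall-1-INPUT -n --line-numbers | reject_rule_number)
#   iptables --insert RH-Firewall-1-INPUT "$pos" --protocol udp --dport 161 -j ACCEPT
```

Inserting at the REJECT rule's own number puts the new rule immediately ahead of it; one less than that also works, it just lands one rule earlier.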

chkconfig snmpd on
service snmpd start

lsof -i UDP:snmp
#confirm it's listening on the port
snmpwalk -Os -c MYCOMMUNITY1234 -v 1 localhost system
#confirm it's up locally
snmpwalk -On -c MYCOMMUNITY1234 -v 1 localhost prTable
#look at the process monitor you set up during config, get its OID (MIB number)
snmpwalk -On -c MYCOMMUNITY1234 -v 1 localhost dskTable
#look at the disk monitor you set up during config, get its OID (MIB number)

Etc. You can read the whole list in /usr/share/snmp/mibs/UCD-SNMP-MIB.txt in the section called ‘Current UCD core mib table entries’.
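
Picking the right OID out of the prTable walk by eye is error-prone, so here is a small sketch that does it. It leans on the UCD-SNMP-MIB layout, where prNames is column 2 and prCount is column 5 of the table at .1.3.6.1.4.1.2021.2.1. The function name prcount_oid is hypothetical.

```shell
# Read "snmpwalk -On ... prTable" output on stdin and print the numeric OID
# that holds the running-process count (prCount) for the process named $1.
# Relies on the UCD-SNMP-MIB layout: prNames is column 2, prCount is column 5.
prcount_oid() {
    awk -v name="$1" '
        # prNames rows look like: .1.3.6.1.4.1.2021.2.1.2.<index> = STRING: <name>
        index($1, ".1.3.6.1.4.1.2021.2.1.2.") == 1 {
            proc = $4
            gsub(/"/, "", proc)          # tolerate quoted strings
            if (proc == name) {
                n = split($1, parts, ".")
                print ".1.3.6.1.4.1.2021.2.1.5." parts[n]
            }
        }
    '
}
```

For example, snmpwalk -On -c MYCOMMUNITY1234 -v 1 localhost prTable | prcount_oid httpd should print the OID to hand to cacti, assuming httpd is one of the processes you chose to monitor.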

#Test again from cactiserver
snmpwalk -Os -c MYCOMMUNITY1234 -v 1 TARGETMACHINE system

It works? —

iptables-save > /etc/sysconfig/iptables

(or it won’t work after you reboot!)

DONE!

Comments

Great site for linux and network administrators

I just ran across this site: Silicon Valley CCIE. They have 3 great online guides:

  • Linux Networking
  • Cisco Networking
  • Data Center Relocation

Worth a visit!

Comments

What time is it Mister Fox?

In some vague sense, I have been trying to build my datacenter with the principles over at Infrastructures.org. One of the principles is Time Synchronization.

I considered buying a network time device that uses GPS to provide a local stratum 1 server – gizmos such as those in the Symmetricom or Spectracom product lines.

After looking at their sites as well as several others and only finding links for ‘request quote’ for pricing, I decided to look for alternatives. I’m obsessive, but not so obsessive that I feel like getting marketing spam just for looking at gear.

University of Utah IT Department did a good writeup of their NTP architecture and how they intend to distribute time across campus.

A little too big an architecture for me – I just want to set the time on a few dozen servers and clients without having each of them go out to the net.

So, with a dedicated device purchase on hold because of arcane marketing practices, I happened to be setting up BGP on my router with the help of my friend Brian who runs Secure Network Designs. He set up a large-ish ISP called Airnetlink, which was doing wireless T1 sales in office parks. Since he had multiple T3s that were multi-homed (and he’s an old friend) he was able to quickly set me up on the BGP side, and I noticed in the Cisco config script the following:

ntp clock-period 17180547
ntp master 6
ntp server 192.5.41.41

So, I did a little digging, and discovered that my router can poll data from the US Naval Observatory (most people’s choice for an NTP server) – there are two, tick and tock. Also, the router can do NTP broadcast into my network, which eliminates the need for each machine to poll.

So I will just set the Cisco to broadcast NTP into my interior network and configure ntpd on each machine to listen in broadcast mode.
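
On the listening side this should need very little configuration. Below is a sketch of a minimal ntp.conf for a broadcast-only client, wrapped in a shell function so it can be redirected wherever you like. The function name is made up and the driftfile path is the usual CentOS / RHEL location, so treat both as assumptions. On the router, I believe the matching piece is the interface-level ntp broadcast command.

```shell
# Print a minimal ntp.conf for a machine that only listens for NTP broadcasts.
broadcast_client_conf() {
    cat <<'EOF'
# Listen for NTP broadcasts on the local network instead of polling a server.
broadcastclient
# Assumption: the router sends unauthenticated broadcasts, so authentication
# is switched off here. Configure NTP keys instead if you can.
disable auth
# Assumed location; this is the usual CentOS / RHEL path.
driftfile /var/lib/ntp/drift
EOF
}
```

Review the output before putting it in place, for example broadcast_client_conf > /tmp/ntp.conf.broadcast, then copy it over /etc/ntp.conf and restart ntpd.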

Comments

Yum repositories

An important aspect of maintaining the systems will be to keep current copies of the software I use. I have settled on using yum for my package maintenance and upgrades, and a few packages I like aren’t in the default centos distribution.

To add my favorites, I am using Dag Wieers’ Repository by doing the following (on centos 4.2):

create a file called /etc/yum.repos.d/dag.repo with the following:
[dag]
name=Dag RPM Repository for Red Hat Enterprise Linux
baseurl=http://apt.sw.be/redhat/el$releasever/en/$basearch/dag
gpgcheck=1
enabled=1

and then import the gpg key like this:
rpm --import http://dag.wieers.com/packages/RPM-GPG-KEY.dag.txt

After that I was able to just run
yum install cacti
and had all I needed to run cacti: PHP-SNMP, MySQL, RRDtool, etc.

Very cool!

Comments (2)
