So you just installed a new VMware ESX server. You tried to SSH to it and log in as root. What happened?
It didn’t work, did it?
The firewall allows it, right? (yes) You can log in to the physical server console with the same username & password, right? (yes) But it still doesn’t work, does it?
Let’s find out how to fix it…
To allow the root user to log in to a VMware ESX Server over the network using SSH, do the following:
1. Go to the service console on the physical server and log in
2. Open the SSH daemon configuration: vi /etc/ssh/sshd_config
3. Change the line that says PermitRootLogin from “no” to “yes”
4. Restart the SSH daemon: service sshd restart
And your problem is solved…

Or, from the console, run this:

# Keep a backup copy; cp -p preserves the file's ownership and permissions
cp -p /etc/ssh/sshd_config /etc/ssh/sshd_config.orig
sed 's/PermitRootLogin no/PermitRootLogin yes/g' /etc/ssh/sshd_config.orig > /etc/ssh/sshd_config
service sshd restart
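If you would rather script the change, here is a minimal sketch of the same edit in Python. The function name permit_root_login is made up for this example, and it only transforms the text of the file; reading /etc/ssh/sshd_config, writing it back and restarting sshd are left to you. Unlike the plain substitution, it also picks up a commented-out "#PermitRootLogin no" line and adds the directive if it is missing altogether.

```python
import re

# Matches an active or commented-out directive such as
# "PermitRootLogin no" or "#PermitRootLogin yes" on a line of its own.
_DIRECTIVE = re.compile(r"^[ \t]*#?PermitRootLogin[ \t]+\S+[ \t]*$", re.MULTILINE)


def permit_root_login(config_text, value="yes"):
    """Return config_text with PermitRootLogin set to value.

    Hypothetical helper for illustration only: it rewrites the first
    matching line, or appends the directive when none is present.
    """
    new_line = "PermitRootLogin " + value
    if _DIRECTIVE.search(config_text):
        # sshd uses the first value it reads for a keyword,
        # so rewriting the first occurrence is enough.
        return _DIRECTIVE.sub(lambda match: new_line, config_text, count=1)
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + separator + new_line + "\n"
```

For example, permit_root_login("Port 22\nPermitRootLogin no\n") gives back the same text with the second line changed to "PermitRootLogin yes".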

However, having said the above, it is not good security practice to allow direct root-level login over the network, even if it's over SSH. I prefer to add a regular user, SSH to the server using that account and then use "su -" to get to root.

Another recommendation is to use one non-root group for VM admins and add operator/admin users there. To create that group, enter the following command: 
groupadd -g 7777 vmadmins 

To create an account for the new admins, enter the following commands: 

useradd -c "ESX server operator" ESXOps 
Create a single userid, which will be able to operate all of the VMs. 

useradd -g 7777 johndoe 
Create a userid with groupid 7777 (vmadmins) as its primary group. 

useradd -g 7777 -c "Joe Blog" joeblog2 
Create a userid with groupid 7777 (vmadmins) as its primary group and the user's full name in the comment field. 

One fine day the SAP system on our development server did not come back after an overnight offline backup. The guys tried a lot but it wouldn't start. Trying to start it from the command line told us that mfc71u.dll was not present in C:\WINDOWS\system32. We restored the file from last night's backup and the SAP system started perfectly.


Now on a regular PC generally nobody goes into C:\WINDOWS\system32 and deletes files on a whim, and this was a heavily restricted SAP server with only the SAP and the OPS teams having access. System logs did not indicate any unwanted activity. The only activity was the installation of the Symantec EndPoint Protection Manager client, but that was a week earlier. Also, this deployment had been tested on 5 test servers previously without any issue.

The server's application event log indicated that Symantec EPM installed successfully. Still I got the boys to check the log created by Symantec EPM. This is what we found in the installation log. 

Info 1603.The file C:\WINDOWS\system32\mfc71u.dll is being held in use by the following process: Name: sapstartsrv, ID: 10572, Window Title: (not determined yet).  Close that application and retry.
MSI (s) (58:10) [15:31:12:215]: Note: 1: 2727 2:  
...
MSI (c) (C8:88) [15:31:12:215]: No window with title could be found for FilesInUse
MSI (s) (58:10) [15:31:12:215]: Doing action: uExtBeginUninstallImmediate.6500F9C2_37EA_4F25_A4DE_6211026D9C01
Action ended 15:31:12: InstallValidate. Return value 1.
MSI (s) (58:28) [15:31:12:231]: Invoking remote custom action. DLL: C:\WINDOWS\Installer\MSI324.tmp, Entrypoint: _BeginUninstallImmediate@4
... 

MSI (s) (58:10) [15:31:46:590]: Executing op: SetTargetFolder(Folder=C:\WINDOWS\system32\)
MSI (s) (58:10) [15:31:46:590]: Executing op: FileRemove(,FileName=mfc71u.dll,,ComponentId={3AC4AA25-A28A-4F09-826A-30CA0A620F35})

So it looked like the Symantec EPM client install had found the file in use by sapstartsrv and then removed it later in the installation. Surprisingly, we did not notice this behaviour on any other PC/Server. 
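Digging through a verbose MSI log by eye gets old quickly, so here is a rough Python sketch that lists every file an installer removed. It assumes the log uses the same "Executing op: SetTargetFolder(...)" and "Executing op: FileRemove(...)" lines as the excerpt above; the function name removed_files is my own invention. MSI logs may be saved as UTF-16, so open the file with the matching encoding before passing the lines in.

```python
import re

# Patterns based on the two "Executing op" lines shown in the log excerpt.
FOLDER_RE = re.compile(r"Executing op: SetTargetFolder\(Folder=(.*)\)\s*$")
REMOVE_RE = re.compile(r"Executing op: FileRemove\(.*?FileName=([^,)]+)")


def removed_files(log_lines):
    """Return the full paths of files removed according to an MSI log.

    Hypothetical helper: it remembers the most recent target folder and
    joins it with each FileRemove file name that follows.
    """
    folder = ""
    removed = []
    for line in log_lines:
        folder_match = FOLDER_RE.search(line)
        if folder_match:
            folder = folder_match.group(1)
            continue
        remove_match = REMOVE_RE.search(line)
        if remove_match:
            removed.append(folder + remove_match.group(1))
    return removed
```

Running something like this over the install log before rebooting a server would have flagged mfc71u.dll straight away.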

Fair to say sometimes you find the cause of a problem in the least expected places.

Remember this post? Here is a quick recap:

Event ID: 2019
The server was unable to allocate from the system nonpaged pool because the pool was empty.

So what does it really mean?

This error comes from the Server service reporting that, while trying to satisfy a request, it could not find enough free memory in the respective pool. Event 2020 indicates the paged pool and 2019 the nonpaged pool. This doesn’t mean that the Server service (srv.sys) is broken or is the root cause of the problem; more often it is simply the first component to hit the resource problem and report it to the event log.

I installed Poolmon, which told me that the Thre tag is the largest consumer of memory.

[Screenshot: Poolmon output]

So what's "Thre"?

Thre - nt!ps - Thread objects

Note that the nt before the ! means this is the NT kernel’s tag for thread objects. So there has to be a process that is leaking memory in the form of thread objects.

I got WinDbg running on this machine asap and entered "!process 0 0" in the command window. Here is the output, trimmed to the processes with a handle count of more than 1000.

PROCESS 89b9ad88  SessionId: 0  Cid: 0afc    Peb: 7ffd7000  ParentCid: 01c0
    DirBase: dfff07e0  ObjectTable: e628b498  HandleCount: 95969.
    Image: oracle.exe
PROCESS 89b2a690  SessionId: 0  Cid: 0c4c    Peb: 7ffdf000  ParentCid: 01c0
    DirBase: dfff0860  ObjectTable: e63b2358  HandleCount: 2244.
    Image: pinetmgr.exe
PROCESS 8a386698  SessionId: 0  Cid: 0f04    Peb: 7ffd4000  ParentCid: 01c0
    DirBase: dfff0a60  ObjectTable: e17e7408  HandleCount: 2167.
    Image: pimsgss.exe
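The full "!process 0 0" listing is long, so I find it easier to filter it with a script than by eye. Below is a small Python sketch that does the filtering, assuming the output has been saved as text in the same layout as above (a HandleCount line followed by an Image line for each process). The function name busy_processes and the default threshold are just my choices for this example.

```python
import re


def busy_processes(debugger_output, threshold=1000):
    """Return (image, handle_count) pairs above threshold, largest first.

    Hypothetical helper that parses saved "!process 0 0" output.
    """
    results = []
    handle_count = None
    for line in debugger_output.splitlines():
        count_match = re.search(r"HandleCount:\s*(\d+)", line)
        if count_match:
            # Remember the count until the matching Image line shows up.
            handle_count = int(count_match.group(1))
            continue
        image_match = re.match(r"\s*Image:\s*(\S+)", line)
        if image_match and handle_count is not None:
            if handle_count > threshold:
                results.append((image_match.group(1), handle_count))
            handle_count = None
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Fed the three processes above, it puts oracle.exe at the top of the list with 95969 handles.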

A handle count of more than 95000 definitely set off alarms. I dug a bit deeper into the Oracle process with:

!process 89b9ad88 4

The command listed a whole bunch of threads.

!process 89b9ad88 4

PROCESS 89b9ad88  SessionId: 0  Cid: 0afc    Peb: 7ffd7000  ParentCid: 01c0
    DirBase: dfff07e0  ObjectTable: e628b498  HandleCount: 114448.
    Image: oracle.exe

      THREAD 89b97998  Cid 0afc.0b00  Teb: 7ffdf000 Win32Thread: e6355328 WAIT
        THREAD 89b77b78  Cid 0afc.0b44  Teb: 7ffdd000 Win32Thread: 00000000 WAIT
        THREAD 89b64458  Cid 0afc.0b50  Teb: 7ffdc000 Win32Thread: 00000000 WAIT
        THREAD 89a01020  Cid 0afc.1204  Teb: 7ffdb000 Win32Thread: e660b768 WAIT
        THREAD 89a007d0  Cid 0afc.1208  Teb: 7ffd9000 Win32Thread: 00000000 WAIT
        THREAD 89a003b8  Cid 0afc.120c  Teb: 7ffd8000 Win32Thread: 00000000 WAIT
        THREAD 899e5db0  Cid 0afc.1214  Teb: 7ffd6000 Win32Thread: 00000000 WAIT
        THREAD 899fadb0  Cid 0afc.121c  Teb: 7ffd5000 Win32Thread: 00000000 WAIT
        THREAD 899e6db0  Cid 0afc.1220  Teb: 7ffd4000 Win32Thread: 00000000 WAIT

I examined two random threads with the !thread command and this is what it came up with:

THREAD 897ebaf0  Cid 0afc.1b58  Teb: 00000000 Win32Thread: 00000000 TERMINATED
Not impersonating
DeviceMap                 e1000908
Owning Process            89b9ad88       Image:         oracle.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      17893          Ticks: 16701178 (3:00:29:15.906)
Context Switch Count      24            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x0040162c
Start Address 0x77e617ec
Stack Init 0 Current b949fba0 Base b94a0000 Limit b949d000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0

THREAD 8969e020  Cid 0afc.08c0  Teb: 00000000 Win32Thread: 00000000 TERMINATED
Not impersonating
DeviceMap                 e1000908
Owning Process            89b9ad88       Image:         oracle.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      45267          Ticks: 16678772 (3:00:23:25.812)
Context Switch Count      27            
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x0040162c
Start Address 0x77e617ec
Stack Init 0 Current b9c3fba0 Base b9c40000 Limit b9c3d000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child

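I could tell that the threads had been terminated and that they belonged to oracle.exe, but somehow they had not been cleared from memory. That fits the huge handle count: a thread object is only freed once the last handle to it has been closed.

I opened Task Manager, used View > Select Columns to add the Handle Count column, and saw this: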

[Screenshot: Task Manager showing the handle count]

The Handle count was growing at a fair bit of speed.

[Screenshot: Task Manager showing a higher handle count a little later]

I have contacted the Oracle boys to check out the issue, but I am pretty sure one of the Oracle apps on that box or Oracle itself is the cause of the memory leak.

I’ll post back when the Oracle team have come back with their investigation.

Error 0x80004002

If you are seeing the following errors in "%Windir%\WindowsUpdate.log":

FATAL: Failed to get session from datastore, error = 0x80004002
FATAL: Failed to Unserialize from data store, error = 0x80004002
WARNING: Exit code = 0x80004002
WARNING: InitAUComponents Failed, will restart AU in 30 mins, error = 0x80004002
DnldMgr FATAL: DM:CAgentDownloadManager::CheckAllCallDownloadStates: GetSession failed with 0x80004002.

Try the following steps:

Rename or delete "%Windir%\WindowsUpdate.log" before you start, so that the new log shows only the latest detection.

1. If you have a proxy server -

  • Stop the Automatic Updates service - "net stop wuauserv".
  • Run the following command: "proxycfg.exe -u". This imports the current user's Internet Explorer proxy settings.
  • Start the Automatic Updates service - "net start wuauserv".
  • Force update detection (for example with "wuauclt /detectnow") and see if that works.

2. Re-create the datastore -

  • Stop the Automatic Updates service - "net stop wuauserv".
  • Rename or delete the "%Windir%\SoftwareDistribution" folder. 
  • Start the Automatic Updates service - "net start wuauserv".
  • Force update detection (for example with "wuauclt /detectnow") and see if that works.

3. Try to force a reinstall of the Windows Update Agent (WUA).
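
To see whether any of the steps above helped, check the fresh WindowsUpdate.log for the error code after forcing a detection. Here is a quick Python sketch that counts the matching lines by severity. It assumes the severity words FATAL and WARNING appear in the lines as they do in the excerpt above; the function name summarize_error is made up for this example.

```python
import re
from collections import Counter


def summarize_error(log_lines, code="0x80004002"):
    """Count log lines mentioning code, grouped by severity keyword.

    Hypothetical helper: lines without FATAL or WARNING are counted
    under "OTHER". An empty result means the code no longer appears.
    """
    wanted = code.lower()
    counts = Counter()
    for line in log_lines:
        if wanted not in line.lower():
            continue
        severity = re.search(r"\b(FATAL|WARNING)\b", line)
        counts[severity.group(1) if severity else "OTHER"] += 1
    return dict(counts)
```

An empty dictionary after a forced detection is a good sign that the error is gone.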

Finally got my copy of Windows 7 this morning. Installed it asap and I have to say one thing: it's ready. Microsoft can take it out of beta anytime. It's a lot more polished than Vista was in its beta days.


I have installed it on my main computer and rendered some home videos on it and it was quite fast and stable.

I have got some spare keys for both 64-bit and 32-bit Windows 7, so if you need one, drop me a line.

This morning Microsoft's web servers fell to their knees under the pressure of enthusiasts trying to download Windows 7. Yes, I was one of them as well, and no, I couldn't get the beta copy; the download page kept timing out. Even the page for getting the evaluation keys was hammered and I couldn't get keys.

For a company that's pushing cloud computing and web services and trying so hard to prove that they're ready to move off the desktop and onto the web, today's Windows 7 mess shows that Microsoft still needs to work out how to do stuff online before it can be taken seriously.
They could have looked at Ubuntu, who are always seeding a torrent of their main distro.

So even the biggest and the best do get it wrong sometimes.

Windows 7 goes to public beta today. I will be installing it on one of my laptops asap. Here is a link for those who are looking for it.
http://www.microsoft.com/windows/windows-7/default.aspx

Another important thing to keep in mind is that the Windows 7 Beta will expire on August 1st, 2009.

We had an issue today with our Oracle server. 


First this came up:

Event Type: Error
Event Source: Srv
Event Category: None
Event ID: 2019
Date:  2/01/2009
Time:  11:33:46 AM
Description:
The server was unable to allocate from the system nonpaged pool because the pool was empty.

Then this:

Event Type: Error
Event Source: EventLog
Event Category: None
Event ID: 6008
Date:  2/01/2009
Time:  11:39:44 AM
Description:
The previous system shutdown at 11:39:39 AM on 2/01/2009 was unexpected.

Yep, right in the middle of the day the server crashes, and to top it all it's a production Oracle server. I have got memtriage and Poolmon running on it to monitor the nonpaged pool. Will post back with results.

Happy New Year, everyone!


Seen this one before?

Operation terminated with error -1014 (JET_errOutOfBuffers, Out of database page buffers) after 6701.125 seconds.

I was trying to get an Exchange 2003 database to defrag and got the above error. It took a lot of work to find the culprit: running defrags in parallel gets you this error. Do the defrags one after another and all should be fine.
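
If you have several databases to get through, a small wrapper can make sure they really run one at a time. The Python sketch below is only an illustration: the function name defrag_sequentially and the run parameter are my own, and it assumes eseutil is on the PATH and that each database has already been dismounted. It relies on "eseutil /d", the offline defragmentation switch, and on subprocess.call waiting for each command to finish before the next one starts.

```python
import subprocess


def defrag_sequentially(databases, run=subprocess.call):
    """Defragment each database in turn and return the exit codes.

    Hypothetical helper: 'run' is injectable so the loop can be tried
    out without touching a real database. It stops at the first
    failure so a problem is not repeated across every database.
    """
    results = {}
    for database in databases:
        # subprocess.call blocks until eseutil exits, so no two
        # defrags ever overlap.
        exit_code = run(["eseutil", "/d", database])
        results[database] = exit_code
        if exit_code != 0:
            break
    return results
```

Pass it the list of .edb paths you want to process; a non-zero exit code in the result tells you which database it stopped on.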