Lately I've been involved in a couple of troubleshooting sessions where JRUN on a CFMX server was causing 95% to 100% processor utilization. Unfortunately I have not yet stumbled onto a magic bullet for this. Tweaking memory settings, changing garbage collection routines, modifying the threads for the scheduler and the simultaneous requests all seem to help, and in some cases solve the problem. I have never found one single solution that solves this problem. It usually comes down to either JVM arguments or an external process (a database, queue, COM, FTP etc) that is causing a hanging request.
Today, however, I stumbled upon a solution that seemed to solve the problem immediately. If your processor spike is due to this specific issue then this seems to fix it. Keep in mind, that I'm basing that opinion on the fact that taking the following steps seems to have fixed a production server in my care - so take it for what it's worth.
I got a call today from an important customer. This customer was running pretty well on a complicated system for over a year when suddenly, JRUN started running at 90 to 100 percent. After poking around I noticed that the default "out" log (/runtime/logs/*instance*-out.log) was quite large - over 500 megs. I surmised that Coldfusion might struggle a little bit trying to append to such a large file. We stopped the server, cleared the directory and restarted the server. Sure enough, Jrun went back to behaving. I'm quite sure this is not the fix for every situation, but it's definitely something to check - especially if the spike happens on a well-behaved server with no precipitous changes.
For more on log files and where to find them seem my post on Why is my Jrun Err log so big and my other post on Finding all those sneaky log files.
NOTE: There is a "day after" follow up to this post that is titled Fixing Locked Jrun Log files Without Restarting Coldfusion.
Should you have the leisure to "experiment" again, you could copy and then overwrite the whole log file with a CFFILE statement block (assuming the request would run, perhaps using the jws thread pool if the jrpp pool is hung), which would effectively reinitialize the log without restarting. (Delete is not possible for Windows users since CF maintains a lock on it, but perhaps CF can CFFILE/copy/delete it).
Conversely, if you find the problem resolved by stopping and deleting the log, perhaps you could then reintroduce the log to see if the high CPU behavior comes back.
Someone else blogged today about large (8GB) server-out.log. Note that there is a JRun bug for only rotating the server-event.log but not the out or err logs. That's targeted for an upcoming release.
Normally I would agree with you because agreeing with you has always made me look good (ha). But in this case we had already restarted CF several times with no change to processor utilization. The procs would spike as soon as CF reached a full head of steam. Only after removing the log file did the procs come down.
As for an overwrite with CFFILE - I suspect it wouldn't work... but I will experiment and see. It has the disadvantage of kill the log file - which sometimes contains useful information. But it would be a runtime solution and keep us from a server restart.
As far as zero'ing out the log with a cffile write, what I means was to do a copy of it first, i.e. default-out.log.1, then zero out default-out.log. On Unix/Linux this test would be even easier root could do this without dealing with file locks.
I'll see if I can generate a multi-GB out.log for testing. Of course, there may be other uknowns involved, but I could at least see if a large log alone is sufficient.
ColdFusion was stopped and started, and links were clicked again, just to repeat the "normal" condition.
Then I created a coldfusion-out.log by cat'ing other log info into it until it was 2GB. Starting ColdFusion took 117 seconds, so not much difference there. Loading the CFAdmin was equally slow at about 5 seconds for the login to complete. Then clicking the links again took about 1 second each. In sum, it appeared to be no difference. I even ran the code that writes JDBC driver versions to the out.log just to make sure ColdFusion had an open file handle on it. The version info showed up in the tail, but still no unusual behavior.
In the normal and test cases I monitored CPU and Mem with top, but there were no important differences. In the test case, when I stopped clicking around the CPU settled down to 0-5%.
Like many tests, problems are sometimes platform dependent, environment factor dependent, or particular to some combination of local factors for that server. So on your "test" server nulling the out log may have had an impact, but there may have been some other unique aspect to the server that exacerbated the problem. My simple test shows that at least as a baseline, a large out log is not sufficient by itself to cause high resource utilization.