BASIS International Ltd.

BASIS Knowledge Base Article #01166

Title:

Excessive Garbage Collection pause times on large BBj installations.

Description:

Large BBj installations can experience long pauses caused by lengthy garbage collection (GC). This article offers guidelines for tuning the JVM for BBj installations that require approximately 1GB or more of RAM, on systems with 2 or more processors running the Sun Microsystems HotSpot JVM.

The first step is to determine whether long GC times are occurring. To write GC timings to a log file, add the following arguments to the JVM Arguments in Enterprise Manager (EM) or to the basis.java.args.BBjServices line in the BBj.properties file:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<basis log dir>/gc.log

If you edit BBj.properties directly, add a '\' before every ':' and '=' in the property value, and double every '\' in Windows paths. Here is a sample of the modified line in the BBj.properties file, located in the <bbj_install_dir>\cfg directory:

basis.java.args.BBjServices=-Xmx912m -Xms912m -XX\:NewRatio\=4 -XX\:CompileCommandFile\=C\:\\basis\\cfg\\.hotspot_compiler -Dnetworkaddress.cache.ttl\=10 -Dsun.net.inetaddr.ttl\=10 -Xloggc\:\\basis\\gclog -XX\:+PrintGCDetails -XX\:+PrintGCTimeStamps
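
If you generate this line with a script instead of editing it by hand, the escaping can be automated. The Python sketch below assumes the value needs only the escapes described above plus doubled backslashes (as in the sample line); other special characters, such as a leading '#' or '!', are not handled. The helper names escape_properties_value and bbj_services_line are hypothetical and not part of BBj.

```python
from typing import List


def escape_properties_value(value: str) -> str:
    """Escape a JVM argument string for use as a value in BBj.properties."""
    out = []
    for ch in value:
        if ch == "\\":
            out.append("\\\\")      # a literal backslash must be doubled
        elif ch in ":=":
            out.append("\\" + ch)   # ':' and '=' get a preceding backslash
        else:
            out.append(ch)
    return "".join(out)


def bbj_services_line(jvm_args: List[str]) -> str:
    """Build the basis.java.args.BBjServices line from unescaped JVM arguments."""
    return "basis.java.args.BBjServices=" + escape_properties_value(" ".join(jvm_args))


if __name__ == "__main__":
    args = ["-Xmx912m", "-Xms912m", r"-Xloggc:\basis\gclog",
            "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"]
    print(bbj_services_line(args))
    # prints: basis.java.args.BBjServices=-Xmx912m -Xms912m -Xloggc\:\\basis\\gclog -XX\:+PrintGCDetails -XX\:+PrintGCTimeStamps
```

Escaping the backslash first, character by character, avoids double-escaping the backslashes that are added in front of ':' and '='.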

The last "n.nnn secs" figure on each line of the file 'gc.log' shows how long that collection took. Large values in this field indicate long GC pauses. It can be helpful to import the GC log into Excel and sort by the last column to see how long the full GCs are taking.
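
As an alternative to the Excel step, a short script can pull out the last "n.nnn secs" figure on each line and sort by it. This Python sketch assumes the log format produced by -XX:+PrintGCDetails -XX:+PrintGCTimeStamps; if your JVM appends a "[Times: ... real=n.nn secs]" block, the last figure is the wall-clock time of that collection. The function names gc_pauses and longest_pauses are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches every "n.nnn secs" figure on a line; the last one is used.
SECS = re.compile(r"(\d+\.\d+) secs")


def gc_pauses(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (seconds, line) for each log line containing a 'secs' figure."""
    pauses = []
    for line in lines:
        matches = SECS.findall(line)
        if matches:  # heap dumps and other lines without a time are skipped
            pauses.append((float(matches[-1]), line.rstrip("\n")))
    return pauses


def longest_pauses(lines: Iterable[str], top: int = 10) -> List[Tuple[float, str]]:
    """Sort pauses longest first, like sorting the last column in a spreadsheet."""
    return sorted(gc_pauses(lines), key=lambda p: p[0], reverse=True)[:top]
```

To use it, pass the open log file, for example longest_pauses(open("gc.log")), and print each (seconds, line) pair; full GC lines near the top of the list are the ones to investigate.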

The default collector on "server-class" machines is the parallel collector (option -XX:+UseParallelGC). The definition of "server-class" varies between JVM releases and is difficult to pin down. This collector is designed to work efficiently with many processors, but it does not guarantee low pause times. In interactive applications, server pause times of more than 1 or 2 seconds are usually quite noticeable.
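
To see how a collector can be efficient overall and still cause noticeable pauses, compare total GC overhead with the single longest pause. The Python sketch below assumes you have already extracted (timestamp, pause) pairs from gc.log (the PrintGCTimeStamps prefix and the last "secs" figure on each line). The function pause_summary and the sample data are hypothetical; the 1-second default threshold follows the guideline above.

```python
from typing import Dict, List, Tuple


def pause_summary(events: List[Tuple[float, float]], threshold: float = 1.0) -> Dict[str, object]:
    """Summarize GC pauses given as (timestamp_secs, pause_secs), oldest first.

    Reports the longest pause, every pause above `threshold` seconds, and a
    rough GC overhead: total pause time as a percentage of the time between
    the first and last logged collection.
    """
    if not events:
        return {"max_pause": 0.0, "long_pauses": [], "overhead_pct": 0.0}
    pauses = [pause for _, pause in events]
    elapsed = events[-1][0] - events[0][0]
    overhead = 100.0 * sum(pauses) / elapsed if elapsed > 0 else 0.0
    return {
        "max_pause": max(pauses),
        "long_pauses": [(ts, pause) for ts, pause in events if pause > threshold],
        "overhead_pct": overhead,
    }


# Hypothetical sample: mostly short collections plus one 2.5 s full GC.
sample = [(10.0, 0.05), (20.0, 0.04), (60.0, 2.5), (100.0, 0.06)]
summary = pause_summary(sample)
print(summary["max_pause"], summary["long_pauses"], round(summary["overhead_pct"], 1))
# prints: 2.5 [(60.0, 2.5)] 2.9
```

In the sample, GC takes under 3% of elapsed time, yet one pause is long enough for users to notice.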

Another collector, the concurrent low pause collector, or CMS collector (option -XX:+UseConcMarkSweepGC), is designed for this usage pattern. If you decide to switch to this collector, here are some basic guidelines:

1. Increase -Xmx by about 40%. This collector does not compact data in the old generation, so memory fragmentation can become an issue.

2. Ensure the following options are enabled:

-XX:+PrintHeapAtGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=60

This should keep heap fragmentation from becoming an issue, and keep the stop-the-world phases of the CMS collector from being too long.

3. Add the following options:

-XX:MaxNewSize=250m -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=8

These options attempt to keep young generation pause times low and to delay promoting short-lived objects, which helps avoid fragmentation. For each core beyond two, MaxNewSize may be increased by 50m (so for 4 processors, this could be 350m).
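
The sizing rules in steps 1 to 3 can be combined into one option list. This Python sketch is an illustration under stated assumptions: it reads the per-core rule as 250m at 2 cores plus 50m for each additional core (which matches the 4-processor example), rounds the 40% increase up to a whole megabyte, and writes the young-generation flag as -XX:+UseParNewGC, the HotSpot name for the parallel young-generation collector. The function names scale_up and cms_options are hypothetical, and the result covers 32-bit sizes only.

```python
from typing import List


def scale_up(mb: int, percent: int) -> int:
    """Increase a size in MB by `percent`, rounding up to a whole MB."""
    return (mb * (100 + percent) + 99) // 100


def cms_options(parallel_xmx_mb: int, cores: int) -> List[str]:
    """Collect the CMS settings from steps 1-3 (32-bit sizes)."""
    xmx = scale_up(parallel_xmx_mb, 40)          # step 1: about 40% larger -Xmx
    new_size = 250 + 50 * (max(cores, 2) - 2)    # step 3: 250m at 2 cores, +50m per extra core
    return [
        f"-Xmx{xmx}m",
        "-XX:+UseConcMarkSweepGC",
        # step 2
        "-XX:+PrintHeapAtGC",
        "-XX:+CMSParallelRemarkEnabled",
        "-XX:CMSInitiatingOccupancyFraction=60",
        # step 3
        f"-XX:MaxNewSize={new_size}m",
        "-XX:TargetSurvivorRatio=90",
        "-XX:+UseParNewGC",
        "-XX:SurvivorRatio=6",
        "-XX:MaxTenuringThreshold=8",
    ]


if __name__ == "__main__":
    print(" ".join(cms_options(912, 4)))
```

For the 912m heap in the sample line on a 4-processor system, this yields -Xmx1277m and -XX:MaxNewSize=350m; the resulting options would replace the existing -Xmx and collector settings on the BBjServices line.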

Note: For 64-bit systems, add 30% to all memory sizes, since a 64-bit JVM needs more memory for the same data (object references are larger). For example, change:

-XX:MaxNewSize=250m

to:

-XX:MaxNewSize=325m
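
Applying the 30% increase by hand is easy to get wrong when several size flags are involved. The Python sketch below scales every megabyte-valued size flag in a list and leaves ratios and switches alone. It assumes sizes are written with an 'm' or 'M' suffix and only recognizes -Xms, -Xmx, -Xmn, -XX:NewSize and -XX:MaxNewSize; other forms (such as -Xmx1g) pass through unchanged. The name scale_memory_flags is hypothetical.

```python
import re
from typing import List

# Size flags treated as memory sizes here (an assumption; extend as needed).
SIZE_FLAG = re.compile(r"^(-Xm[sxn]|-XX:(?:Max)?NewSize=)(\d+)([mM])$")


def scale_memory_flags(flags: List[str], percent: int = 30) -> List[str]:
    """Raise every MB-valued size flag by `percent`, rounding up; leave others unchanged."""
    scaled = []
    for flag in flags:
        m = SIZE_FLAG.match(flag)
        if m:
            prefix, mb, unit = m.groups()
            new_mb = (int(mb) * (100 + percent) + 99) // 100  # integer ceiling
            scaled.append(f"{prefix}{new_mb}{unit}")
        else:
            scaled.append(flag)
    return scaled


if __name__ == "__main__":
    print(scale_memory_flags(["-XX:MaxNewSize=250m", "-XX:SurvivorRatio=6"]))
    # prints: ['-XX:MaxNewSize=325m', '-XX:SurvivorRatio=6']
```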

The logs generated by the diagnostic options will greatly help with tuning these parameters further if necessary.

Last Modified: 01/31/2008
Product: BBj
Operating System: All platforms