I’ve been doing some testing over the last few months on our new Data Center Edition (DCE) of SugarCRM. One of the key components that’s been giving me trouble was our NFS. I could successfully test with hundreds of users on local disk, but switching to NFS would cause a cascading failure where the system would become almost unresponsive.
The tests on NFS started out fine. I routinely had response times around 170ms for the login page. That’s the average response for the local disk tests on these servers. The problem started as soon as my "users" would start going down multiple paths of code. The CPU load would spike and response times would jump. Both would continue to get worse the longer the test ran.
Took some research and talking with one of the core engineers at Zend before I finally isolated the problem. PHP’s
realpath_cache_size setting was the culprit.
realpath_cache_size is used by PHP to keep from having to look up file names. Every time you perform any of the various file functions or include/require a file and use a relative path, PHP has to look up where that file really exists. PHP caches those values so it doesn’t have to search the current working directory and
include_path for the file you’re working on.
Since the default setting of
realpath_cache_size is only 16K, Sugar was filling that up pretty quickly with 50 instances of Sugar running at the same time on one server. I had never seen any ill-effects from this on the local disk runs, but NFS was the perfect breeding ground for this problem. Simple lstat calls take roughly 10x longer to respond than the standard local disk because of the network overhead. Those slight delays—ten-thousandths of a second—were just enough to cause response times to jump.
The fix was easy enough. Up the
realpath_cache_size setting. Finding the sweet spot required some trial and error though. My current setting is 128k. I ran tests with 32k and 64k. 32k is still too small and 64k was ok but I still saw random spikes which I think were caused by the cache filling up. Since there’s no way to expose the realpath cache data in PHP, there’s no way to know for sure what is happening inside it. If I have some downtime in the next week or two, I may throw together a quick extension for getting that data so I can make a quick monitor script.
In the meantime, setting
realpatch_cache_size to a value between 64k and 128k should work fine for you if you’re serving Sugar off of an NFS. You’ll notice some benefits on a local disk too, but a local disk can keep up with Sugar even without the the optimal realpath caching. Keep in mind that each PHP process allocates memory for the cache, regardless of how much of it is used. For example, if your web server can handle 100 requests to PHP files and all of them run at the same time, PHP will allocate 12.5 megs of memory if your
realpatch_cache_size is set to 128k.
I experimented with the
realpath_cache_ttl setting some too. My tests ran the same regardless of whether it was configured to have a TTL of a few hours or the default 120 seconds.