Saturday, November 16, 2013

Ehcache RMI Replicated Caching in cloud environments

Ehcache's RMI replication provides two ways to discover peers: Automatic Peer Discovery and Manual Peer Discovery.

If the application is deployed in an environment where multicast is not allowed or not available, such as the Amazon AWS or FireHost clouds, we must use Manual Peer Discovery.

Following the Ehcache guide, configure Manual Peer Discovery so that each server lists the caches of the other one:

Configuration for server1
<cacheManagerPeerProviderFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
properties="peerDiscovery=manual,
rmiUrls=//server2:40001/sampleCache11|//server2:40001/sampleCache12"/>

Configuration for server2

<cacheManagerPeerProviderFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
properties="peerDiscovery=manual,
rmiUrls=//server1:40001/sampleCache11|//server1:40001/sampleCache12"/>
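The rmiUrls value is one //host:port/cacheName entry per replicated cache on the remote peer, joined with "|". As a rough sketch of that format (the RmiUrls class and its build method are hypothetical helpers, not part of Ehcache), the string for a peer could be generated like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RmiUrls {
    // Hypothetical helper: builds the rmiUrls value for ONE remote peer.
    // Each cache becomes //host:port/cacheName, entries are joined with '|'.
    public static String build(String peerHost, int peerPort, List<String> cacheNames) {
        return cacheNames.stream()
                .map(name -> "//" + peerHost + ":" + peerPort + "/" + name)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // On server1 we list the caches that live on server2.
        System.out.println(build("server2", 40001,
                Arrays.asList("sampleCache11", "sampleCache12")));
        // prints //server2:40001/sampleCache11|//server2:40001/sampleCache12
    }
}
```

With more than two servers, each server would need the entries for every other peer in the same "|"-separated list.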

Configure the CacheManagerPeerListener on each server (server1 first, then server2):
<cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="hostName=server1, port=40001,
socketTimeoutMillis=2000"/>

<cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="hostName=server2, port=40001,
socketTimeoutMillis=2000"/>

So we need to open the firewall for incoming TCP port 40001 on both peers.

But then neither peer could send messages to the other; each failed with a connection timed out error:

RMIAsynchronousCacheReplicator Unable to send message to remote peer.  Message was: Connection refused to host: server2; nested exception is:
    java.net.ConnectException: Connection timed out
java.rmi.ConnectException: Connection refused to host: server2; nested exception is:
    java.net.ConnectException: Connection timed out
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
    at net.sf.ehcache.distribution.RMICachePeer_Stub.send(Unknown Source)
    at net.sf.ehcache.distribution.RMIAsynchronousCacheReplicator.writeReplicationQueue(RMIAsynchronousCacheReplicator.java:314)
    at net.sf.ehcache.distribution.RMIAsynchronousCacheReplicator.replicationThreadMain(RMIAsynchronousCacheReplicator.java:127)
    at net.sf.ehcache.distribution.RMIAsynchronousCacheReplicator.access$000(RMIAsynchronousCacheReplicator.java:58)
    at net.sf.ehcache.distribution.RMIAsynchronousCacheReplicator$ReplicationThread.run(RMIAsynchronousCacheReplicator.java:389)
Caused by: java.net.ConnectException: Connection timed out

I reviewed the guide and searched around but couldn't find the cause. Then I changed the firewall to allow all ports, and it worked!

So Ehcache was clearly using another port that the firewall was blocking.

Searching further, I found the answer here:
"by default does use portmap and therefore random, non-root ports."
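In plain Java RMI terms there are two listening sockets: the registry, and the exported remote object that actually receives the calls. The sketch below uses only the JDK's RMI classes, not Ehcache's own code; the Ping interface, the RmiTwoPorts class and the port numbers 41001/41002 are made up for illustration. It shows that the object's port is chosen separately from the registry port. Passing 0 as the object port lets RMI pick an anonymous port, which matches the kind of random port the firewall was blocking here.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiTwoPorts {
    // Hypothetical remote interface, used only for this demo.
    public interface Ping extends Remote {
        String ping() throws RemoteException;
    }

    static class PingImpl implements Ping {
        @Override
        public String ping() {
            return "pong";
        }
    }

    // Starts a registry on registryPort, exports the object on remoteObjectPort,
    // then looks it up and calls it the way a remote peer would.
    public static String roundTrip(int registryPort, int remoteObjectPort) throws Exception {
        // Local demo only: pin the address embedded in stubs to loopback.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        Registry registry = LocateRegistry.createRegistry(registryPort); // socket #1
        PingImpl impl = new PingImpl();
        try {
            // Socket #2: the port the remote object itself listens on (0 = anonymous port).
            Ping stub = (Ping) UnicastRemoteObject.exportObject(impl, remoteObjectPort);
            registry.rebind("ping", stub);

            // The lookup goes to the registry port, the call goes to the object port.
            Ping remote = (Ping) LocateRegistry.getRegistry("127.0.0.1", registryPort).lookup("ping");
            return remote.ping();
        } finally {
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(41001, 41002));
    }
}
```

A firewall that only allows the registry port lets the lookup through but blocks the call itself, so both ports have to be known and opened.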

So we need to add remoteObjectPort to the cacheManagerPeerListenerFactory, which pins the port the remote cache objects listen on instead of leaving it to a random free port:
<cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="hostName=server1, port=40001, remoteObjectPort=40002,
socketTimeoutMillis=2000"/>

<cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="hostName=server2, port=40001, remoteObjectPort=40002,
socketTimeoutMillis=2000"/>

Then close all ports on the firewall except TCP 40001 and 40002 between those peers.
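To confirm from one peer that both ports on the other are actually reachable once the firewall is tightened, a small TCP connect test is enough. This is only a sketch: the PeerPortCheck class is a hypothetical helper, and the host name and ports are the ones assumed in the configuration above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PeerPortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host all count as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Run on server1 with "server2" as the argument, and the other way round on server2.
        String peer = args.length > 0 ? args[0] : "server2";
        for (int port : new int[] {40001, 40002}) {
            System.out.println(peer + ":" + port + " reachable=" + isReachable(peer, port, 2000));
        }
    }
}
```

A timeout rather than an immediate refusal usually points at a firewall dropping the packets, which matches the "Connection timed out" in the stack trace above.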

With that, the problem is resolved.


8 comments:

  1. Thanks for your post. Helped me a lot.
    I need to clarify two things.
    1. Can we configure different port numbers for the peer provider and peer listener factories?
    2. I get an exception on the peer listener when it is configured with a non-localhost address as the host name. In that case I am only able to listen to the localhost cache.
    Please help me out :)

    1. 1. Yes, we can use different available ports, as long as the firewall rules allow access to them.
      2. The listener host name must be either localhost or the IP address of the server it is running on.

  2. Hi
    The firewall is open between server1 and server2, but replication still only works one way, from server1 to server2. I do not see any error in the WebLogic server. Could you please help me?

    1. Hi,

      Could you check whether both servers have opened the same ports? If you are running them on Linux, you can use this command:
      netstat -tulpn

      Then verify that those ports are not blocked by the firewall:

      From server1:
      telnet server2 port1
      telnet server2 port2

      From server2:
      telnet server1 port1
      telnet server2 port2

      You might also enable logging for Ehcache in log4j (sorry, I can't recall the details).

      Hope this helps!

    2. Oh, a correction. From server2 it should be:
      telnet server1 port1
      telnet server1 port2

  3. I was facing the same issue in Azure... your blog helped a lot
