In one of our BizTalk implementations we use SFTP to transfer files, through the SFTP adapter from /n software. Recently we had a situation where too many timeouts were written to the event log, and too many messages were retried by BizTalk's out-of-the-box retry mechanism. We started to research where the bottleneck was. Our first guess was that the limitation was on the destination server. We talked to the system administrator, and he told us that there is no limit on simultaneous SSH connections. Then we turned to BizTalk and increased maxConnection in the config file, but nothing changed in terms of the number of timeouts.
We kept digging and turned to the Unix man pages to see whether there was some parameter we could change in the SSH daemon. We paid special attention to one parameter named MaxStartups. The documentation says:
Specifies the maximum number of concurrent unauthenticated connections to the sshd daemon. Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10.
Alternatively, random early drop can be enabled by specifying the three colon separated values "start:rate:full" (e.g., "10:30:60"). sshd will refuse connection attempts with a probability of "rate/100" (30%) if there are currently "start" (10) unauthenticated connections. The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches "full" (60).
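To make the random early drop rule concrete, here is a minimal Python sketch of the refusal probability as the man page describes it. The function name and signature are hypothetical, and the linear interpolation between "start" and "full" is our reading of the text above, not code taken from sshd.

```python
def refusal_probability(unauthenticated, start, rate, full):
    """Probability (0.0 to 1.0) that sshd refuses a new connection.

    Hypothetical helper that follows the man page wording for
    MaxStartups "start:rate:full"; it is not the sshd implementation.
    """
    if unauthenticated < start:
        # Below "start" no connection is refused.
        return 0.0
    if unauthenticated >= full:
        # At "full" every connection attempt is refused.
        return 1.0
    # Grows linearly from rate% at "start" to 100% at "full".
    span = full - start
    percent = rate + (100 - rate) * (unauthenticated - start) / span
    return percent / 100


# Example with the man page values "10:30:60".
for pending in (5, 10, 35, 60):
    print(pending, refusal_probability(pending, 10, 30, 60))
```

With a single value such as the default 10, there is no gradual ramp: the man page says additional connections are simply dropped once that many unauthenticated connections are pending.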
So there is a limit on the number of simultaneous unauthenticated connections. Of course we are using authentication, but there is a delay between the moment a connection is opened and the moment authentication completes. To confirm our suspicion we ran a simple test: we opened 10 connections without authenticating. When we opened the 11th connection we got "Connection Refused".
So the problem is that, under peak load, BizTalk opens many SSH connections at once and starts authenticating each of them. It is not hard to imagine that some connections get refused while others are still in the authentication process. We explained the situation to our system administrator and asked him to increase the MaxStartups parameter. He increased it to 30 and, guess what, the timeouts disappeared (well, in fact we still get one occasionally). Always learning.
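The test with unauthenticated connections can be reproduced with a small script. The following is a minimal Python sketch, not the tool we actually used. The function name, the host name in the usage comment and the timeout are hypothetical. It also assumes that a connection over the limit shows up either as a failed connect or as a connection closed before the server sends its SSH version banner.

```python
import socket


def open_unauthenticated(host, port, count, timeout=5.0):
    """Open up to `count` TCP connections and hold them without authenticating.

    Returns (held_sockets, failed_attempt), where failed_attempt is the
    1-based number of the first attempt that was refused or dropped,
    or None if every attempt succeeded.
    """
    held = []
    for attempt in range(1, count + 1):
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Connect failed outright (for example "Connection refused").
            return held, attempt
        try:
            # An SSH server sends its version banner first; no banner is
            # treated here as a dropped connection (an assumption).
            banner = sock.recv(256)
        except OSError:
            banner = b""
        if not banner:
            sock.close()
            return held, attempt
        # Keep the socket open so it stays in the unauthenticated state.
        held.append(sock)
    return held, None


# Usage (hypothetical host name):
#   held, failed = open_unauthenticated("sftp.example.com", 22, 11)
#   print(len(held), "connections held; first failure at attempt", failed)
```

The held connections only count against MaxStartups until LoginGraceTime expires, so the attempts need to happen within that window.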