<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>

<!--
     ASCII to XML transformation by Invisible Worlds, Inc.
     http://invisible.net/
     Last transformation: 03-Feb-1999, 02:04:05

     Canonical version of this document is at:
     http://info.internet.isi.edu/in-notes/rfc/files/rfc2140.txt

     Implementors should verify all content with the
     canonical version.  Failure to do so may result in
     protocol failures.
-->

<rfc number="2140"
     category="info">
<front>
<title abbrev="TCP Control Block">TCP Control Block Interdependence</title>
<author initials="J." surname="Touch" fullname="Joe Touch">
<organization>University of Southern California/Information Sciences Institute</organization>
<address>
<postal>
<street>4676 Admiralty Way</street>
<street>Marina del Rey</street>
<street>CA 90292-6695</street>
<country>USA</country>
</postal>
<phone>+1 310-822-1511 x151</phone>
<facsimile>+1 310-823-6714</facsimile>
<email>touch@isi.edu</email>
<uri>http://www.isi.edu/~touch</uri>
</address>
</author>
<date month="April" year="1997"/>
<area>Transport</area>
<keyword>TCP</keyword>
<keyword>congestion</keyword>
<keyword>transmission control protocol</keyword>
<abstract>
<t>
   This memo makes the case for interdependent TCP control blocks, where
   part of the TCP state is shared among similar concurrent connections,
   or across similar connection instances. TCP state includes a
   combination of parameters, such as connection state, current
   round-trip time estimates, congestion control information, and process
   information.  This state is currently maintained on a per-connection
   basis in the TCP control block, but should be shared across
   connections to the same host. The goal is to improve transient
   transport performance, while maintaining backward-compatibility with
   existing implementations.
</t>
<t>
   This document is a product of the LSAM project at ISI.
</t>
</abstract>
</front>
<middle>
<section title="An Example of Temporal Sharing">
<t>
   Temporal sharing of cached TCB data has been implemented in the SunOS
   4.1.3 T/TCP extensions <xref target="_XREF_4"/> and the FreeBSD port of the same <xref target="_XREF_7"/>. As
   mentioned before, only the MSS and RTT parameters are cached, as
   originally specified in <xref target="_XREF_2"/>. Later discussion of T/TCP suggested
   including congestion control parameters in this cache <xref target="_XREF_3"/>.
</t>
<t>
   The cache is accessed in two ways: it is read to initialize new TCBs,
   and written when more current per-host state is available. New TCBs
   are initialized as follows; snd_cwnd reuse is not yet implemented,
   although discussed in the T/TCP concepts <xref target="_XREF_2"/>:
</t>
<figure><artwork>
               TEMPORAL SHARING - TCB Initialization

             Cached TCB           New TCB
             ----------------------------------------
             old-MSS              old-MSS

             old-RTT              old-RTT

             old-RTTvar           old-RTTvar

             old-snd_cwnd         old-snd_cwnd    (not yet impl.)
</artwork></figure>
<t>
   Most cached TCB values are updated when a connection closes.  An
   exception is MSS, which is updated whenever the MSS option is
   received in a TCP header.
</t>
<figure><artwork>
                 TEMPORAL SHARING - Cache Updates

    Cached TCB   Current TCB     when?   New Cached TCB
    ---------------------------------------------------------------
    old-MSS      curr-MSS        MSSopt  curr-MSS

    old-RTT      curr-RTT        CLOSE   old += (curr - old) &gt;&gt; 2

    old-RTTvar   curr-RTTvar     CLOSE   old += (curr - old) &gt;&gt; 2

    old-snd_cwnd curr-snd_cwnd   CLOSE   curr-snd_cwnd   (not yet impl.)
</artwork></figure>
<t>
   MSS caching is trivial; reported values are cached, and the most
   recent value is used. The cache is updated when the MSS option is
   received, so the cache always has the most recent MSS value from any
   connection. The cache is consulted only at connection establishment,
   and not otherwise updated, which means that MSS options do not affect
   current connections. The default MSS is never saved; only reported
   MSS values update the cache, so an explicit override is required to
   reduce the MSS.
</t>
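<t>
   The MSS caching rules above can be sketched as follows. This is a
   minimal illustration, not code from any T/TCP implementation: the
   class MssCache, its method names, the string host key, and the
   fallback value of 536 bytes are all assumptions made for the example.
</t>
<figure><artwork><![CDATA[
```python
from typing import Dict, Optional

DEFAULT_MSS = 536  # assumed fallback when no MSS has been reported


class MssCache:
    """Per-host cache of the most recently reported MSS (hypothetical)."""

    def __init__(self) -> None:
        self._by_host: Dict[str, int] = {}

    def on_mss_option(self, host: str, mss: int) -> None:
        # Only reported values reach the cache; the default is never saved.
        self._by_host[host] = mss

    def initial_mss(self, host: str) -> int:
        # Consulted only when a new TCB is initialized.
        return self._by_host.get(host, DEFAULT_MSS)

    def override(self, host: str, mss: Optional[int]) -> None:
        # Explicit override: replace the cached value, or drop it with None.
        if mss is None:
            self._by_host.pop(host, None)
        else:
            self._by_host[host] = mss


cache = MssCache()
cache.on_mss_option("peer", 1460)
cache.on_mss_option("peer", 512)   # most recent report from any connection
```
]]></artwork></figure>
<t>
   In this sketch a later, smaller report simply replaces the cached
   value, while the absence of a report leaves the cache untouched,
   which is why an explicit override is needed to fall back to the
   default.
</t>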
<t>
   RTT values are updated by a more complicated mechanism <xref target="_XREF_3"/>, <xref target="_XREF_8"/>.
   Dynamic RTT estimation requires a sequence of RTT measurements, but
   a single T/TCP transaction may not accumulate enough samples.
   As a result, the cached RTT (and its variance) is an average of its
   previous value with the contents of the currently active TCB for that
   host, computed only when a connection is closed. Further, the method
   for averaging the RTT values is not the same as the method for
   computing the RTT values within a connection, so the cached value
   may not be appropriate.
   For temporal sharing, the cache requires updating only when a
   connection closes, because the cached values will not be used to
   initialize a new TCB before then. For ensemble sharing, this is not
   the case, as discussed below.
</t>
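<t>
   The close-time rule from the cache update table,
   old += (curr - old) &gt;&gt; 2, can be sketched as below. The names
   RttState, blend, and update_cache_on_close are hypothetical, and the
   use of integer clock ticks as the unit is an assumption.
</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class RttState:
    rtt: int     # smoothed RTT, in integer clock ticks (assumed unit)
    rttvar: int  # RTT variance estimate, same unit


def blend(old: int, curr: int) -> int:
    """Move the cached value a quarter of the way toward curr."""
    # Python's >> on a negative int rounds toward minus infinity.
    return old + ((curr - old) >> 2)


def update_cache_on_close(cached: RttState, closing: RttState) -> RttState:
    """Return the new cached entry after a connection to the host closes."""
    return RttState(
        rtt=blend(cached.rtt, closing.rtt),
        rttvar=blend(cached.rttvar, closing.rttvar),
    )
```
]]></artwork></figure>
<t>
   For example, a cached RTT of 100 ticks blended with a closing
   connection's RTT of 140 ticks gives 110 ticks: the cache keeps three
   quarters of its history and takes one quarter from the connection
   that just closed.
</t>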
<t>
   Other TCB variables may also be cached between sequential instances,
   such as the congestion control window information. Old cache values
   can be overwritten with the current TCB estimates, or a MAX or MIN
   function can be used to merge the results, depending on the optimism
   or pessimism of the reused values. For example, the congestion window
   can be reused if there are no concurrent connections.
</t>
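<t>
   The three merge choices named above (overwrite, MAX, MIN) can be
   sketched as a small policy table. The policy names and the function
   merge_cached are assumptions for illustration; the sketch also
   assumes a window-like quantity, for which the larger value is the
   more optimistic one.
</t>
<figure><artwork><![CDATA[
```python
from typing import Callable, Dict

MergePolicy = Callable[[int, int], int]

# Hypothetical policy names; each maps (old cached, current TCB) -> new cached.
POLICIES: Dict[str, MergePolicy] = {
    "overwrite": lambda old, curr: curr,  # trust the latest estimate
    "optimistic": max,                    # keep the larger window
    "pessimistic": min,                   # keep the smaller window
}


def merge_cached(old: int, curr: int, policy: str) -> int:
    """Merge a closing TCB's value into the cache under the given policy."""
    return POLICIES[policy](old, curr)
```
]]></artwork></figure>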
</section>
<section title="An Example of Ensemble Sharing">
<t>
   Sharing cached TCB data across concurrent connections requires
   attention to the aggregate nature of some of the shared state.
   Although MSS and RTT values can be shared by copying, it may not be
   appropriate to copy congestion window information. At this point, we
   present only the MSS and RTT rules:
</t>
<figure><artwork>
               ENSEMBLE SHARING - TCB Initialization

               Cached TCB           New TCB
               ----------------------------------
               old-MSS              old-MSS

               old-RTT              old-RTT

               old-RTTvar           old-RTTvar

                    ENSEMBLE SHARING - Cache Updates

      Cached TCB   Current TCB     when?   New Cached TCB
      -----------------------------------------------------------
      old-MSS      curr-MSS        MSSopt  curr-MSS

      old-RTT      curr-RTT        update  rtt_update(old,curr)

      old-RTTvar   curr-RTTvar     update  rtt_update(old,curr)
</artwork></figure>
<t>
   For ensemble sharing, TCB information should be cached as early as
   possible, sometimes before a connection is closed. Otherwise, opening
   multiple concurrent connections may not result in TCB data sharing if
   no connection closes before others open. An optimistic solution would
   be to update cached data as early as possible, rather than only when
   a connection is closing. Some T/TCP implementations do this for MSS
   when the TCP MSS header option is received <xref target="_XREF_4"/>, although it is not
   addressed specifically in the concepts or functional specification
   <xref target="_XREF_2"/><xref target="_XREF_3"/>.
</t>
<t>
   In current T/TCP, RTT values are updated only after a CLOSE, which
   does not benefit concurrent sessions. As mentioned in the temporal
   case, averaging values between concurrent connections requires
   incorporating new RTT measurements. The amount of work involved in
   updating the aggregate average should be minimized, but the resulting
   value should be equivalent to having all values measured within a
   single connection. The function &quot;rtt_update&quot; in the ensemble sharing
   table indicates this operation, which occurs whenever the RTT would
   have been updated in the individual TCP connection. As a result, the
   cache contains the shared RTT variables, which no longer need to
   reside in the TCB <xref target="_XREF_8"/>.
</t>
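<t>
   This memo does not define rtt_update. One way to make the shared
   value equivalent to a single connection's estimate is to feed every
   measurement, from any connection to the host, into one per-host
   estimator. The sketch below does that using the widely used
   smoothing gains of 1/8 for the RTT and 1/4 for its variance; those
   gains, the first-sample initialization, the class SharedRtt, and
   the helper rtt_for are assumptions, and the function takes a raw
   sample rather than the (old, curr) pair shown in the table.
</t>
<figure><artwork><![CDATA[
```python
from typing import Dict, Optional, Tuple


class SharedRtt:
    """RTT estimator shared by all connections to one host (hypothetical)."""

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def rtt_update(self, sample: float) -> Tuple[float, float]:
        if self.srtt is None or self.rttvar is None:
            # First measurement seeds both variables.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Variance is updated first, using the previous srtt.
            self.rttvar += (abs(self.srtt - sample) - self.rttvar) / 4
            self.srtt += (sample - self.srtt) / 8
        return self.srtt, self.rttvar


per_host: Dict[str, SharedRtt] = {}


def rtt_for(host: str) -> SharedRtt:
    """Return the estimator shared by every TCB for this host."""
    return per_host.setdefault(host, SharedRtt())
```
]]></artwork></figure>
<t>
   Because two connections to the same host receive the same object
   from rtt_for, a sample taken on one is visible to the other
   immediately, without waiting for a CLOSE.
</t>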
<t>
   Congestion window size aggregation is more complicated in the
   concurrent case.  When there is an ensemble of connections, we need
   to decide how that ensemble would have shared the congestion window,
   in order to derive initial values for new TCBs. Because concurrent
   connections between two hosts share network paths (usually), they
   also share whatever capacity exists along that path.  With regard to
   congestion, the set of connections might behave as if it were
   multiplexed prior to TCP, as if all data were part of a single
   connection. As a result, the current window sizes would maintain a
   constant sum, presuming sufficient offered load. This would go beyond
   caching to truly sharing state, as in the RTT case.
</t>
<t>
   We pause to note that any assumption of this sharing can be
   incorrect, including this one. In current implementations, new
   congestion windows are set at an initial value of one segment, so
   that the sum of the current windows is increased for any new
   connection. This can have detrimental consequences where several
   connections share a highly congested link, such as in trans-Atlantic
   Web access.
</t>
<t>
   There are several ways to initialize the congestion window in a new
   TCB among an ensemble of current connections to a host, as shown
   below. Current TCP implementations initialize it to one segment <xref target="_XREF_9"/>,
   and T/TCP hinted that it should be initialized to the old window size
   <xref target="_XREF_3"/>. In the former, the assumption is that new connections should
   behave as conservatively as possible. In the latter, no accommodation
   is made to concurrent aggregate behavior.
</t>
<t>
   In either case, the sum of window sizes can increase, rather than
   remain constant. Another solution is to give each pending connection
   its &quot;fair share&quot; of the available congestion window, and let the
   connections balance from there. The assumption we make here is that
   new connections are implicit requests for an equal share of available
   link bandwidth which should be granted at the expense of current
   connections. This may or may not be the appropriate function; we
   propose that it be examined further.
</t>
<figure><artwork>
                ENSEMBLE SHARING - TCB Initialization
                Some Options for Sharing Window-size

    Cached TCB                           New TCB
    -----------------------------------------------------------------
    old-snd_cwnd         (current)       one segment

                         (T/TCP hint)    old-snd_cwnd

                         (proposed)      old-snd_cwnd/(N+1)
                                         subtract old-snd_cwnd/(N+1)/N
                                         from each concurrent

                 ENSEMBLE SHARING - Cache Updates

    Cached TCB   Current TCB     when?   New Cached TCB
    ----------------------------------------------------------------
    old-snd_cwnd curr-snd_cwnd   update  (adjust sum as appropriate)
</artwork></figure>
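<t>
   The proposed "fair share" row can be sketched as follows, under one
   reading of the table: old-snd_cwnd is taken to be the aggregate
   window of the N concurrent connections, the new connection receives
   1/(N+1) of it, and each existing connection gives up an equal part
   of that share. The function name admit_connection and the use of
   fractional segments are assumptions.
</t>
<figure><artwork><![CDATA[
```python
from typing import List, Tuple


def admit_connection(
    windows: List[float], cached_cwnd: float
) -> Tuple[float, List[float]]:
    """Return (window for the new TCB, adjusted windows of existing TCBs)."""
    n = len(windows)
    if n == 0:
        # No concurrent connections: reuse the cached window as-is.
        return cached_cwnd, []
    total = sum(windows)           # aggregate window of the ensemble
    share = total / (n + 1)        # old-snd_cwnd / (N+1)
    # Subtract old-snd_cwnd/(N+1)/N from each concurrent connection.
    # Not guarded: a very small window could go negative here.
    adjusted = [w - share / n for w in windows]
    return share, adjusted
```
]]></artwork></figure>
<t>
   With two connections holding 4 and 8 segments, the new connection
   starts at 4 segments and the existing ones drop to 2 and 6, so the
   sum stays at 12. The sketch follows the table literally and does not
   guard against an existing window that is smaller than its deduction;
   a real implementation would need a policy for that case.
</t>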
</section>
<section title="Compatibility Issues">
<t>
   Current TCP implementations do not use TCB caching, with the
   exception of T/TCP variants <xref target="_XREF_4"/><xref target="_XREF_7"/>. New connections use the default
   initial values of all non-instantiated TCB variables. As a result,
   each connection calculates its own RTT measurements, MSS value, and
   congestion information. Eventually these values are updated for each
   connection.
</t>
<t>
   For the congestion and current window information, the initial values
   may not be consistent with the long-term aggregate behavior of a set
   of concurrent connections. If a single connection has a window of 4
   segments, new connections assume initial windows of 1 segment (the
   minimum), although the current connection&apos;s window doesn&apos;t decrease
   to accommodate this additional load. As a result, connections can
   mutually interfere. One example of this has been seen on
   trans-Atlantic links, where concurrent connections supporting Web traffic
   can collide because their initial windows are too large, even when
   set at one segment.
</t>
<t>
   Because this proposal attempts to anticipate the aggregate
   steady-state values of TCB state among a group or over time, it should avoid
   the transient effects of new connections. In addition, because it
   considers the ensemble and temporal properties of those aggregates,
   it should also prevent the transients of short-lived or multiple
   concurrent connections from adversely affecting the overall network
   performance. We are performing analysis and experiments to validate
   these assumptions.
</t>
</section>
<section title="Performance Considerations">
<t>
   Here we attempt to optimize transient behavior of TCP without
   modifying its long-term properties. The predominant expense is in
   maintaining the cached values, or in using per-host state rather than
   per-connection state. In cases where performance is affected,
   however, we note that the per-host information can be kept in
   per-connection copies (as done now), because with higher performance
   should come less interference between concurrent connections.
</t>
<t>
   Sharing TCB state can occur only at connection establishment and
   close (to update the cache), to minimize overhead, optimize transient
   behavior, and minimize the effect on the steady-state. It is possible
   that sharing state during a connection, as in the RTT or window-size
   variables, may be of benefit, provided its implementation cost is not
   high.
</t>
</section>
<section title="Implications">
<t>
   There are several implications to incorporating TCB interdependence
   in TCP implementations. First, it may prevent the need for
   application-layer multiplexing for performance enhancement <xref target="_XREF_6"/>.
   Protocols like persistent-HTTP avoid connection reestablishment costs
   by serializing or multiplexing a set of per-host connections across a
   single TCP connection. This avoids TCP&apos;s per-connection OPEN
   handshake, and also avoids recomputing MSS, RTT, and congestion
   windows. By avoiding the so-called &quot;slow-start restart,&quot; performance
   can be optimized. Our proposal provides the MSS, RTT, and OPEN
   handshake avoidance of T/TCP, and the &quot;slow-start restart avoidance&quot;
   of multiplexing, without requiring a multiplexing mechanism at the
   application layer. This multiplexing will be complicated when
   quality-of-service mechanisms (e.g., &quot;integrated services
   scheduling&quot;) are provided later.
</t>
<t>
   Second, we are attempting to push some of the TCP implementation from
   the traditional transport layer (in the ISO model [10]), to the
   network layer. This acknowledges that some state currently maintained
   as per-connection is in fact per-path, which we simplify as
   per-host-pair. Transport protocols typically manage per-application-pair
   associations (per stream), and network protocols manage per-path
   associations (routing). Round-trip time, MSS, and congestion
   information are more appropriately handled in a network-layer fashion,
   aggregated among concurrent connections, and shared across connection
   instances.
</t>
<t>
   An earlier version of RTT sharing suggested implementing RTT state at
   the IP layer, rather than at the TCP layer <xref target="_XREF_8"/>. Our observations are
   for sharing state among TCP connections, which avoids some of the
   difficulties in an IP-layer solution. One such problem is determining
   the associated prior outgoing packet for an incoming packet, to infer
   RTT from the exchange. Because RTTs are still determined inside the
   TCP layer, this is simpler than at the IP layer. This is a case where
   information should be computed at the transport layer, but shared at
   the network layer.
</t>
<t>
   We also note that per-host-pair associations are not the limit of
   these techniques. It is possible that TCBs could be similarly shared
   between hosts on a LAN, because the predominant path can be LAN-LAN,
   rather than host-host.
</t>
<t>
   There may be other information that can be shared between concurrent
   connections. For example, knowing that another connection has just
   tried to expand its window size and failed, a connection may not
   attempt to do the same for some period. The idea is that existing TCP
   implementations infer the behavior of all competing connections,
   including those within the same host or LAN. One possible
   optimization is to make that implicit feedback explicit, via extended
   information in the per-host TCP area.
</t>
</section>
<section title="Security Considerations">
<t>
   These suggested implementation enhancements do not have additional
   ramifications for direct attacks. These enhancements may be
   susceptible to denial-of-service attacks if not otherwise secured.
   For example, an application can open a connection and set its window
   size to 0, denying service to any other subsequent connection between
   those hosts.
</t>
<t>
   TCB sharing may be susceptible to denial-of-service attacks, wherever
   the TCB is shared, between connections in a single host, or between
   hosts if TCB sharing is implemented on the LAN (see Implications
   section).  Some shared TCB parameters are used only to create new
   TCBs, others are shared among the TCBs of ongoing connections. New
   connections can join the ongoing set, e.g., to optimize send window
   size among a set of connections to the same host.
</t>
<t>
   Attacks on parameters used only for initialization affect only the
   transient performance of a TCP connection.  For short connections,
   the performance ramification can approach that of a denial-of-service
   attack.  E.g., if an application changes its TCB to have a false and
   small window size, subsequent connections would experience
   performance degradation until their window grew appropriately.
</t>
<t>
   The solution is to limit the effect of compromised TCB values.  TCBs
   are compromised when they are modified directly by an application or
   transmitted between hosts via unauthenticated means (e.g., by using a
   dirty flag). TCBs that are not compromised by application
   modification do not have any unique security ramifications. Note that
   the proposed parameters for TCB sharing are not currently modifiable
   by an application.
</t>
<t>
   All shared TCBs MUST be validated against default minimum parameters
   before being used for new connections. This validation would not impact
   performance, because it occurs only at TCB initialization.  This
   limits the effect of attacks on new connections, to reducing the
   benefit of TCB sharing, resulting in the current default TCP
   performance. For ongoing connections, the effect of incoming packets
   on shared information should be both limited and validated against
   constraints before use. This is a beneficial precaution for existing
   TCP implementations as well.
</t>
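<t>
   A minimal sketch of the validation step follows. The memo does not
   list the minimum parameters, so the floor values here (an MSS of 536
   bytes and a window of one segment), the type SharedParams, and the
   function validate are assumptions chosen for illustration.
</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedParams:
    mss: int       # bytes
    snd_cwnd: int  # segments


# Assumed floor values; an implementation would substitute its own.
MINIMUMS = SharedParams(mss=536, snd_cwnd=1)


def validate(
    cached: SharedParams, minimums: SharedParams = MINIMUMS
) -> SharedParams:
    """Return parameters that are safe to copy into a new TCB."""
    # Anything below the floor is replaced by the floor, so a tampered
    # cache can only cost the benefit of sharing, not default service.
    return SharedParams(
        mss=max(cached.mss, minimums.mss),
        snd_cwnd=max(cached.snd_cwnd, minimums.snd_cwnd),
    )
```
]]></artwork></figure>
<t>
   The check runs once per new TCB, so a cached window of zero segments
   left behind by a misbehaving application would be raised back to one
   segment before any new connection used it.
</t>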
<t>
   TCBs modified by an application SHOULD NOT be shared, unless the new
   connection sharing the compromised information has been given
   explicit permission to use such information by the connection API. No
   mechanism for that indication currently exists, but it could be
   supported by an augmented API. This sharing restriction SHOULD be
   implemented in both the host and the LAN. Sharing on a LAN SHOULD
   utilize authentication to prevent undetected tampering with shared TCB
   parameters. These restrictions limit the security impact of modified
   TCBs both for connection initialization and for ongoing connections.
</t>
<t>
   Finally, shared values MUST be limited to performance factors only.
   Other information, such as TCP sequence numbers, when shared, is
   already known to compromise security.
</t>
</section>
<section title="Acknowledgements">
<t>
   The author would like to thank the members of the High-Performance
   Computing and Communications Division at ISI, notably Bill Manning,
   Bob Braden, Jon Postel, Ted Faber, and Cliff Neuman for their
   assistance in the development of this memo.
</t>
</section>
</middle>
<back>
<!-- BEGIN INCLUDE REFERENCES ** DO NOT REMOVE -->
<references>
<reference anchor="_XREF_1">
<front>
<title abbrev="Berners-Lee">Berners-Lee, T., et al., &quot;The World-Wide Web,&quot; Communications of the ACM, V37, pp. 76-82</title>
<author>
<organization/>
</author>
<date month="August" year="1994"/>
</front>
</reference>
<reference anchor="_XREF_2">
<front>
<title abbrev="Transaction TCP -- Concepts">Transaction TCP -- Concepts, RFC-1379, USC/Information Sciences Institute</title>
<author initials="R." surname="Braden" fullname="R. Braden">
<organization/>
</author>
<date month="September" year="1992"/>
</front>
</reference>
<reference anchor="_XREF_3">
<front>
<title abbrev="T/TCP -- TCP Extensions for Transactions">T/TCP -- TCP Extensions for Transactions Functional Specification, RFC-1644, USC/Information Sciences Institute</title>
<author initials="R." surname="Braden" fullname="R. Braden">
<organization/>
</author>
<date month="July" year="1994"/>
</front>
</reference>
<reference anchor="_XREF_4">
<front>
<title abbrev="T/TCP -- Transaction TCP: Source Changes">T/TCP -- Transaction TCP: Source Changes for Sun OS 4.1.3, Release 1.0, USC/ISI</title>
<author initials="B." surname="Braden" fullname="B. Braden">
<organization/>
</author>
<date month="September" year="1994"/>
</front>
</reference>
<reference anchor="_XREF_5">
<front>
<title abbrev="Internetworking with TCP/IP">Internetworking with TCP/IP, V2, Prentice-Hall, NJ</title>
<author initials="D." surname="Comer" fullname="D. Comer">
<organization/>
</author>
<author initials="D." surname="Stevens" fullname="D. Stevens">
<organization/>
</author>
<date month="" year="1991"/>
</front>
</reference>
<reference anchor="_XREF_6">
<front>
<title abbrev="Hypertext Transfer Protocol -- HTTP/1.1">Hypertext Transfer Protocol -- HTTP/1.1, Work in Progress</title>
<author initials="R." surname="Fielding" fullname="R. Fielding">
<organization/>
</author>
<date month="" year=""/>
</front>
</reference>
<reference anchor="_XREF_7" target="http://www.freebsd.org">
<front>
<title abbrev="FreeBSD source code">FreeBSD source code, Release 2.10</title>
<author>
<organization/>
</author>
<date month="" year=""/>
</front>
</reference>
<reference anchor="_XREF_8">
<front>
<title abbrev="mail to public list tcp-ip">mail to public list &quot;tcp-ip&quot;, no archive found</title>
<author initials="V." surname="Jacobson" fullname="V. Jacobson">
<organization/>
</author>
<date month="" year="1986"/>
</front>
</reference>
<reference anchor="_XREF_9">
<front>
<title abbrev="Transmission Control Protocol">Transmission Control Protocol, Network Working Group RFC-793/STD-7, ISI</title>
<author>
<organization/>
</author>
<date month="September" year="1981"/>
</front>
</reference>
</references>
<!-- END INCLUDE REFERENCES ** DO NOT REMOVE -->
</back>
</rfc>
