Connection Tracking

This document is based on Linux 2.4.
Net filtering in Linux 2.6 is written more efficiently,
but the concept of filtering is the same.

First, let us look at how each packet passes through the local machine.
The following diagram depicts the routing.
Packets come in from the left.

-> [pre_routing] -> [routing] -> [forward] -> [post_routing] ->
                        |                  ^
                        |                  |
                        |               [routing]
                        v                  |
                    [local_in]          [local_out]

(This side is the local machine)

In the Linux 2.4 kernel, the connection tracking filter is registered at
four of these points with the nf_register_hook() function.

The arguments passed to the register function are defined in
ip_conntrack_standalone.c

as follows:

static struct nf_hook_ops ip_conntrack_in_ops
= { { NULL, NULL }, ip_conntrack_in, PF_INET, NF_IP_PRE_ROUTING,
        NF_IP_PRI_CONNTRACK };
static struct nf_hook_ops ip_conntrack_local_out_ops
= { { NULL, NULL }, ip_conntrack_local, PF_INET, NF_IP_LOCAL_OUT,
        NF_IP_PRI_CONNTRACK };
static struct nf_hook_ops ip_conntrack_out_ops
= { { NULL, NULL }, ip_refrag, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_LAST };
static struct nf_hook_ops ip_conntrack_local_in_ops
= { { NULL, NULL }, ip_confirm, PF_INET, NF_IP_LOCAL_IN, NF_IP_PRI_LAST-1 };

In this document, the C source files are in

/usr/src/linux-2.4/net/ipv4/netfilter/

The functions invoked at each point are

ip_conntrack_in() for NF_IP_PRE_ROUTING
ip_conntrack_local() for NF_IP_LOCAL_OUT
ip_confirm() for NF_IP_LOCAL_IN
ip_refrag() for NF_IP_POST_ROUTING

Among those functions, ip_conntrack_local(), which is defined in
ip_conntrack_standalone.c, checks the packet length and then calls
ip_conntrack_in().

This means that filtering at both NF_IP_PRE_ROUTING and NF_IP_LOCAL_OUT is
done by ip_conntrack_in(), which is defined in ip_conntrack_core.c.

At the NF_IP_POST_ROUTING point, ip_refrag() is invoked; this function
calls ip_confirm() and then deals with fragmentation.
ip_confirm() is defined in ip_conntrack_standalone.c, but the real work
is done by __ip_conntrack_confirm() in ip_conntrack_core.c.

So the core functions are ip_conntrack_in() and ip_confirm().
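The idea of registering a function at a hook point with a priority can be
pictured with a small user-space model. This is only a sketch and not kernel
code: all toy_ and TOY_ names, the table sizes and the priority values are
made up for illustration. It assumes that each hook point keeps its functions
sorted by priority and calls them in order until one of them does not accept
the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy user-space model of hook registration. NOT the kernel API:
 * every toy_/TOY_ name and every priority value here is hypothetical. */
enum toy_hook { TOY_PRE_ROUTING, TOY_LOCAL_IN, TOY_FORWARD,
                TOY_LOCAL_OUT, TOY_POST_ROUTING, TOY_NUMHOOKS };

#define TOY_DROP         0
#define TOY_ACCEPT       1
#define TOY_MAX_PER_HOOK 8

typedef int (*toy_hookfn)(void *pkt);

struct toy_hook_ops {
        toy_hookfn fn;
        enum toy_hook hooknum;
        int priority;                   /* lower value runs first */
};

static struct toy_hook_ops toy_table[TOY_NUMHOOKS][TOY_MAX_PER_HOOK];
static size_t toy_count[TOY_NUMHOOKS];

/* Insert ops into the list of its hook point, keeping it sorted by priority. */
static int toy_register_hook(const struct toy_hook_ops *ops)
{
        struct toy_hook_ops *list;
        size_t i;

        assert(ops != NULL && ops->fn != NULL && ops->hooknum < TOY_NUMHOOKS);
        if (toy_count[ops->hooknum] == TOY_MAX_PER_HOOK)
                return -1;
        list = toy_table[ops->hooknum];
        for (i = toy_count[ops->hooknum]; i > 0; i--) {
                if (list[i - 1].priority <= ops->priority)
                        break;
                list[i] = list[i - 1];  /* make room for the new entry */
        }
        list[i] = *ops;
        toy_count[ops->hooknum]++;
        return 0;
}

/* Call each function registered at hooknum until one does not accept. */
static int toy_run_hooks(enum toy_hook hooknum, void *pkt)
{
        size_t i;

        for (i = 0; i < toy_count[hooknum]; i++) {
                int verdict = toy_table[hooknum][i].fn(pkt);

                if (verdict != TOY_ACCEPT)
                        return verdict;
        }
        return TOY_ACCEPT;
}

/* Sample hook functions: each appends one letter to a trace string. */
static int toy_mark_t(void *pkt) { strcat((char *)pkt, "T"); return TOY_ACCEPT; }
static int toy_mark_c(void *pkt) { strcat((char *)pkt, "C"); return TOY_ACCEPT; }
```

In this model, a function registered with a lower priority value runs before
one with a higher value, whatever the order of registration.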

A connection is tracked with struct ip_conntrack_tuple_hash (x2, named
tuplehash) and struct nf_ct_info (an array named infos, indexed by the
connection state) in struct ip_conntrack,
which is defined in linux/netfilter_ipv4/ip_conntrack.h.

Here is the declaration of ip_conntrack_tuple_hash.

struct ip_conntrack_tuple_hash
{
        struct list_head list;

        struct ip_conntrack_tuple tuple;

        /* this == &ctrack->tuplehash[DIRECTION(this)]. */
        struct ip_conntrack *ctrack;
};

The index into the ip_conntrack_tuple_hash array in ip_conntrack (named
tuplehash) is defined in ip_conntrack_tuple.h according to the direction
of a packet, as follows:

IP_CT_DIR_ORIGINAL = 0
IP_CT_DIR_REPLY = 1

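The pointer named ctrack in ip_conntrack_tuple_hash points back to the
ip_conntrack that contains the entry, so either element of the array
leads to the same connection.

When an ip_conntrack structure is created (in init_conntrack()), ctrack of
both elements is set to the new structure, and a pointer to
tuplehash[IP_CT_DIR_ORIGINAL] is returned. (tuplehash is the name of this
array in struct ip_conntrack.)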

To identify a connection uniquely, a tuple is used.
The tuple looks complex, but in short it represents the pair of
source IP and source port / destination IP and destination port.

ip_conntrack_in()

ip_conntrack_in() is called at NF_IP_PRE_ROUTING or NF_IP_LOCAL_OUT.

When ip_conntrack_in() is called at NF_IP_PRE_ROUTING, the packet is an
"outgoing" one from the sender's point of view.
It does not matter whether the packet is going to the local machine
(local in) or to another machine (forward).

When it is called at NF_IP_LOCAL_OUT, the packet is obviously "outgoing".

So the tuple stored in tuplehash[IP_CT_DIR_ORIGINAL] is for the "outgoing"
packet. At the same time as the tuple is made for IP_CT_DIR_ORIGINAL,
the inverted tuple is made for IP_CT_DIR_REPLY.
They are stored in tuplehash[0].tuple and tuplehash[1].tuple respectively.

(The direction of a packet and the direction of a tuple are different things.
A tuple is made from the address pair of the "outgoing" packet for
IP_CT_DIR_ORIGINAL, and from the inverted address pair for IP_CT_DIR_REPLY.
But once a tuple is made, the system looks at the reply direction of the
tuple only for NAT filtering. This is a NAT filtering issue.)
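The relation between the original tuple and the inverted tuple can be sketched
as follows. This is a simplified illustration, not the kernel definition:
struct toy_tuple, toy_invert_tuple() and toy_tuple_equal() are hypothetical
names, and the field layout is an assumption (the real struct ip_conntrack_tuple
is more involved).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct ip_conntrack_tuple (hypothetical layout). */
struct toy_tuple {
        uint32_t src_ip;
        uint16_t src_port;
        uint32_t dst_ip;
        uint16_t dst_port;
        uint8_t  protonum;
};

/* Build the reply-direction tuple by swapping source and destination. */
static struct toy_tuple toy_invert_tuple(const struct toy_tuple *orig)
{
        struct toy_tuple reply;

        assert(orig != NULL);
        reply.src_ip   = orig->dst_ip;
        reply.src_port = orig->dst_port;
        reply.dst_ip   = orig->src_ip;
        reply.dst_port = orig->src_port;
        reply.protonum = orig->protonum;        /* protocol is unchanged */
        return reply;
}

/* Two tuples match only when every field matches. */
static int toy_tuple_equal(const struct toy_tuple *a, const struct toy_tuple *b)
{
        return a->src_ip == b->src_ip && a->src_port == b->src_port &&
               a->dst_ip == b->dst_ip && a->dst_port == b->dst_port &&
               a->protonum == b->protonum;
}
```

Inverting twice gives back the original tuple, which is why one connection can
be found from a packet travelling in either direction.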

Each time ip_conntrack_in() is invoked, resolve_normal_ct() is called,
and it makes a tuple from the given packet (struct sk_buff).

Using this tuple, it tries to get a pointer to a struct
ip_conntrack_tuple_hash with ip_conntrack_find_get().

If the packet is an "outgoing" one, the hash value is calculated
from the "outgoing" addresses,
so a pointer to tuplehash[IP_CT_DIR_ORIGINAL] is returned.

On the other hand, if the packet is an "incoming" one,
the hash value is calculated from the "incoming" addresses, and
a pointer to tuplehash[IP_CT_DIR_REPLY] is returned.

If ip_conntrack_find_get() returns NULL, there is no ip_conntrack structure
associated with this connection.

In that case, resolve_normal_ct() calls init_conntrack() to create and
initialize a new ip_conntrack structure for this connection.

The lookup itself is done by the following functions.

static struct ip_conntrack_tuple_hash *
__ip_conntrack_find(const struct ip_conntrack_tuple *tuple,
                    const struct ip_conntrack *ignored_conntrack)
{
        struct ip_conntrack_tuple_hash *h;

        MUST_BE_READ_LOCKED(&ip_conntrack_lock);
        h = LIST_FIND(&ip_conntrack_hash[hash_conntrack(tuple)],
                      conntrack_tuple_cmp,
                      struct ip_conntrack_tuple_hash *,
                      tuple, ignored_conntrack);
        return h;
}

struct ip_conntrack_tuple_hash *
ip_conntrack_find_get(const struct ip_conntrack_tuple *tuple,
                      const struct ip_conntrack *ignored_conntrack)
{
        struct ip_conntrack_tuple_hash *h;

        READ_LOCK(&ip_conntrack_lock);
        h = __ip_conntrack_find(tuple, ignored_conntrack);
        if (h)
                atomic_inc(&h->ctrack->ct_general.use);
        READ_UNLOCK(&ip_conntrack_lock);

        return h;
}

resolve_normal_ct() handles connection tracking according to the state of
the connection.
If there is no tracking entry for the connection, it creates a new entry
from the packet data by calling init_conntrack() (the entry is put on the
hash lists later, by ip_confirm()).

When a tracking entry for this connection already exists, it is taken
from the tracking list using ip_conntrack_find_get().

Then the connection status is changed according to the direction in which
the packet is going.
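The find-or-create flow described above can be sketched in user space as
follows. This is a toy model under stated assumptions, not the kernel
implementation: all toy_ and TOY_ names, the hash function and the bucket count
are hypothetical. As a simplification, the toy links a new connection into the
table immediately, whereas the text later explains that the kernel defers this
to __ip_conntrack_confirm().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy user-space model; every toy_/TOY_ name is hypothetical. */
#define TOY_BUCKETS 16

struct toy_tuple {
        unsigned src, dst;              /* addresses */
        unsigned short sport, dport;    /* ports */
};

struct toy_conn;

/* One entry per direction; ctrack points back at the owning connection. */
struct toy_tuple_hash {
        struct toy_tuple_hash *next;
        struct toy_tuple tuple;
        struct toy_conn *ctrack;
};

struct toy_conn {
        struct toy_tuple_hash tuplehash[2];     /* 0 = original, 1 = reply */
};

static struct toy_tuple_hash *toy_hash[TOY_BUCKETS];

static unsigned toy_hash_tuple(const struct toy_tuple *t)
{
        return (t->src * 31u + t->dst * 17u + t->sport * 7u + t->dport)
               % TOY_BUCKETS;
}

static int toy_same(const struct toy_tuple *a, const struct toy_tuple *b)
{
        return a->src == b->src && a->dst == b->dst &&
               a->sport == b->sport && a->dport == b->dport;
}

/* Look the tuple up; NULL means no connection is known for it yet. */
static struct toy_tuple_hash *toy_find(const struct toy_tuple *t)
{
        struct toy_tuple_hash *h;

        for (h = toy_hash[toy_hash_tuple(t)]; h != NULL; h = h->next)
                if (toy_same(&h->tuple, t))
                        return h;
        return NULL;
}

/* Prepend one entry to the bucket chosen by its own tuple. */
static void toy_link(struct toy_tuple_hash *h)
{
        unsigned b = toy_hash_tuple(&h->tuple);

        h->next = toy_hash[b];
        toy_hash[b] = h;
}

/* Find the entry for a packet's tuple, creating the connection if needed. */
static struct toy_tuple_hash *toy_resolve(const struct toy_tuple *t)
{
        struct toy_tuple_hash *h;
        struct toy_conn *c;

        assert(t != NULL);
        h = toy_find(t);
        if (h != NULL)
                return h;               /* known connection */

        c = calloc(1, sizeof(*c));
        if (c == NULL)
                return NULL;
        c->tuplehash[0].tuple = *t;
        c->tuplehash[1].tuple.src = t->dst;     /* inverted copy */
        c->tuplehash[1].tuple.dst = t->src;
        c->tuplehash[1].tuple.sport = t->dport;
        c->tuplehash[1].tuple.dport = t->sport;
        c->tuplehash[0].ctrack = c;
        c->tuplehash[1].ctrack = c;
        toy_link(&c->tuplehash[0]);
        toy_link(&c->tuplehash[1]);
        return &c->tuplehash[0];        /* a new connection starts as original */
}
```

A first packet creates the connection and gets element 0; a packet with the
swapped addresses then finds element 1 of the same connection.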

The "if" statement in the resolve_normal_ct() function

if (DIRECTION(h) == IP_CT_DIR_REPLY)

determines in which direction the packet is going.
The state of the connection is decided from this information.

The DIRECTION macro is defined in ip_conntrack_tuple.h

#define DIRECTION(h) \
((enum ip_conntrack_dir)(&(h)->ctrack->tuplehash[1] == (h)))

In the code, h is the pointer to ip_conntrack_tuple_hash
returned from ip_conntrack_find_get().

Because tuplehash[1] is tuplehash[IP_CT_DIR_REPLY], DIRECTION returns 0
if h is the entry for IP_CT_DIR_ORIGINAL. On the other hand, if it is the
entry for IP_CT_DIR_REPLY, it returns 1.
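The pointer comparison used by DIRECTION can be tried out in user space. The
following is a minimal sketch, assuming a cut-down structure with only the
back pointer; toy_conn, toy_tuple_hash, TOY_DIRECTION and toy_conn_init() are
hypothetical names that only mirror the shape of the kernel ones.

```c
#include <assert.h>
#include <stddef.h>

enum toy_dir { TOY_DIR_ORIGINAL = 0, TOY_DIR_REPLY = 1 };

struct toy_conn;

/* Cut-down entry: only the back pointer to the owning connection. */
struct toy_tuple_hash {
        struct toy_conn *ctrack;
};

struct toy_conn {
        struct toy_tuple_hash tuplehash[2];
};

/* Same trick as DIRECTION(): compare h with the address of element 1.
 * The comparison yields 1 for the reply entry and 0 for the original one. */
#define TOY_DIRECTION(h) \
        ((enum toy_dir)(&(h)->ctrack->tuplehash[1] == (h)))

/* Point both entries back at the connection that contains them. */
static void toy_conn_init(struct toy_conn *c)
{
        assert(c != NULL);
        c->tuplehash[TOY_DIR_ORIGINAL].ctrack = c;
        c->tuplehash[TOY_DIR_REPLY].ctrack = c;
}
```

The macro needs no direction field in the entry: the position of the entry
inside its owner is enough to tell the direction.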

Here is the code:

        if (DIRECTION(h) == IP_CT_DIR_REPLY) {
                *ctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY;
                *set_reply = 1;
        } else {
                if (h->ctrack->status & IPS_SEEN_REPLY) {
                        *ctinfo = IP_CT_ESTABLISHED;
                } else if (h->ctrack->status & IPS_EXPECTED) {
                        *ctinfo = IP_CT_RELATED;
                } else {
                        *ctinfo = IP_CT_NEW;
                }
                *set_reply = 0;
        }
        skb->nfct = &h->ctrack->infos[*ctinfo];
        return h->ctrack;
}

If the packet is a reply, ctinfo is set to IP_CT_ESTABLISHED + IP_CT_IS_REPLY.
This means that the connection is established and the packet is a reply.
set_reply is also set to 1.
When the code returns to ip_conntrack_in() and set_reply is set, the
status field of struct ip_conntrack is or-ed with
1 << IPS_SEEN_REPLY_BIT (a reply has now been seen).

When the packet is in the original direction (the else branch) and the
IPS_SEEN_REPLY (= 1 << IPS_SEEN_REPLY_BIT) bit is set in
ip_conntrack->status (h->ctrack->status in the code),
ctinfo is set to IP_CT_ESTABLISHED.

If the IPS_SEEN_REPLY bit is not set (no reply has been seen so far),
ctinfo is set to IP_CT_NEW. In both cases set_reply is 0.

(We ignore IPS_EXPECTED for now, for simplicity.)

(In the ordinary case, when the first packet arrives the state is IP_CT_NEW.
When the next packet arrives as a reply, the state is
IP_CT_ESTABLISHED + IP_CT_IS_REPLY.
When the next one arrives in the original direction, the state is
IP_CT_ESTABLISHED, and so on.)
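The sequence of states can be reproduced with a small user-space function.
This is a sketch only: toy_conn, toy_classify() and the TOY_ constants are
hypothetical stand-ins, and the bit values are assumptions. Unlike the kernel
code quoted above, which reports set_reply to its caller, the toy sets the
"seen reply" bit itself to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IP_CT_* states and IPS_* status bits. */
enum toy_ctinfo { TOY_ESTABLISHED, TOY_RELATED, TOY_NEW, TOY_IS_REPLY };

#define TOY_EXPECTED   (1u << 0)
#define TOY_SEEN_REPLY (1u << 1)

struct toy_conn {
        unsigned status;
};

/* Classify one packet of a connection and record whether a reply was seen. */
static int toy_classify(struct toy_conn *c, int is_reply)
{
        int ctinfo;

        assert(c != NULL);
        if (is_reply) {
                ctinfo = TOY_ESTABLISHED + TOY_IS_REPLY;
                c->status |= TOY_SEEN_REPLY;    /* the caller does this in the kernel */
        } else if (c->status & TOY_SEEN_REPLY) {
                ctinfo = TOY_ESTABLISHED;
        } else if (c->status & TOY_EXPECTED) {
                ctinfo = TOY_RELATED;
        } else {
                ctinfo = TOY_NEW;
        }
        return ctinfo;
}
```

Note that in this model a second packet in the original direction is still
classified as new as long as no reply has been seen.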

At the end of this routine, the nfct member of sk_buff is set to point at
the appropriate element of the nf_ct_info array.
This member is used to determine the state of the connection.

If the packet is for the local machine, the NF_IP_LOCAL_IN filter calls
ip_confirm(), and if the packet is to be forwarded to another interface,
the NF_IP_POST_ROUTING filter calls ip_refrag(), which calls ip_confirm().

ip_confirm()

ip_confirm() is defined in ip_conntrack_standalone.c.
It only returns the result of ip_conntrack_confirm().
ip_conntrack_confirm() is defined in linux/netfilter_ipv4/ip_conntrack_core.h.

The code is

static inline int ip_conntrack_confirm(struct sk_buff *skb)
{
        if (skb->nfct
            && !is_confirmed((struct ip_conntrack *)skb->nfct->master))
                return __ip_conntrack_confirm(skb->nfct);
        return NF_ACCEPT;
}

As nfct of sk_buff has been set in ip_conntrack_in() (skb->nfct has a value)
and the list link of tuplehash has not been touched yet (which makes
is_confirmed() return 0), __ip_conntrack_confirm() is executed.

__ip_conntrack_confirm(), which is defined in ip_conntrack_core.c,
prepends the tuplehash[] entries to the hash lists. This makes sure that
when the next packet on this connection arrives, the correct ip_conntrack
structure can be found.
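The effect of confirming a connection can be sketched with a single list.
This is a simplified illustration, not the kernel code: toy_conn,
toy_confirm() and toy_is_listed() are hypothetical names, one list stands in
for the hash lists, and an explicit flag stands in for the list-link check
done by is_confirmed().

```c
#include <assert.h>
#include <stddef.h>

struct toy_conn {
        struct toy_conn *next;  /* chain link, used once confirmed */
        int confirmed;          /* stand-in for the is_confirmed() check */
};

static struct toy_conn *toy_chain;      /* one list instead of a hash table */

/* Put the connection on the lookup list the first time it is confirmed.
 * Returns 1 if the connection was added, 0 if it was already confirmed. */
static int toy_confirm(struct toy_conn *c)
{
        assert(c != NULL);
        if (c->confirmed)
                return 0;
        c->next = toy_chain;            /* prepend to the list */
        toy_chain = c;
        c->confirmed = 1;
        return 1;
}

/* Report whether a later lookup would find this connection. */
static int toy_is_listed(const struct toy_conn *c)
{
        const struct toy_conn *p;

        for (p = toy_chain; p != NULL; p = p->next)
                if (p == c)
                        return 1;
        return 0;
}
```

Until toy_confirm() has run, a lookup cannot find the connection; confirming
it a second time changes nothing.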

The following diagram shows the image of a "tuple" (not a packet).

                        IP_CT_DIR_ORIGINAL

                         |        |
    orig_src/orig_dst -> |        | -> orig_src/orig_dst
                         | system |
                         |        |
    orig_dst/orig_src <- |        | <- orig_dst/orig_src

                         IP_CT_DIR_REPLY

At this point, IP_CT_DIR_ORIGINAL and IP_CT_DIR_REPLY use the same
source/destination addresses.

But if NAT filtering is done, the source and destination addresses are not
the same on the left side and the right side.

When only tracking a connection, the situation is simple:
the IP_CT_DIR_REPLY tuple is the inverse of the IP_CT_DIR_ORIGINAL tuple.
