NAT Network Address Translation



NAT filtering is configured with the iptables command.

When the system initializes the iptables table (the iptables targets),
the filtering does nothing and only returns NF_ACCEPT, which means
that every packet is accepted.

Almost all iptables targets are checked at these points:


The functions to be invoked are registered with nf_register_hook(),
in the same way as the other filtering functions.

The arguments passed to nf_register_hook() are defined in

${linux src}/net/ipv4/netfilter/iptable_filter.c.

Here is the code:

static struct nf_hook_ops ipt_ops[]
= { { { NULL, NULL }, ipt_hook, PF_INET, NF_IP_LOCAL_IN, NF_IP_PRI_FILTER },
    { { NULL, NULL }, ipt_local_out_hook, PF_INET, NF_IP_LOCAL_OUT,
                NF_IP_PRI_FILTER }
};
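
The registration scheme can be pictured with a small user-space model.
This is only a sketch, not kernel code: model_hook_ops, model_register_hook()
and model_run_hooks(), as well as the verdict and priority values, are
hypothetical names chosen for illustration. It shows hooks for one hook point
being kept in priority order, and a packet being accepted only when every
hook returns the accept verdict.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, standing in for NF_DROP / NF_ACCEPT. */
enum { MODEL_DROP = 0, MODEL_ACCEPT = 1 };

#define MODEL_MAX_HOOKS 8

typedef unsigned int (*model_hookfn)(const void *pkt);

struct model_hook_ops {
        model_hookfn hook;      /* function called at the hook point */
        int priority;           /* lower value runs earlier */
};

/* Hooks registered for one single hook point, sorted by priority. */
static struct model_hook_ops model_hooks[MODEL_MAX_HOOKS];
static size_t model_nhooks;

/* Insert a hook so that the array stays sorted by priority. */
static int model_register_hook(model_hookfn fn, int priority)
{
        size_t i;

        if (model_nhooks == MODEL_MAX_HOOKS)
                return -1;
        i = model_nhooks;
        while (i > 0 && model_hooks[i - 1].priority > priority) {
                model_hooks[i] = model_hooks[i - 1];
                i--;
        }
        model_hooks[i].hook = fn;
        model_hooks[i].priority = priority;
        model_nhooks++;
        return 0;
}

/* Call every hook in order; stop at the first one that does not accept. */
static unsigned int model_run_hooks(const void *pkt)
{
        size_t i;

        for (i = 0; i < model_nhooks; i++) {
                unsigned int verdict = model_hooks[i].hook(pkt);

                if (verdict != MODEL_ACCEPT)
                        return verdict;
        }
        return MODEL_ACCEPT;
}

/* A freshly initialized table: nothing to check, accept everything. */
static unsigned int model_accept_all(const void *pkt)
{
        (void)pkt;
        return MODEL_ACCEPT;
}

/* A hook that rejects every packet, for comparison. */
static unsigned int model_drop_all(const void *pkt)
{
        (void)pkt;
        return MODEL_DROP;
}
```

With only model_accept_all registered, every packet passes. Registering
model_drop_all with a lower priority value makes it run first, so the packet
is dropped before the second hook is even called.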

The core of these hook functions is ipt_do_table().
Both ipt_hook() and ipt_local_out_hook() call ipt_do_table(),
which is defined in


ipt_do_table() iterates over the series of targets set
by the iptables command and checks whether a packet matches the condition
for each target.
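
The iteration can be sketched in a few lines. This is a simplified model
and not the real ipt_do_table(): struct model_rule, model_do_table() and the
single source-address match are assumptions for illustration, and the real
function handles many more match types and lets a target ask for the walk to
continue. The sketch only shows the "walk the rules, first match decides"
idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts, standing in for NF_DROP / NF_ACCEPT. */
enum { MODEL_DROP = 0, MODEL_ACCEPT = 1 };

/* Hypothetical rule: match on source address/mask, then give a verdict. */
struct model_rule {
        uint32_t src;           /* address to compare against */
        uint32_t mask;          /* which bits of the address matter */
        unsigned int verdict;   /* result of the "target" */
};

/*
 * Walk the rules in order and return the verdict of the first rule
 * whose condition matches; fall back to the policy if none matches.
 */
static unsigned int model_do_table(const struct model_rule *rules,
                                   size_t nrules, uint32_t pkt_src,
                                   unsigned int policy)
{
        size_t i;

        for (i = 0; i < nrules; i++) {
                if ((pkt_src & rules[i].mask) ==
                    (rules[i].src & rules[i].mask))
                        return rules[i].verdict;
        }
        return policy;
}
```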

But NAT filtering must be handled at NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING,
and ipt_hook() is not invoked at either of these points.

What's wrong?
Nothing is wrong.
ipt_do_table() is called from ip_nat_fn(), through ip_nat_rule_find(),
if the connection is a new one.

ip_nat_fn() is the core function invoked at the three points at which
NAT filtering should be done.
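
The control flow described here can be modelled as follows. This is a
sketch under assumptions, not the kernel function: struct model_conn, the
nat_initialized flag and the two counters are hypothetical and exist only to
make the behaviour observable. The point is that the rule table is consulted
once, for the first packet of a new connection, while the packet rewrite runs
for every packet.

```c
#include <assert.h>

/* Hypothetical connection states, standing in for IP_CT_*. */
enum model_ctinfo { MODEL_CT_ESTABLISHED, MODEL_CT_NEW };

struct model_conn {
        int nat_initialized;    /* has a translation been chosen yet? */
        int rule_lookups;       /* how often the rule table was consulted */
        int bindings_applied;   /* how often a packet was rewritten */
};

/* Stand-in for the rule lookup that ends up walking the table. */
static void model_rule_find(struct model_conn *ct)
{
        ct->rule_lookups++;
        ct->nat_initialized = 1;
}

/* Stand-in for the per-packet address rewrite. */
static void model_apply_bindings(struct model_conn *ct)
{
        ct->bindings_applied++;
}

/* Look up rules only for a new connection, then always rewrite. */
static void model_nat_fn(struct model_conn *ct, enum model_ctinfo ctinfo)
{
        if (ctinfo == MODEL_CT_NEW && !ct->nat_initialized)
                model_rule_find(ct);
        model_apply_bindings(ct);
}
```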

The arguments passed to nf_register_hook() are defined in

${linux src}/net/ipv4/netfilter/ip_nat_standalone.c

static struct nf_hook_ops ip_nat_in_ops
= { { NULL, NULL }, ip_nat_fn, PF_INET, NF_IP_PRE_ROUTING, NF_IP_PRI_NAT_DST };

/* After packet filtering, change source */
static struct nf_hook_ops ip_nat_out_ops
= { { NULL, NULL }, ip_nat_out, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_NAT_SRC};

/* Before packet filtering, change destination */
static struct nf_hook_ops ip_nat_local_out_ops
= { { NULL, NULL }, ip_nat_local_fn, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_NAT_DST };

ip_nat_local_fn() performs several checks before and after calling ip_nat_fn().
At every point, ip_nat_fn() is invoked, so that ipt_do_table() is invoked.

If the iptables command sets up NAT (either SNAT or DNAT, or both),
its target is checked and, if necessary, network address translation (NAT)
is performed.

Let's look at NAT.

If the target is for SNAT or DNAT, ipt_snat_target() or ipt_dnat_target()
is executed.
Both functions are defined in

${linux src}/net/ipv4/netfilter/ip_nat_rule.c

static unsigned int ipt_snat_target(Arguments ....)
{
        struct ip_conntrack *ct;
        enum ip_conntrack_info ctinfo;

        IP_NF_ASSERT(hooknum == NF_IP_POST_ROUTING);

        ct = ip_conntrack_get(*pskb, &ctinfo);

        IP_NF_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED));
        IP_NF_ASSERT(out);

        return ip_nat_setup_info(ct, targinfo, hooknum);
}


static unsigned int ipt_dnat_target(Arguments ...)
{
        struct ip_conntrack *ct;
        enum ip_conntrack_info ctinfo;

        IP_NF_ASSERT(hooknum == NF_IP_PRE_ROUTING || hooknum == NF_IP_LOCAL_OUT);

        ct = ip_conntrack_get(*pskb, &ctinfo);

        IP_NF_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED));

        return ip_nat_setup_info(ct, targinfo, hooknum);
}

The code is short and simple.
The IP_NF_ASSERT macro makes sure that SNAT occurs at NF_IP_POST_ROUTING,
that the connection is tracked (ct != NULL), and that it is a new connection
(ctinfo is IP_CT_NEW; here we ignore IP_CT_RELATED).

The IP_NF_ASSERT macro disappears when debugging is disabled.

For DNAT, the translation must occur at NF_IP_PRE_ROUTING or NF_IP_LOCAL_OUT.
The other checks are the same as for SNAT.
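
The conditions checked by the assertions can be written as plain predicates.
This is only an illustrative sketch: the enum values and the model_*
function names are hypothetical and do not correspond to the kernel's
NF_IP_* or IP_CT_* constants.

```c
#include <assert.h>

/* Hypothetical hook points; these are not the kernel's NF_IP_* values. */
enum model_hook {
        MODEL_PRE_ROUTING,
        MODEL_LOCAL_IN,
        MODEL_FORWARD,
        MODEL_LOCAL_OUT,
        MODEL_POST_ROUTING
};

/* Hypothetical connection states, standing in for IP_CT_*. */
enum model_ctinfo { MODEL_CT_ESTABLISHED, MODEL_CT_RELATED, MODEL_CT_NEW };

/* The connection must be tracked, and new or related. */
static int model_ct_ok(int tracked, enum model_ctinfo ctinfo)
{
        return tracked &&
               (ctinfo == MODEL_CT_NEW || ctinfo == MODEL_CT_RELATED);
}

/* SNAT is only set up at POST_ROUTING. */
static int model_snat_ok(enum model_hook hook, int tracked,
                         enum model_ctinfo ctinfo)
{
        return hook == MODEL_POST_ROUTING && model_ct_ok(tracked, ctinfo);
}

/* DNAT is only set up at PRE_ROUTING or LOCAL_OUT. */
static int model_dnat_ok(enum model_hook hook, int tracked,
                         enum model_ctinfo ctinfo)
{
        return (hook == MODEL_PRE_ROUTING || hook == MODEL_LOCAL_OUT) &&
               model_ct_ok(tracked, ctinfo);
}
```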

Both functions call ip_nat_setup_info(), which builds the NAT table in
struct ct_nf_nat, which is contained in ip_conntrack.

struct ct_nf_nat is named info, and it holds an array of 2*3 entries.

#define IP_NAT_MAX_MANIPS (2*3)

This is defined in ${linux src}/include/linux/netfilter_ipv4/ip_nat.h

Why 2*3?

Let's read on.

ip_nat_setup_info() is the main procedure of the NAT filter.
Let's look at ip_nat_setup_info().

unsigned int
ip_nat_setup_info(struct ip_conntrack *conntrack,
                  const struct ip_nat_multi_range *mr,
                  unsigned int hooknum)

The following four tuples are used to make a new tuple for translation,
and ip_nat_info is the translation table.

        struct ip_conntrack_tuple new_tuple, inv_tuple, reply;
        struct ip_conntrack_tuple orig_tp;
        struct ip_nat_info *info = &conntrack->;

It locks ip_nat_lock, and IP_NF_ASSERT makes sure that the translation
is being set up at one of the three valid hook points (if debugging is enabled).

        IP_NF_ASSERT(hooknum == NF_IP_PRE_ROUTING
                     || hooknum == NF_IP_POST_ROUTING
                     || hooknum == NF_IP_LOCAL_OUT);

At most, translation occurs three times.
This is related to get_unique_tuple(); see that function.

        IP_NF_ASSERT(info->num_manips < IP_NAT_MAX_MANIPS);

This checks how many translations have occurred so far.

NAT translation is done by obtaining a new tuple that is unique in the system,
so that it can be distinguished from other translations.
Once a new tuple has been obtained, the new tuple is set into


in the ip_conntrack_alter_reply() function, which is the condition checked
by the do {...} while() loop that obtains a unique tuple using

This loop is entered after the following code.


This code is confusing.

NAT filtering uses the reply tuple (not the reply packet).
Each time a translation occurs,

is replaced in ip_conntrack_alter_reply(), which is the check function
referred to above.

Address changes during NAT filtering are done using
tuplehash[IP_CT_DIR_REPLY].tuple only.

tuplehash[IP_CT_DIR_ORIGINAL] is used to calculate the hash value for
the hash list named bysource.

When a NAT translation has already occurred, tuplehash[IP_CT_DIR_REPLY].tuple
has been replaced, and the system sends and receives packets as if the
connection were made to the address extracted from
tuplehash[IP_CT_DIR_REPLY].tuple.

And the system has to change this address back to the original address
when a packet comes in from the opposite direction.

So, once a translation has occurred, there is a possibility that

 orig_tp = conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple

is not valid, because this tuple was made from a packet before translation.

Instead, the original tuple is obtained by inverting the IP_CT_DIR_REPLY tuple.


The tuple represented by orig_tp is now a tuple of the translated address.

The most complex function, get_unique_tuple(), does its best to
obtain a unique tuple that is within the range specified by the mr argument.
The tuple returned by get_unique_tuple() is then checked by the
do { } while() loop.
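
The retry pattern can be illustrated with a toy model. This is a sketch
under heavy assumptions and not the kernel algorithm: a "tuple" is reduced to
a single port number, the candidate generator is deliberately naive (it just
walks the range), and model_used[], model_get_candidate(),
model_alter_reply() and model_setup() are hypothetical names. It only shows
the shape of the loop: propose a candidate, try to install it, and retry
while installation fails, giving up when the range is exhausted. The model
expects min <= max and min > 0, because 0 is used as the failure value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CONNS 16

/* Reply-side ports already claimed by tracked connections (model only). */
static uint16_t model_used[MODEL_MAX_CONNS];
static size_t model_nused;

static int model_port_in_use(uint16_t port)
{
        size_t i;

        for (i = 0; i < model_nused; i++)
                if (model_used[i] == port)
                        return 1;
        return 0;
}

/* Propose the next candidate in [min, max]; return 0 when exhausted. */
static int model_get_candidate(uint16_t *port, uint16_t min, uint16_t max,
                               unsigned int *tried)
{
        if (*tried > (unsigned int)(max - min))
                return 0;
        *port = (uint16_t)(min + *tried);
        (*tried)++;
        return 1;
}

/* Claim the port if nobody else has it: "check, then install". */
static int model_alter_reply(uint16_t port)
{
        if (model_port_in_use(port) || model_nused == MODEL_MAX_CONNS)
                return 0;
        model_used[model_nused++] = port;
        return 1;
}

/* Returns the chosen port, or 0 when no port is left (the "drop" case). */
static uint16_t model_setup(uint16_t min, uint16_t max)
{
        uint16_t port = 0;
        unsigned int tried = 0;

        do {
                if (!model_get_candidate(&port, min, max, &tried))
                        return 0;
        } while (!model_alter_reply(port));
        return port;
}
```

Calling model_setup(1024, 1025) three times hands out 1024, then 1025, and
then fails, which corresponds to the NF_DROP path of the real loop.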

The loop condition is the return value of ip_conntrack_alter_reply().
ip_conntrack_alter_reply() is defined in ip_conntrack_core.c.
This function checks whether the new reply tuple already exists or not.
More importantly, this function changes


to the new one. From then on, the connection
appears to be established between the source and destination represented by the
new tuple.

The code following this do {...} while() loop sets up the ip_nat_info structure
in the ip_conntrack structure.
This is the NAT translation table.

After the translation has been set up in ip_nat_setup_info(), the actual
translation is done by do_bindings() at the appropriate point.
do_bindings() rewrites a packet's source and destination according
to this translation table.
do_bindings() is defined in

${linux src}/net/ipv4/netfilter/ip_nat_standalone.c

and is invoked at the end of ip_nat_fn().
This function is invoked at NF_IP_POST_ROUTING, NF_IP_PRE_ROUTING, and
NF_IP_LOCAL_OUT, so the translation is done automatically.

In the loop, once a new tuple is obtained, its inverted tuple is also made.

        do {
                if (!get_unique_tuple(&new_tuple, &orig_tp, mr, conntrack,
                                      hooknum)) {
                        return NF_DROP;
                }

                invert_tuplepr(&reply, &new_tuple);

        } while (!ip_conntrack_alter_reply(conntrack, &reply));

Then, the reversed tuple of orig_tp is stored in inv_tuple.
inv_tuple represents the reply from the translated address.

        invert_tuplepr(&inv_tuple, &orig_tp);

When we reach here, we have four tuples.
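
To make the four tuples concrete, here is a small sketch. struct model_tuple
is a hypothetical, flattened stand-in for struct ip_conntrack_tuple (one
address and one port per side, no protocol field), and model_invert() only
illustrates the idea behind invert_tuplepr(): the inverse tuple describes the
same flow as seen from the other end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, flattened tuple: one address/port pair per side. */
struct model_tuple {
        uint32_t src_ip;
        uint32_t dst_ip;
        uint16_t src_port;
        uint16_t dst_port;
};

/* Swap the two sides of the tuple. */
static struct model_tuple model_invert(struct model_tuple t)
{
        struct model_tuple inv;

        inv.src_ip = t.dst_ip;
        inv.dst_ip = t.src_ip;
        inv.src_port = t.dst_port;
        inv.dst_port = t.src_port;
        return inv;
}

/* Compare only the source side of two tuples. */
static int model_src_equal(struct model_tuple a, struct model_tuple b)
{
        return a.src_ip == b.src_ip && a.src_port == b.src_port;
}

/* Compare only the destination side of two tuples. */
static int model_dst_equal(struct model_tuple a, struct model_tuple b)
{
        return a.dst_ip == b.dst_ip && a.dst_port == b.dst_port;
}
```

For a source translation, new_tuple differs from orig_tp only in its source
side, reply is the inverse of new_tuple, and inv_tuple is the inverse of
orig_tp.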

Setting of translation table

The next "if" statement tests whether the source address and port are not equal,
that is, whether the source address and port were changed.

If the source address is changed, the source address of the "out going"
packet (not tuple) should be replaced.

        if (!ip_ct_tuple_src_equal(&new_tuple, &orig_tp)) {
                /* In this direction, a source manip. */
                info->manips[info->num_manips++] =
                        ((struct ip_nat_info_manip)
                         { IP_CT_DIR_ORIGINAL, hooknum,
                           IP_NAT_MANIP_SRC, new_tuple.src });

In this code, IP_CT_DIR_ORIGINAL should be understood in the same way
as in connection tracking.
This packet (not tuple) is an "out going" packet.

do_bindings(), which actually performs the NAT translation according to the
translation table, gets the direction of the packet from the ip_conntrack
structure using the macro


ctinfo in the macro is the ctinfo in the sk_buff structure, which is set in
ip_conntrack_in() to represent the status of the packet.
Here, IP_CT_DIR_ORIGINAL means that a packet (not tuple) is "out going".

A source address change occurs in both SNAT and DNAT translation.
(Source address translation in DNAT occurs when the destination IPs and ports
are exhausted. This is the reason IP_NAT_MAX_MANIPS = 2*3. See get_unique_tuple().)

After all, the ct_nf_info that is set above means that

at the specified point (hooknum),
for an "out going" packet (IP_CT_DIR_ORIGINAL),
the source address (IP_NAT_MANIP_SRC)

is replaced with new_tuple.src.

          src     |        |  new src
              ->  | system |          ->
          dst     |        |      dst

                      ^
                      |
              source translation

This check should always be done (if debugging is enabled).

                IP_NF_ASSERT(info->num_manips < IP_NAT_MAX_MANIPS);

The next code specifies that the opposite operation should be done at
the opposite point.
As we changed the source address for the "out going" packet, we should put back
the original address at the opposite point.

opposite_hook is defined in ip_nat_core.c as follows:

static unsigned int opposite_hook[NF_IP_NUMHOOKS]

If the source address change occurs at NF_IP_POST_ROUTING, we should
put it back at NF_IP_PRE_ROUTING ("in coming" packet) on this connection.

          src     |        |  new src
              <-  | system |          <-
          dst     |        |      dst

                      ^
                      |
            destination translation

The image above depicts the destination translation at the opposite point.
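
The lookup table idea can be sketched like this. The enum and the array
name model_opposite_hook are hypothetical, and the contents are partly an
assumption: the text above only states that NF_IP_POST_ROUTING maps to
NF_IP_PRE_ROUTING. The reverse entry is filled in by symmetry, and the entry
for the local-output hook is a guess made for illustration only. Entries not
listed are zero-initialized, which in this model happens to be
MODEL_PRE_ROUTING, so they must not be relied upon.

```c
#include <assert.h>

/* Hypothetical hook points; these are not the kernel's NF_IP_* values. */
enum model_hook {
        MODEL_PRE_ROUTING,
        MODEL_LOCAL_IN,
        MODEL_FORWARD,
        MODEL_LOCAL_OUT,
        MODEL_POST_ROUTING,
        MODEL_NUMHOOKS
};

/* For each hook, the hook at which the reverse manipulation is done. */
static const enum model_hook model_opposite_hook[MODEL_NUMHOOKS] = {
        [MODEL_PRE_ROUTING]  = MODEL_POST_ROUTING, /* by symmetry */
        [MODEL_POST_ROUTING] = MODEL_PRE_ROUTING,  /* stated in the text */
        [MODEL_LOCAL_OUT]    = MODEL_POST_ROUTING  /* assumption */
};
```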

                /* In the reverse direction, a destination manip. */
                info->manips[info->num_manips++] =
                        ((struct ip_nat_info_manip)
                         { IP_CT_DIR_REPLY, opposite_hook[hooknum],
                           IP_NAT_MANIP_DST, orig_tp.src });
                IP_NF_ASSERT(info->num_manips <= IP_NAT_MAX_MANIPS);

This code means that

at the opposite point
(this time at NF_IP_PRE_ROUTING = opposite_hook[NF_IP_POST_ROUTING]),
for an "in coming" packet (not tuple) (IP_CT_DIR_REPLY),
the destination address

should be changed to orig_tp.src.

At this time, if NAT has not occurred, orig_tp.src is really the original source
address. If NAT has occurred, orig_tp.src is an already translated source address,
and it should be replaced with the real original source address according to
another NAT table.

This information is stored in the ip_nat_info structure array.
In this code


is the one.

The rest of ip_nat_setup_info() does the same operation for
the destination address change.
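
Putting the pieces together, the bookkeeping can be sketched as follows.
This is a simplified model, not the kernel code: the struct and function
names are hypothetical, a manipulation carries only an IP address instead of
a full address/port pair, and only two hook points are modelled. It shows
that each address change produces a pair of entries: one for the original
direction at the given hook, and the inverse one for the reply direction at
the opposite hook.

```c
#include <assert.h>
#include <stdint.h>

enum model_hook { MODEL_PRE_ROUTING, MODEL_POST_ROUTING };
enum model_dir { MODEL_DIR_ORIGINAL, MODEL_DIR_REPLY };
enum model_manip_type { MODEL_MANIP_SRC, MODEL_MANIP_DST };

#define MODEL_MAX_MANIPS (2*3)

struct model_manip {
        enum model_dir dir;             /* which packet direction */
        enum model_hook hook;           /* at which hook point */
        enum model_manip_type type;     /* rewrite source or destination */
        uint32_t ip;                    /* address to write into the packet */
};

struct model_nat_info {
        unsigned int num_manips;
        struct model_manip manips[MODEL_MAX_MANIPS];
};

/* Only two hooks exist in this model, so the opposite is the other one. */
static enum model_hook model_opposite(enum model_hook hook)
{
        return hook == MODEL_PRE_ROUTING ? MODEL_POST_ROUTING
                                         : MODEL_PRE_ROUTING;
}

/* Append a forward entry and its inverse; -1 if they do not both fit. */
static int model_add_pair(struct model_nat_info *info, enum model_hook hook,
                          enum model_manip_type type, uint32_t new_ip,
                          uint32_t orig_ip)
{
        struct model_manip fwd = { MODEL_DIR_ORIGINAL, hook, type, new_ip };
        struct model_manip rev = { MODEL_DIR_REPLY, model_opposite(hook),
                                   type == MODEL_MANIP_SRC ? MODEL_MANIP_DST
                                                           : MODEL_MANIP_SRC,
                                   orig_ip };

        if (info->num_manips + 2 > MODEL_MAX_MANIPS)
                return -1;
        info->manips[info->num_manips++] = fwd;
        info->manips[info->num_manips++] = rev;
        return 0;
}
```

A source change at the post-routing hook therefore records "rewrite the
source of original-direction packets at post-routing" and "rewrite the
destination of reply packets at pre-routing". With room for 2*3 entries, at
most three such pairs fit.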

The NAT translation occurs as follows:

at NF_IP_PRE_ROUTING             for IP_CT_DIR_REPLY    with DST

for SNAT,

at NF_IP_POST_ROUTING            for IP_CT_DIR_REPLY    with SRC

for DNAT and if address/port for DNAT is exhausted,

at NF_IP_POST_ROUTING            for IP_CT_DIR_REPLY    with DST

Each translation is identified uniquely.
According to the translation data above, do_bindings() performs the translation
when both the filtering point and the direction of the packet (not tuple) match.
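
The matching rule can be sketched as below. This is a user-space model and
not the real do_bindings(): the struct and function names are hypothetical,
a packet is reduced to a pair of IP addresses, and checksum and port handling
are left out entirely. An entry is applied only when both the hook point and
the packet direction match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_hook { MODEL_PRE_ROUTING, MODEL_POST_ROUTING };
enum model_dir { MODEL_DIR_ORIGINAL, MODEL_DIR_REPLY };
enum model_manip_type { MODEL_MANIP_SRC, MODEL_MANIP_DST };

struct model_manip {
        enum model_dir dir;
        enum model_hook hook;
        enum model_manip_type type;
        uint32_t ip;
};

struct model_packet {
        uint32_t src_ip;
        uint32_t dst_ip;
};

/* Apply every entry that matches dir and hook; return how many matched. */
static unsigned int model_do_bindings(const struct model_manip *manips,
                                      size_t n, enum model_dir dir,
                                      enum model_hook hook,
                                      struct model_packet *pkt)
{
        unsigned int applied = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                if (manips[i].dir != dir || manips[i].hook != hook)
                        continue;
                if (manips[i].type == MODEL_MANIP_SRC)
                        pkt->src_ip = manips[i].ip;
                else
                        pkt->dst_ip = manips[i].ip;
                applied++;
        }
        return applied;
}
```

With an SNAT pair in the table, an outgoing packet is left alone at
pre-routing and gets its source rewritten at post-routing, while a reply
packet gets its destination restored at pre-routing.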


From a global point of view, the tuples for both directions (not for a packet)
are as follows. The system doing NAT translation translates the addresses.


                         |        |
   orig_src/orig_dst ->  |        |  -> new_src/new_dst
                         | system |
                         |        |
   orig_dst/orig_src <-  |        |  <- new_dst/new_src


But from the system's point of view, the translated tuple is stored in
IP_CT_DIR_REPLY, and the system only sees the "in coming" tuple
(for IP_CT_DIR_REPLY).
This is done in ip_conntrack_alter_reply():

        conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *newreply;

This assignment happens inside the do {...} while() loop that obtains a
unique tuple for translation.
(*newreply is the new tuple for translation.)
The connection-tracking hash is then rehashed using the tuples for both
directions (IP_CT_DIR_ORIGINAL and IP_CT_DIR_REPLY), and the connection is put
back into the "bysource" hash list using place_in_hash() or replace_in_hash().

And the NAT filter does not look at the IP_CT_DIR_ORIGINAL tuple.
(It only uses it for computing a hash value.)
So, the system sees the tuples as follows.


                         |        |
     new_src/new_dst ->  |        |  -> new_src/new_dst
                         | system |
                         |        |
     new_dst/new_src <-  |        |  <- new_dst/new_src


