
bind broker

You’ve got a great big server that’s capable of supporting multiple users. Everybody wants to run a web server. This would be great, but alas, archaic decisions made long ago mean that network sockets aren’t really files and there’s this weird concept of privileged ports. Maybe we could assign each user a virtual machine and let them do whatever they want, but that seems wasteful. Think of the megabytes! Maybe we could set up nginx.conf to proxy all incoming connections to a process of the user’s choosing, but that only works for web sites and we want to be protocol neutral. Maybe we could use iptables, but nobody wants to do that.

What we need is a bind broker. At some level, there needs to be some kind of broker that assigns IPs to users and resolves conflicts. It should be possible to build something of this nature given just the existing unix tools we have, instead of changing system design. Then we can deploy our broker to existing systems without upgrading or disrupting their ongoing operation. The bind broker watches a directory for the creation, by users, of unix domain sockets. Then it binds to the TCP port of the same name, and transfers traffic between them.

I postulated that this task is actually so simple that it could be an intro to unix programming class homework assignment. Of course, it would be unfair of me to assign homework without first developing an answer key. I was figuring this would come in somewhere around 200 lines of code.

problem

A more complete problem specification is as follows. There is a top level directory (call it net), which contains subdirectories named after IP addresses. Each user is assigned a subdirectory to which they have write permission. Inside each subdirectory, the user may create unix sockets named according to the port they wish to bind to. We might assign user alice the IP 10.0.0.5 and user bob the IP 10.0.0.10. Then alice could run a web server by binding to net/10.0.0.5/80 and bob could run a mail server by binding to net/10.0.0.10/25. This maps IP ownership (which doesn’t really exist in unix) to the filesystem namespace (which does have working permissions).

The broker is responsible for watching each directory. As new sockets are created, it should respond by binding to the appropriate port. When a socket is deleted, the network side socket should be closed as well. Whenever a connection is accepted on the network side, a matching connection is made on the unix side, and then traffic is copied across.
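"Binding to the appropriate port" means turning two filenames into a network address, and filenames are user input. Here is a hedged sketch of that conversion with validation. The name nametoaddr is hypothetical, and accepting the full 1 to 65535 range is my assumption (the code in this post caps ports at 1024). Worth knowing: strtonum returns 0 when it rejects its input, and binding to port 0 asks the kernel for an arbitrary port, so checking the result matters.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn a directory name and a socket name into an
 * IPv4 address to bind. Returns 0 on success, -1 if either name is not
 * a valid address or port.
 */
int
nametoaddr(const char *ipname, const char *portname, struct sockaddr_in *sin)
{
        char *end;
        long port;

        memset(sin, 0, sizeof(*sin));
        sin->sin_family = AF_INET;
        if (inet_pton(AF_INET, ipname, &sin->sin_addr) != 1)
                return -1;

        /* strtol tolerates leading spaces and signs; a port name shouldn't */
        if (portname[0] < '0' || portname[0] > '9')
                return -1;
        errno = 0;
        port = strtol(portname, &end, 10);
        if (errno != 0 || *end != '\0')
                return -1;
        if (port < 1 || port > 65535)
                return -1;
        sin->sin_port = htons((in_port_t)port);
        return 0;
}
```

A socket named http or 80x is rejected instead of silently turning into some other port.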

The program should be run as root. One can test by creating a directory 127.0.0.1 under the top level directory, writable by a regular user. Then running nc -lU 83 inside that directory in one shell and nc localhost 83 in another should allow passing traffic.

solution

Let’s begin with the directory scanning code, then we’ll work our way up to the event loop. Some code to scan a directory for subdirectories, and then scan those for ports.

int
scanhostdir(const char *dirname)
{
        DIR *dir;
        struct dirent *ent;

        dir = opendir(dirname);
        chdir(dirname);
        while ((ent = readdir(dir))) {
                if (ent->d_name[0] == '.')
                        continue;
                if (ent->d_type == DT_DIR) {
                        scanportdir(ent->d_name);
                        watchdir(ent->d_name);
                }
        }
        closedir(dir);
        return 0;
}

int
scanportdir(const char *dirname)
{
        DIR *dir;
        struct dirent *ent;
        TAILQ_HEAD(, mapping) existing;
        struct mapping *map, *next;

        TAILQ_INIT(&existing);
        TAILQ_FOREACH_SAFE(map, &maps, next, next) {
                if (strcmp(map->ipname, dirname) == 0) {
                        TAILQ_REMOVE(&maps, map, next);
                        TAILQ_INSERT_TAIL(&existing, map, next);
                }
        }

        dir = opendir(dirname);
        while ((ent = readdir(dir))) {
                if (ent->d_type == DT_SOCK) {
                        TAILQ_FOREACH_SAFE(map, &existing, next, next) {
                                if (strcmp(map->portname, ent->d_name) == 0) {
                                        TAILQ_REMOVE(&existing, map, next);
                                        TAILQ_INSERT_TAIL(&maps, map, next);
                                        break;
                                }
                        }
                        if (!map) {
                                bindport(dirname, ent->d_name);
                        }
                }
        }
        closedir(dir);

        TAILQ_FOREACH_SAFE(map, &existing, next, next) {
                closemapping(map);
        }
        return 0;
}

We use linked lists because they are the official systems programming data structure of choice. Something with less quadratic behavior would also be a good choice.

Now that we’ve identified ports of interest, here is the code to bind the outside network port.

int
bindport(const char *ipname, const char *portname)
{
        struct mapping *map;
        union sockaddress addr;
        struct kevent kev;
        int one = 1;
        int s;

        memset(&addr, 0, sizeof(addr));
        addr.i.sin_family = AF_INET;
        addr.i.sin_port = htons(strtonum(portname, 1, 1024, NULL));
        inet_aton(ipname, &addr.i.sin_addr);

        s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        bind(s, &addr.a, sizeof(addr.i));
        listen(s, 10);

        map = malloc(sizeof(*map));
        map->ipname = strdup(ipname);
        map->portname = strdup(portname);
        map->listeningfd = s;
        TAILQ_INSERT_TAIL(&maps, map, next);

        EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, map);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        return 0;
}

Having bound the socket, we want to set it up for events so we know when to call accept. I’m using kevent here, but select or poll would be more portable.

Having finished our scan, we return to the event loop.

int
eventloop()
{
        struct kevent kev;

        while (1) {
                kevent(kq, NULL, 0, &kev, 1, NULL);
                switch (kev.filter) {
                case EVFILT_VNODE:
                        {
                                struct watcheddir *dir = kev.udata;
                                scanportdir(dir->name);
                        }
                        break;
                case EVFILT_READ:
                        {
                                struct mapping *map = kev.udata;
                                if (map) {
                                        newconnection(map);
                                } else {
                                        close(kev.ident);
                                }
                        }
                        break;
                }
        }
        return 0;
}

Oh, hey, something about vnodes. There’s no completely portable way to watch a directory for changes. I’m using a kevent extension. The alternatives are a timeout and polling with fstat, or another system specific interface (or an abstraction layer over such an interface). The other case is a read event. If it carries one of our mappings, that listening socket is ready to read (accept), and we have a new connection to handle. If it carries no mapping, it belongs to one of the per-connection sockets, and we simply close it.

int
newconnection(struct mapping *map)
{
        int s, client;
        union sockaddress addr;
        struct kevent kev;

        client = accept(map->listeningfd, NULL, 0);

        memset(&addr, 0, sizeof(addr));
        addr.u.sun_family = AF_UNIX;
        snprintf(addr.u.sun_path, sizeof(addr.u.sun_path), "%s/%s",
            map->ipname, map->portname);
        s = socket(AF_UNIX, SOCK_STREAM, 0);
        connect(s, &addr.a, sizeof(addr.u));

        setsockopt(client, SOL_SOCKET, SO_SPLICE, &s, sizeof(s));
        setsockopt(s, SOL_SOCKET, SO_SPLICE, &client, sizeof(client));

        EV_SET(&kev, client, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
        EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        return 0;
}

The first half is straightforward. We accept the connection and make a matching connect call to the unix side. Then I broke out the big cheat stick and just spliced the sockets together. In reality, we’d have to set up a read/copy/write loop for each end to copy traffic between them. That’s not very interesting to read though.
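Since I called it uninteresting, here is roughly what the uninteresting version looks like. Treat it as a sketch under assumptions: relay and copychunk are invented names, it blocks the caller for the life of the connection, and it ignores SIGPIPE and slow readers. A real broker would fold this into the event loop or hand each connection to its own process.

```c
#include <sys/socket.h>
#include <poll.h>
#include <unistd.h>

/*
 * Copy one chunk from one fd to the other. Returns 1 if data moved,
 * 0 on end of file, -1 on error.
 */
static int
copychunk(int from, int to)
{
        char buf[4096];
        ssize_t n, off, w;

        n = read(from, buf, sizeof(buf));
        if (n <= 0)
                return (int)n;
        /* write may accept less than we offer; keep going until it's all out */
        for (off = 0; off < n; off += w) {
                w = write(to, buf + off, n - off);
                if (w == -1)
                        return -1;
        }
        return 1;
}

/*
 * Hypothetical stand-in for the splice: shuttle bytes between a and b
 * until both directions have seen end of file.
 * Returns 0 on a clean shutdown, -1 on error.
 */
int
relay(int a, int b)
{
        int fd[2] = { a, b };
        struct pollfd pfd[2];
        int i, r, live = 2;

        for (i = 0; i < 2; i++) {
                pfd[i].fd = fd[i];
                pfd[i].events = POLLIN;
        }
        while (live > 0) {
                if (poll(pfd, 2, -1) == -1)
                        return -1;
                for (i = 0; i < 2; i++) {
                        if (pfd[i].fd == -1 || pfd[i].revents == 0)
                                continue;
                        r = copychunk(fd[i], fd[!i]);
                        if (r == -1)
                                return -1;
                        if (r == 0) {
                                /* this side is done sending; pass that along */
                                shutdown(fd[!i], SHUT_WR);
                                pfd[i].fd = -1; /* poll skips negative fds */
                                live--;
                        }
                }
        }
        return 0;
}
```

In newconnection, the two SO_SPLICE calls would give way to something that ends up calling relay(client, s), followed by closing both descriptors.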

And that’s the basis for our solution.

results

The full code, below, comes in at 232 lines according to wc. Minus includes, blank lines, and lines consisting of nothing but braces, it’s 148 lines of stuff that actually gets executed by the computer. Add some error handling, and working read/write code, and 200 lines seems about right.

The code as given isn’t portable, but the system interfaces used (kevent and splicing) can be replaced with generic posix interfaces.

The splicing code doesn’t even work on OpenBSD, because it’s not (yet) possible to splice sockets from different domains. I wouldn’t want the answer key to be so easy to steal. Bonus points if you fix the kernel instead.

I wrote all the directory scanning code first, then the directory watching code, so there are some obvious inefficiencies. The list of mapped ports could be tied to a directory, which would spare scanning the whole list.

code

#include <sys/queue.h>
#include <sys/types.h>
#include <sys/event.h>
#include <netinet/in.h>
#include <sys/un.h>
#include <dirent.h>

#include <sys/socket.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

union sockaddress {
        struct sockaddr a;
        struct sockaddr_storage s;
        struct sockaddr_in i;
        struct sockaddr_un u;
};

struct mapping {
        TAILQ_ENTRY(mapping) next;
        char *ipname;
        char *portname;
        int listeningfd;
};
TAILQ_HEAD(, mapping) maps;

void
closemapping(struct mapping *map)
{
        close(map->listeningfd);
        free(map->ipname);
        free(map->portname);
        free(map);
}

int kq;

int
bindport(const char *ipname, const char *portname)
{
        struct mapping *map;
        union sockaddress addr;
        struct kevent kev;
        int one = 1;
        int s;

        memset(&addr, 0, sizeof(addr));
        addr.i.sin_family = AF_INET;
        addr.i.sin_port = htons(strtonum(portname, 1, 1024, NULL));
        inet_aton(ipname, &addr.i.sin_addr);

        s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        bind(s, &addr.a, sizeof(addr.i));
        listen(s, 10);

        map = malloc(sizeof(*map));
        map->ipname = strdup(ipname);
        map->portname = strdup(portname);
        map->listeningfd = s;
        TAILQ_INSERT_TAIL(&maps, map, next);

        EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, map);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        return 0;
}

int
scanportdir(const char *dirname)
{
        DIR *dir;
        struct dirent *ent;
        TAILQ_HEAD(, mapping) existing;
        struct mapping *map, *next;

        TAILQ_INIT(&existing);
        TAILQ_FOREACH_SAFE(map, &maps, next, next) {
                if (strcmp(map->ipname, dirname) == 0) {
                        TAILQ_REMOVE(&maps, map, next);
                        TAILQ_INSERT_TAIL(&existing, map, next);
                }
        }

        dir = opendir(dirname);
        while ((ent = readdir(dir))) {
                if (ent->d_type == DT_SOCK) {
                        TAILQ_FOREACH_SAFE(map, &existing, next, next) {
                                if (strcmp(map->portname, ent->d_name) == 0) {
                                        TAILQ_REMOVE(&existing, map, next);
                                        TAILQ_INSERT_TAIL(&maps, map, next);
                                        break;
                                }
                        }
                        if (!map) {
                                bindport(dirname, ent->d_name);
                        }
                }
        }
        closedir(dir);

        TAILQ_FOREACH_SAFE(map, &existing, next, next) {
                closemapping(map);
        }
        return 0;
}

struct watcheddir {
        TAILQ_ENTRY(watcheddir) next;
        char *name;
        int fd;
};
TAILQ_HEAD(, watcheddir) directories;

int
watchdir(const char *dirname)
{
        struct watcheddir *dir;
        struct kevent kev;
        int fd;

        TAILQ_FOREACH(dir, &directories, next) {
                if (strcmp(dirname, dir->name) == 0)
                        return 0;
        }
        fd = open(dirname, O_RDONLY);
        dir = malloc(sizeof(*dir));
        dir->name = strdup(dirname);
        dir->fd = fd;
        TAILQ_INSERT_TAIL(&directories, dir, next);

        EV_SET(&kev, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_ATTRIB, 0, dir);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        return 0;
}

int
scanhostdir(const char *dirname)
{
        DIR *dir;
        struct dirent *ent;

        dir = opendir(dirname);
        chdir(dirname);
        while ((ent = readdir(dir))) {
                if (ent->d_name[0] == '.')
                        continue;
                if (ent->d_type == DT_DIR) {
                        scanportdir(ent->d_name);
                        watchdir(ent->d_name);
                }
        }
        closedir(dir);
        return 0;
}

int
newconnection(struct mapping *map)
{
        int s, client;
        union sockaddress addr;
        struct kevent kev;

        client = accept(map->listeningfd, NULL, 0);

        memset(&addr, 0, sizeof(addr));
        addr.u.sun_family = AF_UNIX;
        snprintf(addr.u.sun_path, sizeof(addr.u.sun_path), "%s/%s",
            map->ipname, map->portname);
        s = socket(AF_UNIX, SOCK_STREAM, 0);
        connect(s, &addr.a, sizeof(addr.u));

        setsockopt(client, SOL_SOCKET, SO_SPLICE, &s, sizeof(s));
        setsockopt(s, SOL_SOCKET, SO_SPLICE, &client, sizeof(client));

        EV_SET(&kev, client, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
        EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        return 0;
}

int
eventloop()
{
        struct kevent kev;

        while (1) {
                kevent(kq, NULL, 0, &kev, 1, NULL);
                switch (kev.filter) {
                case EVFILT_VNODE:
                        {
                                struct watcheddir *dir = kev.udata;
                                scanportdir(dir->name);
                        }
                        break;
                case EVFILT_READ:
                        {
                                struct mapping *map = kev.udata;
                                if (map) {
                                        newconnection(map);
                                } else {
                                        close(kev.ident);
                                }
                        }
                        break;
                }
        }
        return 0;
}

int
main(int argc, char **argv)
{
        const char *topdir = "topdir";

        kq = kqueue();

        TAILQ_INIT(&maps);
        TAILQ_INIT(&directories);

        scanhostdir(topdir);
        eventloop();

        return 0;
}

A+.

Posted 2017-07-11 13:06:11 by tedu Updated: 2017-07-11 13:06:11
Tagged: c openbsd programming