Gottlieb t Freitag

Epoll is not as bad as the internet says

September 14, 2019 - lutz

In all honesty, the internet does not really say that much about epoll. It’s mostly a post from 2017 called “epoll is fundamentally broken” that annoys me. The blog post states that “if you can you should avoid using epoll for load balancing across threads”. I strongly disagree with that statement and with the overall methodology of the post.

When I built simplyfile I started to really like epoll. I’ve used it in multiple projects as a load balancing solution. I also know that epoll is used in a multitude of applications that are perfectly capable of utilizing it properly (e.g., nginx).

What is epoll?

Epoll is the successor to poll. poll is a system call specified by POSIX that enables programs to monitor file descriptors for their readiness to perform IO. Think about implementing a server that has to monitor multiple client connections. When any of the clients sends data, the server needs to react to that data. Instead of busy-polling each client socket, a single call to poll blocks until any of the client connections has received some data and then reports which file descriptors became readable.

The problem with poll is that it’s rather slow. Most of the time the same set of file descriptors is passed to poll, and for every invocation a structure in the kernel has to be built and torn down. poll is also not exactly an ideal solution for multithreaded applications, as it is susceptible to the thundering herd problem.

epoll was built to solve exactly those problems. An epoll file descriptor manages the set of file descriptors to be monitored, and epoll_wait is the epoll equivalent of poll. Further, epoll allows for edge-triggered monitoring of the readiness of file descriptors. This edge sensitivity is the core strength of epoll, as it solves the thundering herd problem.
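To make the difference between level-triggered and edge-triggered monitoring concrete, here is a minimal sketch. It assumes Linux and uses an eventfd as a stand-in for a client socket; the function name readyReports is made up for this illustration. It signals the descriptor once, never reads from it, and then asks epoll twice whether anything is ready.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper: counts how many of two consecutive non-blocking
// epoll_wait calls report the same, still unread, eventfd as readable.
// Pass EPOLLIN for level-triggered or EPOLLIN | EPOLLET for edge-triggered.
// Returns -1 if setting up the descriptors fails.
int readyReports(uint32_t flags) {
	int ep  = epoll_create1(EPOLL_CLOEXEC);
	int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	if (ep < 0 || efd < 0) {
		if (ep >= 0) close(ep);
		if (efd >= 0) close(efd);
		return -1;
	}

	epoll_event ev{};
	ev.events  = flags;
	ev.data.fd = efd;

	int reports = -1;
	uint64_t one = 1;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) == 0 &&
	    write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one))) {
		reports = 0;
		for (int i = 0; i < 2; ++i) {
			epoll_event out{};
			// Timeout 0: only report what is ready right now.
			if (epoll_wait(ep, &out, 1, 0) == 1) ++reports;
		}
	}

	close(efd);
	close(ep);
	return reports;
}
```

Called with EPOLLIN, both calls report the descriptor because it is still readable. Called with EPOLLIN | EPOLLET, only the first call reports it, since nothing new has arrived since the first notification.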

Thundering Herd

With the rise of multithreading, the need for something different from poll arose. In a nutshell, the “thundering herd problem” can be described as follows: if multiple threads monitor the same file descriptor using poll and that file descriptor becomes ready for IO, then multiple calls return. With poll there is no one-to-one relation between events and wakeups. The programmer is then in charge of resolving those situations. However, AFAIK there is no universally good solution to the thundering herd problem that minimizes wakeups apart from using edge-sensitive wakeups or one-shot events.
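The herd is easy to reproduce. The following sketch assumes Linux and uses an eventfd instead of a real socket; pollWakeups is a hypothetical name. Several threads poll the same descriptor, the descriptor is signalled exactly once, and the function counts how many threads wake up for that single event.

```cpp
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threadCount` threads that all poll() the same
// eventfd, signals it once, and returns how many threads saw it as readable.
// Returns -1 if the eventfd cannot be created or signalled.
int pollWakeups(int threadCount) {
	int efd = eventfd(0, EFD_CLOEXEC);
	if (efd < 0) return -1;

	std::atomic<int> wakeups{0};
	std::vector<std::thread> threads;
	for (int i = 0; i < threadCount; ++i) {
		threads.emplace_back([&] {
			pollfd pfd{};
			pfd.fd     = efd;
			pfd.events = POLLIN;
			// Nobody reads the eventfd, so it stays readable for every
			// poller. The 5 s timeout only guards against hanging forever.
			if (poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLIN)) ++wakeups;
		});
	}

	// One single event ...
	uint64_t one = 1;
	bool written =
		write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));

	// ... and every poller returns because of it.
	for (auto& t : threads) t.join();
	close(efd);
	return written ? wakeups.load() : -1;
}
```

With four threads, pollWakeups(4) reports four wakeups for one event. In a real server three of those threads would then race for data that only one of them can actually read.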

So, what’s the Problem then?

The blog post I mentioned earlier claims that epoll does not solve exactly those problems. This claim is plain wrong. The post showcases a usage of epoll and a “pessimistic run” of that usage with the intention of demonstrating epoll’s inability to resolve the thundering herd problem. However, the example is flawed and can easily be fixed. Further, the post actually points out possible solutions to the problems it mentions! Edge-triggered EPOLLONESHOT is indeed already a solution to the discussed problem.

To make this clear:
My critique of the original blog post is that it assumes that epoll is broken even though it is simply not used properly. The post is on the first page when googling for epoll and thus has considerable influence on programmers looking for a scalable alternative to poll. I had the pleasure of working with multiple people who would read this as “epoll is a big no-no” and would refuse to touch any implementation that utilizes it. Therefore I would much rather see the original post updated or corrected. There actually was a time when the post’s criticism of epoll was valid. However, with the introduction of EPOLLONESHOT (2004) the mentioned problems were already resolved. The post was released in 2017, though.
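For readers who have not used EPOLLONESHOT: after one event has been delivered, the descriptor stays registered but disabled until it is re-armed with EPOLL_CTL_MOD. The sketch below assumes Linux and an eventfd as the monitored descriptor; signal and oneShotSequence are hypothetical names. It records what three non-blocking epoll_wait calls return.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Hypothetical helper: adds 1 to the eventfd counter, making it readable.
static bool signal(int efd) {
	uint64_t one = 1;
	return write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));
}

// Hypothetical helper: returns the results of three epoll_wait calls
// (timeout 0) on an edge-triggered one-shot registration. An empty or
// shortened vector means that a setup step failed.
std::vector<int> oneShotSequence() {
	std::vector<int> results;
	int ep  = epoll_create1(EPOLL_CLOEXEC);
	int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);

	epoll_event ev{};
	ev.events  = EPOLLIN | EPOLLET | EPOLLONESHOT;
	ev.data.fd = efd;
	epoll_event out{};

	if (ep >= 0 && efd >= 0 && epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) == 0) {
		// 1) First event: delivered once, which disables the descriptor.
		if (signal(efd)) results.push_back(epoll_wait(ep, &out, 1, 0));

		// 2) More data arrives, but nothing is reported while disabled.
		if (signal(efd)) results.push_back(epoll_wait(ep, &out, 1, 0));

		// 3) Re-arming re-checks readiness, so the pending data is reported.
		if (epoll_ctl(ep, EPOLL_CTL_MOD, efd, &ev) == 0)
			results.push_back(epoll_wait(ep, &out, 1, 0));
	}

	if (efd >= 0) close(efd);
	if (ep >= 0) close(ep);
	return results;
}
```

The recorded sequence is 1, 0, 1. Whichever thread received the first event therefore owns the descriptor until it decides to re-arm it, and no other thread is woken up for it in the meantime.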

My Take on Epoll

Simplyfile comes with a wrapper around epoll that is a rather straightforward event dispatcher for file descriptors. It’s more or less just a std::map from file descriptors (ints) to std::function<void(int)> objects. It also comes with a work method where the call to epoll_wait and the dispatcher logic are implemented.

Here are the important bits and pieces (the entire code can be found here; more examples and details here):

struct Epoll : FileDescriptor {
	using Callback = std::function<void(int)>;

	Epoll();
	Epoll(Epoll &&other);
	Epoll& operator=(Epoll &&rhs);
	~Epoll();

	void addFD(int fd, Callback&& callback, int epollFlags = EPOLLIN|EPOLLET, std::string const& name="");
	void modFD(int fd, int epollFlags = EPOLLIN|EPOLLET);
	void rmFD(int fd, bool blocking);

	void work(int maxEvents=1, int timeout_ms=-1) {
		dispatch(wait(maxEvents, timeout_ms));
	}

	// call epoll_wait internally and return the list of events
	std::vector<struct epoll_event> wait(int maxEvents=32, int timeout_ms=-1);

	// call all callbacks of the event list
	void dispatch(std::vector<struct epoll_event> const&);

	// wakes up `count` threads that are calling wait
	void wakeup(uint64_t count=1);
    [...]
};

The inner workings of Epoll should not be surprising, apart from code to measure execution times and an exception wrapper that captures all exceptions and then encapsulates them within a more descriptive exception. As mentioned before: I use this class in multiple projects to actually do load balancing across threads. And it works exactly how I want it to work.
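To illustrate the idea without pulling in simplyfile, here is a stripped-down sketch of such a dispatcher used for load balancing. It is not simplyfile’s actual implementation: MiniDispatcher and balance are hypothetical names, Linux is assumed, and a semaphore-style eventfd stands in for a source of work. Every descriptor is registered with EPOLLONESHOT and re-armed after its callback has run, so at most one thread handles a descriptor at any time.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <atomic>
#include <cstdint>
#include <functional>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, minimal dispatcher: a map from descriptors to callbacks.
struct MiniDispatcher {
	using Callback = std::function<void(int)>;
	int ep = epoll_create1(EPOLL_CLOEXEC);
	std::map<int, Callback> callbacks; // assumed filled before workers start

	MiniDispatcher() = default;
	MiniDispatcher(MiniDispatcher const&) = delete;
	MiniDispatcher& operator=(MiniDispatcher const&) = delete;
	~MiniDispatcher() { if (ep >= 0) close(ep); }

	static epoll_event oneShot(int fd) {
		epoll_event ev{};
		ev.events  = EPOLLIN | EPOLLONESHOT;
		ev.data.fd = fd;
		return ev;
	}

	bool add(int fd, Callback cb) {
		callbacks[fd] = std::move(cb);
		epoll_event ev = oneShot(fd);
		return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0;
	}

	// Waits for one event, runs its callback, then re-arms the descriptor.
	// Returns false on timeout or error.
	bool work(int timeout_ms) {
		epoll_event ev{};
		if (epoll_wait(ep, &ev, 1, timeout_ms) != 1) return false;
		int fd = ev.data.fd;
		callbacks.at(fd)(fd);
		epoll_event rearm = oneShot(fd);
		return epoll_ctl(ep, EPOLL_CTL_MOD, fd, &rearm) == 0;
	}
};

// Queues `events` units of work and lets `workers` threads drain them.
// Returns the number of callback invocations, or -1 if setup fails.
int balance(int workers, int events) {
	MiniDispatcher d;
	int efd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK | EFD_CLOEXEC);
	if (d.ep < 0 || efd < 0) {
		if (efd >= 0) close(efd);
		return -1;
	}

	std::atomic<int> handled{0};
	bool ok = d.add(efd, [&](int fd) {
		uint64_t unit = 0; // semaphore mode: each read takes exactly one unit
		if (read(fd, &unit, sizeof(unit)) == static_cast<ssize_t>(sizeof(unit)))
			++handled;
	});
	uint64_t n = static_cast<uint64_t>(events);
	ok = ok && write(efd, &n, sizeof(n)) == static_cast<ssize_t>(sizeof(n));

	std::vector<std::thread> pool;
	for (int i = 0; ok && i < workers; ++i)
		pool.emplace_back([&] { while (d.work(100)) {} }); // idle 100 ms: stop
	for (auto& t : pool) t.join();

	close(efd);
	return ok ? handled.load() : -1;
}
```

Calling balance(4, 100) runs the callback exactly 100 times in total, no matter how the four workers interleave. The sketch leaves out everything a real dispatcher needs beyond that, such as removing descriptors, adding them while workers are running, and error reporting.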

TL;DR

There are people on the internet claiming that epoll is very bad: too complicated to use, slow and error-prone. That’s just wrong. Epoll is none of those things.