Items tagged with: elixir
HN Discussion: https://news.ycombinator.com/item?id=19714627
Posted by schneidmaster (karma: 1061)
Post stats: Points: 155 - Comments: 36 - 2019-04-21T21:15:05Z
#HackerNews #absinthe #and #apollo #elixir #graphql #phoenix #react
In this article, we explain how to scale Elixir to handle 100k connections per second.
Article word count: 1226
HN Discussion: https://news.ycombinator.com/item?id=19311750
Posted by slashdotdash (karma: 602)
Post stats: Points: 313 - Comments: 66 - 2019-03-05T16:40:19Z
#HackerNews #100k #achieving #connections #elixir #per #second #with
Both HTTP 1.x and HTTP/2 rely on lower-level connection-oriented protocols, namely TCP/IP and TLS. These protocols provide reliable delivery and correct ordering when data is chunked into multiple packets. TLS also adds encryption and authentication.
The HTTP client needs to open a connection before it can send a request. For efficiency, each connection is likely to be reused for subsequent requests. To ensure this with HTTP 1.0, the client can set the Connection: keep-alive header; HTTP 1.1 and HTTP/2 keep connections alive by default.
Web browsers maintain “warm” connections for a few minutes, and server-to-server connections are usually kept alive much longer. The WebSocket protocol (built on top of HTTP) keeps the underlying connection open until one side explicitly closes it.
What this means, performance-wise, is that measuring requests per second gets a lot more attention than connections per second. Usually, the latter is one or two orders of magnitude lower than the former. Correspondingly, benchmarks use long-lived connections to simulate multiple requests from the same device.
However, there are scenarios where this assumption does not hold true. For example, low-power IoT devices cannot afford to maintain active connections. A sensor could “wake up”, then establish a connection to a server, send a payload request, receive an acknowledgment response, close the connection, and go back to “sleep”. With such a workload, the connection-per-second and request-per-second rates would be the same.
In this article, we look at scaling Elixir to handle 100k connections per second.
The workload consists of 100k devices, each simply opening a TCP/IP connection, holding it open for 1 second ±10%, and closing it. The test consists of three phases: a 15-minute ramp-up, 1 minute of sustained load, and a 15-minute ramp-down. The following Stressgrid script represents the workload:
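The script itself did not survive extraction; below is a hedged reconstruction of the per-device behavior using plain gen_tcp (the target address and port are placeholders, and the real Stressgrid script may use the platform's own helpers instead):

```elixir
# Per-device workload sketch: open a TCP/IP connection, hold it for
# 1 second +/- 10%, then close it. '10.0.0.1' and 10_000 are placeholder
# values for the server under test.
{:ok, socket} = :gen_tcp.connect('10.0.0.1', 10_000, [:binary, active: false])

# Random jitter in the range -10%..+10% of one second.
jitter = (:rand.uniform() - 0.5) * 0.2
Process.sleep(round(1_000 * (1 + jitter)))

:ok = :gen_tcp.close(socket)
```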
We tested against Ranch, which is the socket acceptor pool at the heart of the Cowboy web server. With Ranch, it is possible to implement a server for any TCP/IP- and TLS-based protocol, which makes our benchmark not specific to HTTP. We used Ubuntu 18.04 with the 4.15.0-1031-aws kernel, with sysctl overrides seen in our /etc/sysctl.d/10-dummy.conf. We used Erlang 21.2.6-1 on a 36-core c5.9xlarge instance.
To run this test, we used Stressgrid with twenty c5.xlarge generators.
In the first test, we used unmodified Ranch 1.7. The connection rate graph shows a clear breaking point at 70k connections per second. After this point, connection latency grows, causing the rate to peak at 82k connections per second.
[IMG]
Another good way of observing the bottleneck effect is the 90th percentile latency. At the 14-minute mark, the latency jumps from single-digit milliseconds to 1 second.
[IMG]
To understand this bottleneck, we need to take a quick dive into the architecture of Ranch.
Ranch maintains a pool of acceptors to enable more throughput when handling new connections. By default, Ranch starts with 10 acceptors. In this test, we set it to 36, the number of CPU cores on our c5.9xlarge. We also performed the same test with the acceptor count set to 4x and 16x the number of CPU cores, with negligible differences.
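For reference, here is a minimal sketch of raising the acceptor count when starting a Ranch 1.7 listener; the listener name and protocol module are placeholders, not from the original article:

```elixir
# Start a Ranch listener with one acceptor per scheduler (CPU core).
# :bench_listener and BenchProtocol are placeholder names.
{:ok, _} =
  :ranch.start_listener(
    :bench_listener,
    :ranch_tcp,
    %{num_acceptors: System.schedulers_online(), socket_opts: [port: 10_000]},
    BenchProtocol,
    []
  )
```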
The acceptor pool makes it possible for our server to accept new connections with a high degree of parallelism. When a socket is accepted, it is passed to a newly started connection process. However, all connection processes are started by a single connection supervisor. This supervisor becomes the primary suspect in our search for the bottleneck.
To test this theory, we modified our test server to report the supervisor’s message queue length. Indeed, we observed between 20 and 200 messages in the 90th percentile, starting at the 9-minute mark.
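Sampling a process’s message queue length is straightforward with the standard Process.info/2; a sketch of the reporting helper (how the supervisor pid is obtained is left to the application):

```elixir
defmodule QueueProbe do
  # Return the current message queue length of the given process,
  # for example the Ranch connection supervisor's pid.
  def message_queue_len(pid) do
    {:message_queue_len, len} = Process.info(pid, :message_queue_len)
    len
  end
end
```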
[IMG]
The good news is that the Ranch maintainers are well aware of this problem. There is a pull request with a proof of concept that introduces acceptor-supervisor pairs.
With acceptor-supervisor pairs, there should no longer be any points of contention in the path of creating new connections within Ranch. To verify this, we collected a similar report for the total message queue length for all 36 connection supervisors. The 90th percentile now stays below 3.
[IMG]
But it is too early to celebrate: at the 15-minute mark, the 90th percentile latency jumps to 1 second again.
[IMG]The breaking point in the connection rate graph is less pronounced, but remains at about 70k connections per second.
We hit another bottleneck.
[IMG]
To understand this bottleneck, we need to understand how TCP/IP is implemented inside the Linux kernel.
When the client wants to establish a new connection, it sends a SYN packet. When the server receives the SYN, it places the new connection in the SYN queue, replies with SYN-ACK, and reports the connection as being in the syn-recv state. The connection stays in syn-recv until the client’s ACK completes the handshake, at which point the kernel moves it to the accept queue. When the userland program, in our case Ranch, invokes the accept() function, the connection is removed from the accept queue and becomes established. Cloudflare has a blog post with a detailed explanation of this mechanism.
We created a bash script that records the number of connections in syn-recv state during our test.
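The script did not survive extraction; a one-liner along these lines (using ss from iproute2) produces the same measurement:

```bash
# Print a timestamped count of sockets in the syn-recv state, once per second.
while true; do
  echo "$(date +%s) $(ss -n state syn-recv | wc -l)"
  sleep 1
done
```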
Around the 11-minute mark, the number of connections in syn-recv quickly reaches 1k. Since the maximum SYN queue length is set to 1024 via the net.core.somaxconn sysctl parameter, the overflowing SYN packets get dropped. The clients back off and re-send SYN packets later, which produces the 90th percentile latency jump we saw earlier.
[IMG]
The modified Ranch is able to accept with a higher degree of parallelism, but the accept() function is still invoked on a single shared listener socket. It turns out the Linux kernel experiences lock contention when accept() is called concurrently on the same listener socket.
In 2010, a group of engineers from Google discussed these lock contention and suboptimal load balancing issues in a presentation, where they also estimated the maximum connection rate at around 50k per second (on 2010 hardware). They proposed a Linux kernel patch introducing the SO_REUSEPORT socket option, which makes it possible to open many listener sockets on the same port, with incoming connections load-balanced across them.
Multi-supervisor Ranch with SO_REUSEPORT
To find out whether the SO_REUSEPORT socket option would help, we created a proof-of-concept application that runs multiple Ranch listeners on the same port, setting SO_REUSEPORT through a raw socket option. We set the number of listeners to the number of available CPUs.
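A sketch of the idea follows; the numeric constants are the Linux x86-64 values for SOL_SOCKET (1) and SO_REUSEPORT (15), and the listener and protocol names are placeholders rather than the article’s actual code:

```elixir
# SO_REUSEPORT must be set before bind(), so it is passed as a raw
# socket option when the listener socket is opened.
reuseport = {:raw, 1, 15, <<1::32-native>>}

# One listener per core, all bound to the same port; the kernel
# load-balances incoming connections across the listener sockets.
for i <- 1..System.schedulers_online() do
  {:ok, _} =
    :ranch.start_listener(
      {:bench_listener, i},
      :ranch_tcp,
      %{socket_opts: [{:port, 10_000}, reuseport]},
      BenchProtocol,
      []
    )
end
```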
With this proof of concept, we again measured the number of connections in the syn-recv state during our test. This number never gets close to 1k, meaning we’re no longer dropping SYN packets.
[IMG]
The 90th percentile latency confirms our findings: it remains consistently low throughout the test.
[IMG]
When zooming in, the 90th percentile latency measures between 1 and 2 milliseconds.
[IMG]
We also observed much better CPU utilization, which resulted from less contention and fairer load balancing when accepting new connections.
[IMG]
Finally, the connections-per-second rate reaches 99k, with network latency and available CPU resources contributing to the next bottleneck.
By analyzing the initial test results, proposing a theory, and confirming it by measuring against modified software, we found two bottlenecks on the way to 100k connections per second with Elixir and Ranch. The combination of multiple connection supervisors in Ranch and multiple listener sockets in the Linux kernel is necessary to fully utilize the 36-core machine under the target workload.
Discussion on lobste.rs
Discussion on Hacker News
According to a group of experts, a legendary “elixir of immortality” mentioned in ancient Chinese texts has been discovered in a bronze pot hidden inside a 2,000-year-old burial tomb.
#antigua-china #china #dinastia-han #elixir #inmortalidad
Originally published at: http://portalancestral.com/descubren-un-elixir-de-la-inmortalidad-por-primera-vez-en-china-en-una-tumba-de-2-000-anos/
From the beginning, Discord has been an early adopter of Elixir. The Erlang VM was the perfect candidate for the highly concurrent…
Article word count: 1821
HN Discussion: https://news.ycombinator.com/item?id=19238221
Posted by lelf (karma: 37746)
Post stats: Points: 128 - Comments: 20 - 2019-02-24T11:48:53Z
#HackerNews #2017 #concurrent #discord #elixir #scaled #users
By Stanislav Vishnevskiy
From the beginning, Discord has been an early adopter of Elixir. The Erlang VM was the perfect candidate for the highly concurrent, real-time system we were aiming to build. We developed the original prototype of Discord in Elixir; that became the foundation of our infrastructure today. Elixir’s promise was simple: access the power of the Erlang VM through a much more modern and user-friendly language and toolset.
Fast forward two years, and we are up to nearly five million concurrent users and millions of events per second flowing through the system. While we don’t have any regrets with our choice of infrastructure, we did have to do a lot of research and experimentation to get here. Elixir is a new ecosystem, and the Erlang ecosystem lacks information about using it in production (although Erlang in Anger is awesome). What follows is a set of lessons learned and libraries created throughout our journey of making Elixir work for Discord.
While Discord is rich with features, most of it boils down to pub/sub. Users connect to a WebSocket and spin up a session process (a GenServer), which then communicates with remote Erlang nodes that contain guild (internal term for a “Discord server”) processes (also GenServers). When anything is published in a guild, it is fanned out to every session connected to it.
When a user comes online, they connect to a guild, and the guild publishes a presence to all other connected sessions. Guilds have a lot of other logic behind the scenes, but here’s a simplified example:
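The example code was lost in extraction; here is a minimal sketch of such a guild GenServer (the module name, state shape, and message format are illustrative, not Discord’s actual code):

```elixir
defmodule Guild do
  use GenServer

  # State is assumed to hold the pids of all connected session processes.
  def init(sessions), do: {:ok, %{sessions: sessions}}

  # Fan a published message out to every connected session.
  def handle_call({:publish, message}, _from, %{sessions: sessions} = state) do
    Enum.each(sessions, fn session -> send(session, message) end)
    {:reply, :ok, state}
  end
end
```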
This was a fine approach when we originally built Discord for groups of 25 or fewer. However, we have been fortunate enough to have “good problems” arise as people started using Discord for large-scale groups. Eventually we ended up with many Discord servers like /r/Overwatch, with up to 30,000 concurrent users. During peak hours, we began to see these processes fail to keep up with their message queues. At a certain point, we had to manually intervene and turn off features that generated messages to help cope with the load. We had to figure this out before it became a full-time job.
We began by benchmarking hot paths within the guild processes and quickly stumbled onto an obvious culprit. Sending messages between Erlang processes was not as cheap as we expected, and the reduction cost (the Erlang unit of work used for process scheduling) was also quite high. We found that the wall clock time of a single send/2 call could range from 30μs to 70μs due to Erlang de-scheduling the calling process. This meant that during peak hours, publishing an event from a large guild could take anywhere from 900ms to 2.1s (30,000 sessions × 30–70μs)! Erlang processes are effectively single-threaded, and the only way to parallelize the work is to shard them. That would have been quite an undertaking, and we knew there had to be a better way.
We knew we had to somehow distribute the work of sending messages. Since spawning processes in Erlang is cheap, our first guess was to just spawn another process to handle each publish. However, each publish could be scheduled at a different time, and Discord clients depend on linearizability of events. That solution also wouldn’t scale well because the guild service was also responsible for an ever-growing amount of work.
Inspired by a blog post about boosting performance of message passing between nodes, Manifold was born. Manifold distributes the work of sending messages across the remote nodes of the target PIDs (Erlang process identifiers), which guarantees that the sending process calls send/2 at most once per involved remote node. Manifold does this by first grouping PIDs by their remote node and then sending to the Manifold.Partitioner on each of those nodes. The partitioner then consistently hashes the PIDs using :erlang.phash2/2, groups them by the number of cores, and sends them to child workers. Finally, those workers send the messages to the actual processes. This ensures the partitioner does not get overloaded and still provides the linearizability guaranteed by send/2. This solution was effectively a drop-in replacement for send/2:
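Per the Manifold README, usage mirrors send/2 but also accepts a list of pids; a small illustration (the receiver processes here exist only for the demo):

```elixir
# Spawn a few receiver processes to demonstrate the fan-out.
pids =
  for _ <- 1..3 do
    spawn(fn ->
      receive do
        msg -> IO.inspect(msg)
      end
    end)
  end

# Drop-in replacement for send/2 that batches per remote node.
Manifold.send(pids, :hello)
```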
An awesome side-effect of Manifold was that we were able to not only distribute the CPU cost of fanning out messages, but also reduce the network traffic between nodes:
[IMG]
Network reduction on one guild node
Manifold is available on our GitHub, so give it a spin. https://github.com/discordapp/manifold.
Discord is a distributed system that relies on consistent hashing. Using this method requires a ring data structure that can be used to look up the node that owns a particular entity. We want lookups to be fast, so we chose the wonderful library by Chris Moos, which works via an Erlang C port (a process responsible for interfacing with C code). It worked great for us, but as Discord scaled, we started to notice issues during bursts of user reconnects. The Erlang process responsible for controlling the ring would get so busy that it would fail to keep up with requests to the ring, and the whole system would become overloaded. The solution at first seemed obvious: run multiple processes with the ring data to better utilize all the machine’s cores to answer requests. However, we noticed that this was a hot path. Could we do better?
Let’s break down the cost of this hot path.
* A user can be in any number of guilds, but the average user is in 5.
* An Erlang VM responsible for sessions can have up to 500,000 live sessions on it.
* When a session connects, it has to look up the remote node for each guild it is interested in.
* The cost of communicating with another Erlang process using request/reply is about 12μs.
If the session server were to crash and restart, it would take about 30 seconds just for the ring lookups (500,000 sessions × 5 guilds × 12μs). That does not even account for Erlang de-scheduling the single ring process in favor of other processes’ work. Could we remove this cost completely?
The first thing people do in Elixir when they want to speed up data access is to introduce ETS. ETS is a fast, mutable dictionary implemented in C; the tradeoff is that data is copied in and out of it. We couldn’t just move our ring into ETS because we were using a C port to control the ring, so we converted the code to pure Elixir. Once that was implemented, we had a process whose job was to own the ring and constantly copy it into ETS so other processes could read directly from ETS. This noticeably improved performance, but ETS reads were about 7μs, and we were still spending 17.5 seconds on looking up values in the ring. The ring data structure is actually fairly large, and copying it in and out of ETS was the majority of the cost. We were disappointed; in any other language we could easily just have a shared value that was safe to read. There had to be a way to do this in Erlang!
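A sketch of that interim ETS pattern, with one owner process publishing copies of the ring into a named table that any process can read (names and the update mechanism are illustrative):

```elixir
defmodule RingOwner do
  use GenServer

  # Owns the ring and publishes it into ETS so other processes can read
  # it without a request/reply round trip to this process.
  def init(ring) do
    :ets.new(:ring, [:named_table, :set, :protected, read_concurrency: true])
    :ets.insert(:ring, {:ring, ring})
    {:ok, ring}
  end

  def handle_call({:update, ring}, _from, _old) do
    :ets.insert(:ring, {:ring, ring})
    {:reply, :ok, ring}
  end

  # Readers copy the ring out of ETS on every lookup; for a structure
  # this large, that copy dominated the ~7μs read cost.
  def get_ring do
    [{:ring, ring}] = :ets.lookup(:ring, :ring)
    ring
  end
end
```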
After doing some research, we found mochiglobal, a module that exploits a feature of the VM: if Erlang sees a function that always returns the same constant data, it puts that data into a read-only shared heap that processes can access without copying the data. mochiglobal takes advantage of this by creating an Erlang module with one function at runtime and compiling it. Since the data is never copied, the lookup cost decreases to 0.3μs, bringing the total time down to 750ms (500,000 × 5 × 0.3μs)! There’s no such thing as a free lunch though; building a module with a data structure as large as the ring at runtime can take up to a second. The good news is that we rarely change the ring, so it was a penalty we were willing to take.
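The underlying trick can be sketched in a few lines of Elixir (a toy version of the approach, not mochiglobal or FastGlobal itself; the Cached module name is made up):

```elixir
defmodule ConstantStore do
  # Compile a module at runtime whose only function returns `value` as a
  # literal. Literals in compiled code live in a read-only shared area,
  # so callers of Cached.get/0 receive the data without copying it.
  def put(value) do
    ast =
      quote do
        def get, do: unquote(Macro.escape(value))
      end

    # Recompiling Cached replaces the previous version (with a warning);
    # this is the expensive step that can take up to a second for a
    # structure as large as the ring.
    Module.create(Cached, ast, Macro.Env.location(__ENV__))
    :ok
  end
end
```

With this in place, ConstantStore.put(ring) pays the compilation cost once, and every subsequent Cached.get() is a cheap constant lookup.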
We decided to port mochiglobal to Elixir and add some functionality to avoid creating atoms. Our version is called FastGlobal and is available at https://github.com/discordapp/fastglobal.
After solving the performance of the node lookup hot path, we noticed that the processes responsible for handling guild_pid lookup on the guild nodes were getting backed up. The inherent back pressure of the slow node lookup had previously protected these processes. The new problem was that nearly 5,000,000 session processes were trying to stampede ten of these processes (one on each guild node). Making this path faster would not solve the problem; the underlying issue was that a session process’s call to this guild registry would time out and leave the request sitting in the registry’s queue. The session would then retry after a backoff, so requests would perpetually pile up and the registry would reach an unrecoverable state. Sessions would block on these requests until they timed out while still receiving messages from other services, causing their message queues to balloon and eventually OOM the whole Erlang VM, resulting in cascading service outages.
We needed to make session processes smarter; ideally, they wouldn’t even try to make these calls to the guild registry if a failure was inevitable. We didn’t want to use a circuit breaker because we didn’t want a burst in timeouts to result in a temporary state where no attempts are made at all. We knew how we would solve this in other languages, but how would we solve it in Elixir?
In most other languages, we could use an atomic counter to track outstanding requests and bail early if the number was too high, effectively implementing a semaphore. The Erlang VM is built around coordinating through communication between processes, but we knew we didn’t want to overload a process responsible for doing this coordination. After some research we stumbled upon :ets.update_counter/4, which performs atomic conditional increment operations on a number inside an ETS key. Since we needed high concurrency, we could also run ETS in write_concurrency mode but still read the value out, since :ets.update_counter/4 returns the result. This gave us the fundamental piece to create our Semaphore library. It is extremely easy to use and performs really well at high throughput:
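The trailing code sample was lost; here is a toy counter-based semaphore in the spirit of the technique described above (not the actual API of Discord’s Semaphore library):

```elixir
defmodule MiniSemaphore do
  @table :mini_semaphore

  # Create the shared counter table once, at application start.
  def init do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
    :ok
  end

  # Atomically increment the counter via :ets.update_counter/4, which
  # creates the row on first use and returns the post-increment value.
  # If we overshot `max`, undo the increment and report failure.
  def acquire(name, max) do
    if :ets.update_counter(@table, name, {2, 1}, {name, 0}) <= max do
      :ok
    else
      :ets.update_counter(@table, name, {2, -1})
      {:error, :max}
    end
  end

  def release(name), do: :ets.update_counter(@table, name, {2, -1})
end
```

Because increments and decrements always pair up, the count stays exact even when many processes race past the limit at once.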
This library has proved instrumental in protecting our Elixir infrastructure. A similar situation to the aforementioned cascading outages occurred as recently as last week, but there were no outages this time. Our presence services crashed due to an unrelated issue, but the session services did not even budge, and the presence services were able to rebuild within minutes after restarting:
[IMG] Live presences within the presence service. [IMG] CPU usage on the session services around the same time period.
You can find our Semaphore library on GitHub at https://github.com/discordapp/semaphore.
Choosing to use and getting familiar with Erlang and Elixir has proven to be a great experience. If we had to go back and start over, we would definitely choose the same path. We hope that sharing our experiences and tools proves useful to other Elixir and Erlang developers, and we hope to continue sharing as we progress on our journey, solving problems and learning lessons along the way.
We are hiring, so come join us if this type of stuff tickles your fancy.
We're looking for a #developer familiar with #Elixir (or transferable skills like #Erlang and #RubyOnRails).
The whole project will be #FOSS and you can work from home; what's not to love? 😉 Please RT!
#tech #job #programming #jobs #software #remote #federation #decentralised