The Tail At Scale
Blog Post by: Tong Chen, Kaitlyn Khan, Hamza Rashid, Tom Reingold
In massive-scale systems that might be spread over a wide geographical area, response time for the user can be high. The statistical distribution of the latency has a long tail, i.e. as response time increases, the number of users experiencing that response time decreases. This sounds OK, but the tail is long, which is to say that a great many users can still experience excessive response times. This paper discusses ways to mitigate long response times that would not be necessary in smaller systems. (TR)
Large companies such as Google divide large problems among many computers and networks, and these might even be continents apart. An example of a service would be auto-completion in the Google search bar. I might type “how do I” and the first three suggestions are
- how can i tell what folder a google doc is in
- how do i watch the super bowl
- how do i get wordle (TR)
The problem of guessing what the user wants is divided among many systems, and the results are merged. (TR)
The three broad approaches to reducing latency are:
- overprovisioning of resources (e.g. using more services per request than the average case calls for),
- careful engineering of software,
- and improving reliability of systems and networks
Each of these sounds simple in principle but is highly complex. (TR)
One technique the paper discusses is what the authors call hedged requests, analogous to hedged bets. A request is sent to more than one server that can service the request with the hope that at least one will respond quickly enough. (TR)
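The paper suggests deferring the second copy of the request until the first has been outstanding longer than, say, the 95th-percentile expected latency, so the extra load stays small. Here is a minimal sketch of that idea in Python, assuming a hypothetical fetch(replica, query) function that performs the actual remote call (the 0.01-second default below is just a placeholder for that percentile):

```python
import concurrent.futures

# Shared pool of worker threads for issuing the (possibly duplicated) requests.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def hedged_request(replicas, query, fetch, hedge_after=0.01):
    """Send the request to the first replica; if no reply arrives within
    hedge_after seconds, send a second copy to another replica and return
    whichever response comes back first."""
    futures = [pool.submit(fetch, replicas[0], query)]
    done, _ = concurrent.futures.wait(futures, timeout=hedge_after)
    if not done and len(replicas) > 1:
        # The primary is slow, so hedge the bet with a second replica.
        futures.append(pool.submit(fetch, replicas[1], query))
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    return next(iter(done)).result()  # the slower copy is simply ignored
```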
In today’s world, because of our reliance on technology, it is important to understand latency. We want results that are both fast and accurate, so we need algorithms that can reduce wait times. (HR)
How can we determine whether the quality of an information retrieval system lives up to the task it’s performing? As mentioned in the article, “good enough” results may be the compromise made for good latency. What does this have to do with our project? On our webpage, we plan to have a search bar that lets users search through the list of radio stations available on the radio. Considering the number of options to choose from, users can narrow their search by genre or even by a station name! How would they achieve this? String matching is the answer. But now we need to discuss the importance of “good enough.” For example, when users begin typing, do we show search predictions based on the strings we are trying to match? Or do we have the search algorithm return only options that match the beginning of the query string, rather than any option that contains the string anywhere? This article highlights problems we hadn’t thought of yet for this project. (KK)
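As a toy illustration of that choice, here is what prefix matching versus substring matching might look like over a made-up list of station names (the list and function names are placeholders, not our actual implementation):

```python
# Made-up station list used only for illustration.
STATIONS = ["WNYC News", "WQXR Classical", "KEXP Seattle", "Jazz 24"]

def prefix_matches(query, stations=STATIONS):
    """Only return stations whose names start with the query."""
    q = query.lower()
    return [s for s in stations if s.lower().startswith(q)]

def substring_matches(query, stations=STATIONS):
    """Return any station whose name contains the query anywhere."""
    q = query.lower()
    return [s for s in stations if q in s.lower()]

print(prefix_matches("w"))          # ['WNYC News', 'WQXR Classical']
print(substring_matches("class"))   # ['WQXR Classical']
```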
In the example the article went over, Google’s search engine updates query results to predict and show suggestions within a fraction of a second. The search engine finds matches that begin with, or even just contain, the string of letters you typed. Similarly, a string-matching algorithm treats the text being searched as an array and takes a pattern representing the piece of the array being searched for. (KK)
String matching exists to give the user a convenient way to find the information they are asking for. A fast algorithm equates to a fast-running program, and to determine how fast, you can calculate its running time. To do so, we need to think about both preprocessing time and matching time. We also need to consider best-case and worst-case scenarios to choose the best-performing algorithm for our users. However, it is unlikely that a user could detect an improvement to a search system that is already “good enough.” We are looking into this in hopes of producing a quality search option for our webpage. (KK)
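As one concrete example of the preprocessing-time/matching-time split, the Knuth-Morris-Pratt algorithm spends O(m) preprocessing the pattern and then O(n) scanning the text, so no character of the text is ever re-examined. This is only a sketch of one candidate algorithm, not necessarily what we will end up using:

```python
def kmp_search(text, pattern):
    """Return the indices where pattern occurs in text.
    Preprocessing: O(len(pattern)); matching: O(len(text))."""
    if not pattern:
        return []
    # Preprocessing: failure[i] is the length of the longest proper prefix
    # of pattern[:i+1] that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Matching: scan the text once, falling back via the failure table
    # instead of re-checking characters that already matched.
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]
    return matches

print(kmp_search("how do i get wordle", "do"))  # [4]
```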
Theoretically, load balancing can be applied to our radio. A large number of requests for one particular stream can lead to long wait times for users. One way to mitigate this is to add additional servers. Say our radio has two servers. When a large group of people is trying to play one station, we can divide the requests between the two servers using queues: when a new request is received, it routes to the server with the smallest queue. (HR)
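A minimal sketch of that “join the shortest queue” routing, with two hypothetical stream servers represented as in-memory queues:

```python
import queue

# Two hypothetical stream servers, each with its own backlog of requests.
servers = {"server-1": queue.Queue(), "server-2": queue.Queue()}

def route(request):
    """Enqueue the request on whichever server currently has the smallest queue."""
    name = min(servers, key=lambda s: servers[s].qsize())
    servers[name].put(request)
    return name

# New requests alternate between the servers as their queues grow evenly.
for listener in range(4):
    print(route({"station": "jazz", "listener": listener}))
```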
Another way would be to have both servers receive the request to play a stream. Whichever server is able to serve the stream first handles the request, and the copy sent to the other server gets canceled. (HR)
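The same first-response-wins idea, sketched with asyncio so the request to the slower server can actually be cancelled; play_stream(server, station) is a hypothetical coroutine standing in for the real stream setup:

```python
import asyncio

async def first_to_respond(servers, station, play_stream):
    """Start the stream request on every server, keep the first one that
    finishes, and cancel the copies still pending on the slower servers."""
    tasks = [asyncio.create_task(play_stream(s, station)) for s in servers]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()              # the other server's request gets canceled
    return done.pop().result()
```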
Latency-induced probation can also be applied to the radio streams. For example, if server one is overloaded and begins to perform slowly, it is temporarily excluded from the pool of available servers, and server two handles all new requests during that time. (HR)
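A sketch of what latency-induced probation could look like for our two servers; the window size, latency threshold, and probation length below are made-up numbers, not tuned values:

```python
import time
from collections import defaultdict, deque

WINDOW = 20        # how many recent response-time samples to keep per server
THRESHOLD = 0.5    # average latency (seconds) that triggers probation
PROBATION = 30.0   # how long (seconds) a slow server sits out

recent_latencies = defaultdict(lambda: deque(maxlen=WINDOW))
probation_until = defaultdict(float)

def record_latency(server, seconds):
    """Record a response time and put the server on probation if it is slow."""
    samples = recent_latencies[server]
    samples.append(seconds)
    if sum(samples) / len(samples) > THRESHOLD:
        probation_until[server] = time.time() + PROBATION

def available_servers(servers):
    """Return only the servers that are not currently on probation."""
    now = time.time()
    return [s for s in servers if probation_until[s] <= now]
```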
Latency on the radio can be very annoying and problematic. One way to deal with the problem locally is to add a buffering system of some sort. Instead of playing the stream instantly as it comes in, the radio stores the data temporarily or permanently, depending on the situation and configuration, and then plays it from a queue. The downside would be a short delay when the radio starts. The delay in the content itself wouldn’t be a problem, since there is no interaction from the user and an immediate response is unnecessary. The benefit would be a steadier playing experience, which is very important for music and anything else the radio plays. Any short latency while the radio is requesting data wouldn’t affect playback, since the radio wouldn’t be waiting on that data to play. (TC)
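A sketch of that buffering idea: one thread fetches audio chunks into a bounded queue, and playback only starts after a small prebuffer fills, which is exactly the short start-up delay mentioned above. fetch_chunk() and play_chunk() are hypothetical stand-ins for the real network and audio code:

```python
import queue
import threading
import time

BUFFER_CHUNKS = 50   # roughly a few seconds of audio, depending on chunk size
PREBUFFER = 10       # chunks to collect before playback starts

buffer = queue.Queue(maxsize=BUFFER_CHUNKS)

def fetcher(fetch_chunk):
    while True:
        buffer.put(fetch_chunk())      # blocks if the buffer is already full

def player(play_chunk):
    while buffer.qsize() < PREBUFFER:  # the short delay on start
        time.sleep(0.05)
    while True:
        play_chunk(buffer.get())       # blocks briefly only if the buffer runs dry

# With real fetch_chunk/play_chunk functions, each loop would run in its own thread:
# threading.Thread(target=fetcher, args=(fetch_chunk,), daemon=True).start()
# threading.Thread(target=player, args=(play_chunk,), daemon=True).start()
```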
The problem that canary requests guard against would be relatively simple to handle in our case, since the data transferred between our radio and the server is simple; in most cases it is only the request for a particular stream. Still, if such a problem arose, there are simple solutions for our radio. The first would be to revert the most recent update, since the most likely reason for such an error in the radio is a fault in our own programming rather than some other source. If the problem has another cause, the solution is to temporarily drop any request that takes too long, retry it later, and, if the request makes the servers unresponsive too many times, drop it completely. It may also be beneficial to have a request handler that scans all incoming requests and drops any that look suspicious, have caused crashes, or contain errors. (TC)
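The paper’s canary requests work by trying a request on one server first and only fanning it out once that canary succeeds; combined with the retry limit described above, a rough sketch might look like this (send(server, request) and the limit of three failures are assumptions, not code from our radio):

```python
MAX_FAILURES = 3
failure_counts = {}

def handle(request, servers, send):
    """Canary the request on one server; fan out only if that succeeds.
    Requests that keep failing are dropped entirely."""
    key = repr(request)
    if failure_counts.get(key, 0) >= MAX_FAILURES:
        return None                 # drop requests that keep causing trouble
    try:
        canary = send(servers[0], request)    # try a single server first
    except Exception:
        failure_counts[key] = failure_counts.get(key, 0) + 1
        return None                 # retry later instead of taking every server down
    # The canary came back cleanly, so it is safe to send to the rest.
    return [canary] + [send(s, request) for s in servers[1:]]
```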