A Unified Theory of Garbage Collection
Blog Post by: Kaitlyn Khan, Hamza Rashid, Tom Reingold, Tong Chen
Tracing garbage collection and reference counting have many parallels. Tracing starts at the roots and aims to find the live data; reference counting starts from the anti-roots, the objects whose reference counts have dropped to zero, and aims to find the dead data. In this sense, reference counting is the complement of tracing: tracing starts each count at zero and increments up to the true count, while reference counting starts with an excessive count and decrements down to the true count. (HR)
As mentioned above, tracing initializes every count to zero. The algorithm uses a work-list, the set of objects still to be processed, seeded with the roots. While draining the list, whenever a vertex is encountered for the first time, all of its outgoing edges are added to the work-list. Once the loop finishes, all the live nodes have been discovered, and a separate function returns the unreached nodes so their storage can be freed, as in the sketch below. (HR)
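Here is a minimal sketch of that loop, assuming a heap represented as a Python dict mapping each node to its list of successors; the names (`heap`, `roots`, `counts`) are illustrative, not taken from the paper:

```python
def trace(heap, roots):
    counts = {node: 0 for node in heap}    # every count starts at zero
    worklist = list(roots)                 # begin at the roots
    while worklist:
        node = worklist.pop()
        counts[node] += 1                  # increment toward the true count
        if counts[node] == 1:              # first visit: node is newly live
            worklist.extend(heap[node])    # so add its outgoing edges
    return counts

def find_garbage(heap, counts):
    # the separate pass: nodes that were never reached are the ones to free
    return [node for node in heap if counts[node] == 0]
```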
Reference counting starts with an overestimate of the true count. Again a work-list drives the algorithm, this time seeded from the anti-roots. Whenever a decrement reveals a garbage vertex, its outgoing edges are added to the work-list so its targets are decremented in turn. As before, a separate function returns the dead nodes so their storage can be freed; a sketch follows. (HR)
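The dual loop, under the same assumed representation. Here `counts` holds the current (overestimated) reference counts and `anti_roots` are the objects whose counts have already fallen to zero:

```python
def ref_count_collect(heap, counts, anti_roots):
    # seed the work-list with the edges leaving the known-dead objects
    worklist = [succ for node in anti_roots for succ in heap[node]]
    while worklist:
        node = worklist.pop()
        counts[node] -= 1                  # decrement toward the true count
        if counts[node] == 0:              # newly dead: add its edges too
            worklist.extend(heap[node])
    return [node for node in heap if counts[node] == 0]   # nodes to free
```

Note that a cycle with no external references never reaches the work-list here, which matches reference counting's well-known inability to reclaim cyclic garbage on its own.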
Comparing the two algorithms, most of the code is identical. The difference is that tracing increments counts while reference counting decrements them, and the test for adding a node's edges to the work-list checks whether the count has just reached 1 for tracing or 0 for reference counting. By simply changing the update and the condition, we can switch between tracing and reference counting. (HR)
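A sketch of that unification: one work-list loop serves both collectors, and only the update direction and the trigger value change (the parameter names `delta` and `trigger` are mine, for illustration):

```python
def scan(heap, counts, worklist, delta, trigger):
    while worklist:
        node = worklist.pop()
        counts[node] += delta              # +1 for tracing, -1 for ref counting
        if counts[node] == trigger:        # 1 for tracing, 0 for ref counting
            worklist.extend(heap[node])
    return counts

# Tracing:            scan(heap, zero_counts, list(roots), +1, 1)
# Reference counting: scan(heap, counts, edges_out_of_anti_roots, -1, 0)
```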
Garbage collection is an interesting topic to discuss. Analyzing these methods invites inquiry into the different ideas behind the algorithms being used. An algorithm is defined as a set of rules to follow in order to produce a specific output, but that does not mean an algorithm needs to succeed at everything in order to be useful. For instance, the authors of this paper experiment with tracing and reference counting to discover each one's best applications. Eventually the mark-sweep algorithm is introduced, which both identifies the garbage in the heap and then gets rid of it. (KK)
The overall goal of a garbage collector is to reclaim unused memory to make more space. I was thinking about which heap procedure resembles the one mark-sweep uses. To my understanding, the algorithm walks the heap and marks which elements need to be removed, but doesn't remove them until the end. We can see a related idea in Figure 13, where the "older-first" algorithm is modeled; in a way, it is trying to find the best ordering in which to process the heap and optimize running time. If we instead look at it the other way and "sweep" the live objects to one side of the array before removing the rest, it would increase the garbage-collection pause duration. This is the idea of compaction, which is further explained in the object relocation section; see the sketch below. During such a pass, a pointer is needed, just as in many sorting algorithms, to check every slot before moving things around the array. Which method fits best depends on the data it is meant to be used on: some methods optimize time, while others disrupt the layout of the array. (KK)
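A rough sketch of both ideas under the same assumed heap representation: the mark phase only records which objects are live, the sweep phase reclaims everything unmarked in one pass at the end, and a simple sliding compaction shows why moving live objects to one side lengthens the pause:

```python
def mark(heap, roots):
    marked = set()
    worklist = list(roots)
    while worklist:
        node = worklist.pop()
        if node not in marked:
            marked.add(node)               # note the object as live...
            worklist.extend(heap[node])    # ...but free nothing yet
    return marked

def sweep(heap, marked):
    for node in list(heap):
        if node not in marked:
            del heap[node]                 # removal happens only at the end

def compact(slots, marked):
    # slide live objects to one side so the free space is contiguous;
    # the extra pass over every slot is what lengthens the pause
    live = [obj for obj in slots if obj in marked]
    return live + [None] * (len(slots) - len(live))
```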
The benefits of a garbage collector are simple. The first is that the user doesn't have to manually deallocate memory themselves. It also reduces the potential for bugs like dangling pointers, double frees, and memory leaks. When deallocating memory manually, one thing people often forget is to also clear any pointers that still point to that piece of memory. The result is dangling pointers that still reference the freed memory, and if that memory is later allocated for something else, following them can cause unpredictable results. Another benefit is avoiding freeing the same memory twice. When manually deallocating memory, it is easy to forget that a piece of memory has already been freed and try to free it again. The second free can corrupt the allocator's bookkeeping, and worse, that piece of memory may already have been assigned for another purpose, so the user ends up freeing memory that is still in use and breaking the code. The last problem is memory leaks: manual deallocation may fail, or, more likely, the user simply forgets to free the memory. Either way, the unused memory piles up and eventually takes up all of the available space. (TC)
The garbage collector isn't perfect; it also has some disadvantages. The first is that running a garbage collector takes computing power as it finds and decides which memory to free, and it is worse if the collector scans the whole memory when the user is only using a small part, impacting overall performance. Another problem is that, since there is no manual control, the garbage collector might execute at unwanted moments, causing stalls and interrupting the user. (TC)
It's possible I didn't understand the paper well enough, but it seems to me that it is saying a hybrid approach is best. To me, this seems obvious, because a garbage-collection method that takes too much CPU time is intolerable: memory might be squeaky clean, but the program in question will run too slowly, and so might the whole system. A garbage collector that is fast but still leaves memory leaks (or the lack of any garbage collection) will save CPU time, but too much leaked memory eventually makes a system slow and unreliable. (TR)
I think this is analogous to paging algorithms for operating systems. The choice between least-recently-used and least-frequently-used is optimized by blending the two. Another analogy is physical clutter in my home or office. My tendency is to leave my most recent work on top of the furniture so I can resume as quickly as possible. But I still need to spend some time regularly tidying up my space to prevent it from being too cluttered. (TR)