Wednesday, June 22, 2022

C# Packing more reference dlls, but this time with compression!

Last week, I talked about packing a .NET program's reference dlls into the exe using dnlib (post linked here). It works great, so I went ahead and tested it on a DevExpress WinForms app:















As you can see, the packed application is pretty big, about 76 MB in total. That's a lot for just a demo app, and I wanted to shrink the packed executable as much as I could. So I thought to myself, why not compress each dll with 7-Zip, pack that data into the exe, and inject code to decompress each dll on demand? Well, I'm pleased to present an update to reference packer that adds a -compress argument to do exactly that. Let's examine how it's done.


Outline

To compress and pack our references, the initial steps are the same as before, with a few changes.

First, we have to choose how we will compress the data we pack. I chose a library called SevenZipSharp to handle the compression for me. It builds on 7-Zip's 7z library and the LZMA SDK, which, as we all know, offer superb compression.

After hours of fiddling with SevenZipSharp, I finally figured out how to compress a file and get a raw stream back. I wanted all the compression to happen in memory, with nothing ever saved to disk. That was kind of tedious, but once I read up on the SevenZipSharp docs, it's not that hard.
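The in-memory pattern boils down to MemoryStream in, MemoryStream out. The sketch below shows that shape using the framework's GZipStream as a stand-in; it is not SevenZipSharp's API, and GZip will not match 7z/LZMA ratios. The class and method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

// Sketch only: GZipStream stands in for SevenZipSharp's 7z/LZMA compressor.
// Both directions run entirely in memory; nothing is written to disk.
public static class InMemoryCompression
{
    public static byte[] Compress(byte[] raw)
    {
        using var output = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the GZipStream is disposed
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(raw, 0, raw.Length);
        } // disposing the GZipStream flushes the final compressed block
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output); // inflate straight into another in-memory buffer
        return output.ToArray();
    }
}
```

Swapping in the real 7z compressor would change only the inside of these two methods; the byte-array-in, byte-array-out contract stays the same.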












I decided to use this method to compress the data instead of the static SevenZipCompressor.CompressBytes or SevenZipCompressor.CompressStream methods, because both of those static methods use the managed C# implementation of LZMA directly. Creating an instance of the SevenZipCompressor class and calling its CompressStream method, on the other hand, yields a better compression ratio. I suspect this is because the instance method goes through the unmanaged 7z.dll library. I'm not sure why the unmanaged version compresses better than the C# LZMA version, but it sure does: it reduced my 22 MB test dll to 11 MB, while the C# LZMA version only got it down to 12 MB. Not a huge difference, but hey, every byte counts, right?

Now we have a way to compress our references before we add them to the assembly as embedded resources. But we can't simply load those bytes into the AppDomain, because once compressed they are no longer a valid .NET assembly. We have to decompress the data before loading it into the running AppDomain. That leaves us with a new problem: we now have to inject code that uses SevenZipSharp to decompress our references.
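A common way to decompress on demand is an AppDomain.AssemblyResolve handler that finds the matching embedded resource, inflates it in memory, and hands the bytes to Assembly.Load. The sketch below assumes a hypothetical resource naming scheme ("<assembly name>.dll.gz") and again uses GZip in place of 7z/LZMA; the packer's real injected code may be structured differently.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Reflection;

// Sketch: assumes each compressed reference is a manifest resource named
// "<simple assembly name>.dll.gz" (hypothetical) and GZip stands in for 7z.
public static class CompressedReferenceLoader
{
    // Called once at startup, before any packed reference is needed.
    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
    }

    // Maps a requested assembly's full name to the resource holding its bytes.
    public static string ResourceNameFor(string assemblyFullName) =>
        new AssemblyName(assemblyFullName).Name + ".dll.gz";

    private static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        Assembly host = Assembly.GetExecutingAssembly();
        using Stream packed = host.GetManifestResourceStream(ResourceNameFor(args.Name));
        if (packed == null)
            return null; // not one of ours; let the runtime keep probing

        using var gzip = new GZipStream(packed, CompressionMode.Decompress);
        using var raw = new MemoryStream();
        gzip.CopyTo(raw);                    // decompress entirely in memory
        return Assembly.Load(raw.ToArray()); // load the plain .NET assembly image
    }
}
```

The runtime only raises AssemblyResolve when normal probing fails, so each dll is decompressed the first time it is actually needed rather than all at startup.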


Injecting the decompression code

This sounds like an easy enough task, but there are a few edge cases we have to account for.

What if the assembly we are packing already uses or references the SevenZipSharp library? And, most importantly, how are we going to decompress our references if the SevenZipSharp dll is embedded as well? Let's tackle that last question first, then I'll come back to the edge case.

There was one obvious solution to the problem of decompressing all my 7-zipped references before the SevenZipSharp dll itself is unpacked: don't compress the SevenZipSharp dll. EVER.

The dll in question is only 1.31 MB, which is not too shabby an overhead. I can simply skip compression for this dll and have my injected decompression code unpack it to disk. Why unpack it to disk? Since some referenced dlls don't get unpacked and loaded until later in the program's execution, the SevenZipSharp dll needs to exist on disk and be available for the entire time the program is running. When the program closes, though, the file should be deleted from disk.















This takes care of extracting the dll to disk. The method is executed from the static constructor of the <Module> type, which is where the packer injects its startup code.
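A minimal sketch of that extraction step follows. The resource name "SevenZipSharp.dll" and the choice of the application directory as the destination are assumptions for illustration; the post doesn't say which names or folder the packer actually uses.

```csharp
using System;
using System.IO;
using System.Reflection;

// Sketch: resource name and destination directory are hypothetical.
public static class UncompressedDllExtractor
{
    // Copies a stream (for example an embedded resource) to a file,
    // overwriting any stale copy left behind by an earlier run.
    public static string ExtractToDisk(Stream source, string destinationPath)
    {
        using (FileStream file = File.Create(destinationPath)) // truncates existing files
        {
            source.CopyTo(file);
        }
        return destinationPath;
    }

    // Writes the never-compressed SevenZipSharp.dll next to the executable so
    // later decompression calls can find it for the rest of the process lifetime.
    public static string ExtractSevenZipSharp()
    {
        Assembly host = Assembly.GetExecutingAssembly();
        using Stream resource = host.GetManifestResourceStream("SevenZipSharp.dll")
            ?? throw new InvalidOperationException("SevenZipSharp.dll resource not found");
        string target = Path.Combine(AppContext.BaseDirectory, "SevenZipSharp.dll");
        return ExtractToDisk(resource, target);
    }
}
```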

But what about deleting the file after the program shuts down? That's going to require a little bit of hackery. Cue the batch self-destructor...





















This is a little hack I've found that lets you delete a file after a program has exited. It generates a batch file, saves it to the Temp folder, then runs it with Process.Start. The same trick can even make a program delete its own exe, but we don't need that here; I'm using it to delete the SevenZipSharp dll after the program exits. I added another event subscription in the startup code:

AppDomain.CurrentDomain.ProcessExit += CurrentDomain_ProcessExit;

This way, we can delete the no-longer-needed SevenZipSharp dll on exit, ensuring maximum portability.
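Here is one way such a self-destructing batch file could look. This is a sketch, not the packer's exact script: the retry count, the ping-based one-second delay, and all names are assumptions. The loop matters because the exiting process may still hold the dll for a moment; the script deletes itself last. Windows only.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Sketch of the "batch self-destructor"; script text and names are illustrative.
public static class BatchSelfDestructor
{
    // Builds a script that retries deleting targetPath until the exiting
    // process releases it (up to 30 attempts), then deletes the script itself.
    public static string BuildScript(string targetPath)
    {
        string q = "\"" + targetPath + "\"";
        return string.Join("\r\n", new[]
        {
            "@echo off",
            "set /a tries=0",
            ":retry",
            "del /f /q " + q + " >nul 2>&1",
            "if not exist " + q + " goto done",
            "set /a tries+=1",
            "if %tries% geq 30 goto done",
            "ping -n 2 127.0.0.1 >nul", // roughly a one-second pause
            "goto retry",
            ":done",
            "del \"%~f0\"",             // the batch file removes itself last
        });
    }

    // Writes the script to %TEMP% and starts it without a visible window.
    public static void ScheduleDelete(string targetPath)
    {
        string script = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".bat");
        File.WriteAllText(script, BuildScript(targetPath));
        Process.Start(new ProcessStartInfo("cmd.exe", "/c \"" + script + "\"")
        {
            CreateNoWindow = true,
            UseShellExecute = false,
        });
    }

    // Mirrors the ProcessExit subscription described above.
    public static void DeleteOnExit(string path)
    {
        AppDomain.CurrentDomain.ProcessExit += (sender, e) => ScheduleDelete(path);
    }
}
```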


The Edge Case

Now we finally have to tackle that darn edge case: what happens when we try to pack an assembly that already references SevenZipSharp? Gotta account for that as well.

While the packer is collecting each reference dll it's going to pack, it checks whether that dll is SevenZipSharp. I could have gone the cheap route and just checked for the file name SevenZipSharp.dll. But what if, for some weird reason, the target assembly references a SevenZipSharp.dll that ISN'T the actual SevenZipSharp library? Instead, I compare a few attributes of the reference in question against those of the real SevenZipSharp dll: the AssemblyTitle, Description, Company, Product, and Copyright attribute values. If all 5 of those strings match, I go ahead and treat it as the genuine SevenZipSharp dll.
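The matching rule itself is simple enough to sketch as a pure function over attribute name/value pairs. How those pairs are read (for example with dnlib, which the packer already uses) is left out, and the dictionaries are a hypothetical representation; the genuine values would come from the real SevenZipSharp dll, which this post doesn't reproduce.

```csharp
using System;
using System.Collections.Generic;

// Sketch: compares five assembly-level attribute values of a candidate dll
// against those of the genuine SevenZipSharp dll. All five must match.
public static class SevenZipSharpDetector
{
    static readonly string[] CheckedAttributes =
    {
        "AssemblyTitleAttribute", "AssemblyDescriptionAttribute", "AssemblyCompanyAttribute",
        "AssemblyProductAttribute", "AssemblyCopyrightAttribute",
    };

    public static bool IsSevenZipSharp(
        IReadOnlyDictionary<string, string> candidate,
        IReadOnlyDictionary<string, string> genuine)
    {
        foreach (string name in CheckedAttributes)
        {
            // A missing attribute on either side counts as a mismatch.
            if (!genuine.TryGetValue(name, out string expected)) return false;
            if (!candidate.TryGetValue(name, out string actual)) return false;
            if (!string.Equals(expected, actual, StringComparison.Ordinal)) return false;
        }
        return true;
    }
}
```

Requiring every value to match (rather than any one) is what keeps a look-alike file named SevenZipSharp.dll from being mistaken for the real library.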




























Easy peasy! Once we find that one of the references to be packed is the SevenZipSharp dll, we can flag it so that A. it isn't compressed, and B. we don't have to inject our own copy of it as a reference.

If the target program doesn't reference SevenZipSharp, that's no problem, because we can embed the SevenZipSharp dll directly into the target assembly from our own embedded resources. I made a copy of the dll and embedded it into reference packer, so whenever we come across an assembly that doesn't reference SevenZipSharp, we have an easy way to embed it.
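The two rules above (never compress SevenZipSharp; only add the bundled copy when the target lacks it) can be sketched as a small planning step. The PackEntry record and PackPlanner class are hypothetical names, and each reference is assumed to have already been tagged by the attribute check.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types: one entry per dll that ends up embedded in the output.
public sealed record PackEntry(string FileName, bool Compress);

public static class PackPlanner
{
    public static (List<PackEntry> Entries, bool InjectBundledSevenZipSharp) Plan(
        IEnumerable<(string FileName, bool IsSevenZipSharp)> references)
    {
        var refs = references.ToList();
        // A. SevenZipSharp is never compressed; every other reference is.
        var entries = refs
            .Select(r => new PackEntry(r.FileName, Compress: !r.IsSevenZipSharp))
            .ToList();
        // B. Only add our own bundled copy when the target didn't reference it.
        bool inject = !refs.Any(r => r.IsSevenZipSharp);
        if (inject)
            entries.Add(new PackEntry("SevenZipSharp.dll", Compress: false));
        return (entries, inject);
    }
}
```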





The Conclusion

After putting all of this together, we now have a much better system for embedding and packing references. Testing it on my DevExpress program, look at how big a difference it makes...





I added a cool-looking ASCII banner and colored some of the text to highlight things. Dll names in green are being compressed, whereas red means they are not.















We go from 76 MB to 31 MB, cutting the size by more than half!
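A quick check of that reduction, using the two sizes reported above:

$$\frac{76\ \text{MB} - 31\ \text{MB}}{76\ \text{MB}} = \frac{45}{76} \approx 0.59$$

So the compressed build is about 59% smaller, leaving roughly 41% of the original size.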

This works great for bigger applications with either big reference dlls or lots of them. Small apps under roughly 1.3 MB won't benefit, though. They will actually grow, since the uncompressed SevenZipSharp dll has to be embedded as well and adds about 1.31 MB of overhead.

As usual, you can find the complete source to this project on my GitHub here.

Hope you all enjoyed this series. I didn't expect it to turn into a two-parter, but I'm glad it did. Check back here soon for more exciting material!
