Wednesday, June 22, 2022

C# Packing more reference dlls, but this time with compression!

Last week, I talked about packing a dotnet program's reference dlls into the exe using dnlib (post linked here). It works great, so I went ahead and tested this on a DevExpress winforms app:















As you can see, the packed application is pretty big, coming in at about 76 MB. That's a lot for just a demo app. I wanted to reduce the size of the packed executable as much as I could. So I thought to myself, why don't I just compress each dll with 7-zip, pack that data into the exe and inject code to decompress each dll on demand? Well, I'm pleased to present an update to reference packer that now includes a -compress argument that makes this possible. Let's examine how this is done.


Outline

So in order to compress and pack our references, the initial steps are the same as before, but with a few changes.

First, we have to choose the way we will be compressing the data to pack. I chose a library called SevenZipSharp to handle the compression for me. It leverages the 7z and LZMA sdk, which as we all know has superb compression.

After hours of fiddling with SevenZipSharp, I finally managed to figure out how to compress a file and get a raw stream back. I wanted to ensure that all the compression operations were done in memory only and that nothing is ever saved to disk. That was actually kind of tedious, but once I read up on the docs for SevenZipSharp, it's not that hard.












I decided to use this method to compress the data instead of the static SevenZipCompressor.CompressBytes or SevenZipCompressor.CompressStream methods. The reasoning behind this is that both of those static methods directly use the C# implementation of LZMA, whereas creating an instance of the SevenZipCompressor class and using its CompressStream method yields a better compression ratio. I suspect this is because the instance path makes SevenZipSharp call into the unmanaged 7z.dll library. I'm not sure why the unmanaged version compresses better than the C# LZMA version, but it sure does. It reduced my test dll from 22 MB to 11 MB, while the C# LZMA version reduced it from 22 MB to 12 MB. Not much of a difference but hey, every byte counts, right?
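To make the "in memory only" idea concrete, here is a minimal sketch of a compress/decompress round trip that never touches disk. Note that it swaps in GZipStream from the .NET base class library as a stand-in, so it is not the SevenZipSharp call the packer actually uses, and the InMemoryCompression class name is made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: GZipStream stands in for SevenZipSharp here.
public static class InMemoryCompression
{
    // Compresses a buffer entirely in memory and returns the compressed bytes.
    public static byte[] Compress(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final block into 'output'.
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress, again without writing anything to disk.
    public static byte[] Decompress(byte[] packed)
    {
        using (var input = new MemoryStream(packed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The shape is the same whichever compressor sits in the middle: a MemoryStream in, a MemoryStream out, and a byte array that can go straight into an embedded resource.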

Now we have a way to compress our references before we add them into the assembly as embedded resources. But we can't simply load those bytes into the AppDomain, because they're compressed and no longer a valid .NET assembly. We have to decompress the data before loading it into the running AppDomain. That leaves us with a problem: we now have to inject code that leverages SevenZipSharp to decompress our references.


Injecting the decompression code

This sounds like an easy enough task, but there are a few edge cases we have to account for.

What if the assembly we are packing already leverages and/or references the SevenZipSharp library? And the most important question here, how are we going to decompress our references if the SevenZipSharp dll is embedded as well? Let's tackle that last question, then I'll come back to the edge case.

There was one obvious solution to the question of how to decompress all my 7zed references when the SevenZipSharp dll isn't unpacked yet: don't compress the SevenZipSharp dll. EVER.

The dll in question is only 1.31 MB, which isn't too shabby as overhead goes. I can simply skip compression for this dll, and have my injected decompression code unpack the dll to disk. Why am I unpacking it to disk? Well, since some referenced dlls don't get unpacked and loaded until later during the program's execution, I need the SevenZipSharp dll to exist on disk and be available for the entire time the program is running. Then, when the program closes, the file gets deleted from disk.















This takes care of extracting the dll to disk. The method gets executed in the static <Module>() method that the packer injects its code into.

But what about deleting the file after the program shuts down? That's going to require a little bit of hackery, cue the batch self-destructor...
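For readers following along without the screenshot, a rough sketch of such an extraction step could look like the code below. The ResourceExtractor class, its method names, and the choice of target directory are all assumptions made for illustration and are not taken from the packer's source.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper for unpacking an embedded dll to disk.
public static class ResourceExtractor
{
    // Copies a stream to targetPath unless a file is already there.
    public static string WriteToDisk(Stream source, string targetPath)
    {
        if (!File.Exists(targetPath))
        {
            using (var file = File.Create(targetPath))
            {
                source.CopyTo(file);
            }
        }
        return targetPath;
    }

    // Looks up an embedded resource by name and extracts it into 'directory'.
    public static string ExtractResource(Assembly owner, string resourceName, string directory)
    {
        using (Stream resource = owner.GetManifestResourceStream(resourceName))
        {
            if (resource == null)
                throw new FileNotFoundException("Embedded resource not found: " + resourceName);
            return WriteToDisk(resource, Path.Combine(directory, resourceName));
        }
    }
}
```

Skipping the write when the file already exists keeps a second instance of the program from fighting over a dll that the first instance has loaded.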





















This is a little hack I've found that allows you to delete a file after a program has exited. It simply generates a batch file, saves it to the Temp folder, then calls it with Process.Start. This method can even be used to self-delete the program exe itself, but we don't need that here. Instead, I am using this to delete the SevenZipSharp dll after the program exits. I added another event subscription in the startup code:

AppDomain.CurrentDomain.ProcessExit += CurrentDomain_ProcessExit;

This way, we can delete the no longer needed SevenZipSharp dll upon exit, ensuring maximum portability.
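Here is a minimal sketch of the batch trick, assuming Windows and cmd.exe. The DeleteAfterExit class and the exact script contents are my own illustration of the idea, not a copy of the code in the screenshot: the script keeps retrying the delete until the file is gone (it stays locked while the process is alive), then removes itself.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: schedules deletion of a file once it is no longer locked.
public static class DeleteAfterExit
{
    // Builds a batch script that retries the delete, then deletes itself.
    public static string BuildScript(string targetPath)
    {
        return string.Join("\r\n", new[]
        {
            "@echo off",
            ":retry",
            "del /f /q \"" + targetPath + "\" >nul 2>&1",
            "if not exist \"" + targetPath + "\" goto done",
            "ping -n 2 127.0.0.1 >nul",   // roughly a one second pause between attempts
            "goto retry",
            ":done",
            "del \"%~f0\""                // %~f0 is the full path of the batch file itself
        });
    }

    // Writes the script to the Temp folder and launches it hidden (Windows only).
    public static void Schedule(string targetPath)
    {
        string scriptPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".bat");
        File.WriteAllText(scriptPath, BuildScript(targetPath));
        Process.Start(new ProcessStartInfo
        {
            FileName = "cmd.exe",
            Arguments = "/c \"" + scriptPath + "\"",
            CreateNoWindow = true,
            UseShellExecute = false
        });
    }
}
```

With something like this in place, the ProcessExit handler only has to call DeleteAfterExit.Schedule with the path the dll was extracted to.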


The Edge Case

Now we finally have to tackle that darn edge case: what happens when we try to pack an assembly that already references SevenZipSharp? We have to account for that as well.

While the packer is finding each reference dll that it's going to pack, it checks that dll to see if it is the SevenZipSharp dll. Now I could have gone the cheap route and just checked for the file name, SevenZipSharp.dll. But what if, for some weird reason, the target assembly references a SevenZipSharp.dll that ISN'T the actual SevenZipSharp library dll? What I chose to do instead was to check the attributes of the reference in question and compare them to a few attributes of the real SevenZipSharp dll. I check the AssemblyTitle, Description, Company, Product, and Copyright attribute values, and match those strings with the ones I get from the reference in question. If all 5 of those attributes match, then I go ahead and say that it is in fact the SevenZipSharp lib dll.
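The five-attribute comparison can be sketched like this. One big assumption: this version uses plain reflection on an already loaded Assembly, whereas the packer inspects the dll on disk, so treat it as an illustration of the matching logic only. The AssemblyFingerprint class is hypothetical, and the expected values for the real SevenZipSharp dll are deliberately left out.

```csharp
using System;
using System.Reflection;

// Hypothetical container for the five attribute values being compared.
public sealed class AssemblyFingerprint
{
    public string Title;
    public string Description;
    public string Company;
    public string Product;
    public string Copyright;

    // Reads the five assembly-level attributes; a missing attribute becomes null.
    public static AssemblyFingerprint From(Assembly asm)
    {
        return new AssemblyFingerprint
        {
            Title = asm.GetCustomAttribute<AssemblyTitleAttribute>()?.Title,
            Description = asm.GetCustomAttribute<AssemblyDescriptionAttribute>()?.Description,
            Company = asm.GetCustomAttribute<AssemblyCompanyAttribute>()?.Company,
            Product = asm.GetCustomAttribute<AssemblyProductAttribute>()?.Product,
            Copyright = asm.GetCustomAttribute<AssemblyCopyrightAttribute>()?.Copyright
        };
    }

    // True only when all five values are identical (ordinal comparison).
    public bool Matches(AssemblyFingerprint other)
    {
        return string.Equals(Title, other.Title, StringComparison.Ordinal)
            && string.Equals(Description, other.Description, StringComparison.Ordinal)
            && string.Equals(Company, other.Company, StringComparison.Ordinal)
            && string.Equals(Product, other.Product, StringComparison.Ordinal)
            && string.Equals(Copyright, other.Copyright, StringComparison.Ordinal);
    }
}
```

Requiring all five to match is what makes this sturdier than a file name check: a dll that merely happens to be called SevenZipSharp.dll is unlikely to carry the same title, company, product, and copyright strings.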




























Easy peasy! Once we find that one of the references to be packed is the SevenZipSharp dll, we can flag that dll so that A. it is not compressed, and B. we don't inject our own copy of it as a reference.

Now if the target program doesn't reference SevenZipSharp, that's no problem, because we can just embed the SevenZipSharp dll directly into the target assembly from our own embedded resources. I made a copy of the dll and embedded it into reference packer so that anytime we come across an assembly that doesn't have a reference to SevenZipSharp, we have an easy way to embed it.





The Conclusion

After putting all of this together, we now have a much better system for embedding and packing references. Testing this on my DevExpress program, look how big of a difference it makes...





I added a cool looking ASCII banner, and colored some of the text to highlight things. Dll names in green are being compressed, whereas red means the opposite.















We go from 76 MB to 31 MB, cutting the size by more than half!

This works great for bigger applications with either big reference dlls, or lots of reference dlls. However, small apps that are under about 1.3 MB won't benefit from this. They will actually increase in size, since the SevenZipSharp dll has to be embedded as well and that overhead outweighs the savings.

As usual, you can find the complete source to this project on my GitHub here.

Hope you all enjoyed this series, didn't expect this to turn into a 2 parter, but I'm glad it did. Check back here soon for more exciting material!

Monday, June 13, 2022

C# pack reference dlls using dnlib

So yesterday I was playing around with the trial copy of the DevExpress winforms controls, and built my first winform off of the library. It looks really good when using the dark skin:










Looks awesome, what doesn't look awesome is this:





















There are a bunch of reference dlls that DevExpress needs to run. A lot of .NET applications suffer from this issue: a pile of external dlls has to be placed alongside the exe. That really sucks for portability, which is something I value highly when I'm building a simple app to do something.

So I started to think, is there a way to get rid of these dlls so that I can just have the exe here by itself instead?

The answer is yes, you can. There are 2 ways of doing this. The first is to take each .NET dll and merge all its code into your exe. The second is to take each dll and embed it into our project exe as a resource. The second way is what this article will be covering, since I built a tool to do this to a .NET exe automatically.

In this article, I will be using a sample application to test the reference packer with. It is a simple winforms app that references a dll which, when called, displays a message box stating that it's the referenced dll.








Premise:

This process will consist of two steps.

First, we have to find which references we want to embed into the target assembly. Once we have a list of paths to the reference assemblies, we then embed them as embedded resources into the exe.

Once the references are embedded, we can then perform the final step, injecting the code to resolve the packed references. Let's walk through these steps now.

Finding the referenced assemblies:


getReferencedAndOtherDlls() is a method that uses dnlib to get a list of all the referenced assemblies from our target exe. It then filters out mscorlib and any System.* references that aren't present alongside the target. It also returns any other dlls sitting next to the target that aren't in the reference list pulled with dnlib. This allows us to pack dlls that aren't referenced directly by the target, such as unmanaged dlls or dlls that are referenced by one of the direct references from the target exe.

getReferencedAndOtherDlls will recursively load each reference it finds and check to see if any other dlls or references that dll has are present alongside the target exe. 
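The filtering rules described above can be sketched without dnlib if we assume the reference names have already been read from the target. The ReferenceFinder class and FindLocalDlls method below are hypothetical names, and the recursive walk through each reference's own references is left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper mirroring the filtering rules, minus the recursion.
public static class ReferenceFinder
{
    // referenceNames: simple assembly names read from the target exe.
    // targetDirectory: the folder the target exe lives in.
    public static List<string> FindLocalDlls(IEnumerable<string> referenceNames, string targetDirectory)
    {
        var found = new List<string>();

        foreach (string name in referenceNames)
        {
            if (name.Equals("mscorlib", StringComparison.OrdinalIgnoreCase))
                continue;

            // Framework references (System.*) drop out naturally here,
            // because they are normally not sitting next to the exe.
            string candidate = Path.Combine(targetDirectory, name + ".dll");
            if (File.Exists(candidate))
                found.Add(candidate);
        }

        // Pick up dlls that are alongside the exe but not directly referenced,
        // e.g. unmanaged dlls or references of references.
        foreach (string path in Directory.GetFiles(targetDirectory, "*.dll"))
        {
            if (!found.Contains(path, StringComparer.OrdinalIgnoreCase))
                found.Add(path);
        }

        return found;
    }
}
```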

Once the list of reference paths is returned, we create a new EmbeddedResource for each path in the list and read the file's binary content into it. Each embedded resource is then added to the dnlib ModuleDef instance that is our target exe.



























Looking pretty good so far! We now have the MessageLib.dll reference packed as an embedded resource into the ExampleApp.exe. Now for the not so simple step: injecting the assembly resolver code.

The assembly resolver:
According to the docs for the AppDomain.CurrentDomain.AssemblyResolve event, it's fired when "The resolution of an assembly fails." This is how we will load our packed assemblies into the local AppDomain. We simply subscribe to this event, and here is where the magic happens.





































This is the code that will load our embedded references into the app domain. 
First, we read the args.Name property, split it on the comma delimiter, and store the first part in a string variable. This is because args.Name returns a full assembly name string, e.g. "MessageLib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"
So we have to get the first part, MessageLib, and add a ".dll" to the end of the string.
Next, we look that name up in our embedded resources, get the byte array of the dll, load that with Assembly.Load and then return the loaded Assembly instance for the resolver to use in our app domain. 
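
Since the screenshot of the resolver is not reproduced here, this is a minimal sketch of the steps just described. The PackedResolver class name and the ResourceNameFor helper are assumptions for illustration; the handler name and the resource naming scheme (assembly name plus ".dll") follow the text.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical wrapper around the resolver described above.
public static class PackedResolver
{
    // "MessageLib, Version=1.0.0.0, ..." -> "MessageLib.dll"
    public static string ResourceNameFor(string fullAssemblyName)
    {
        return fullAssemblyName.Split(',')[0].Trim() + ".dll";
    }

    // Loads the requested assembly from our embedded resources, if present.
    public static Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
    {
        string resourceName = ResourceNameFor(args.Name);
        using (Stream resource = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
        {
            // Returning null tells the runtime we could not resolve this one.
            if (resource == null)
                return null;

            using (var buffer = new MemoryStream())
            {
                resource.CopyTo(buffer);
                return Assembly.Load(buffer.ToArray());
            }
        }
    }

    // Subscribes the handler; this has to run before any packed dll is needed.
    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += CurrentDomain_AssemblyResolve;
    }
}
```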

Seems simple enough, right? Now we should be able to put an AppDomain.CurrentDomain.AssemblyResolve += CurrentDomain_AssemblyResolve; into the first line of Program.Main and all is good, right? Wrong.
We actually have to create a static module initializer, and then inject the subscription to AssemblyResolve and the CurrentDomain_AssemblyResolve method into the module initializer for this to work.
This is usually called a module initializer: it is the static <Module>() method (the .cctor) of the <Module> class that's emitted by the compiler. dnlib exposes that <Module> class as the "GlobalType".











This way, we ensure that we are subscribed to the AssemblyResolve event as soon as the CLR starts executing our program. 
How in the world are we going to inject this into the module initializer of an already compiled exe, when you can't write to <Module> directly from C#? Time to call up our dear old friend, dnlib again.
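
As a side note, to see what a module initializer does without any IL editing: when you control the source and compile with C# 9 or later on .NET 5+, the [ModuleInitializer] attribute asks the compiler to emit the call from the <Module> static constructor for you. That does not help with an exe that is already compiled, which is the case the packer handles, so the sketch below only illustrates the timing. The Startup class and its Initialized flag are made up for the example.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical example type; requires C# 9+ and .NET 5+.
public static class Startup
{
    public static bool Initialized { get; private set; }

    // The compiler wires this into the module's static constructor,
    // so it runs before any other code in this module.
    [ModuleInitializer]
    public static void Init()
    {
        // This is where an AssemblyResolve subscription would go.
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) => null;
        Initialized = true;
    }
}
```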

Injecting the assembly resolver:
After doing some research, I determined that the best course of action for this would be to put the event subscription, and the CurrentDomain_AssemblyResolve  method into its own class in the packer exe.
I came across this Stack Overflow post, which led me to this class in a fork of ConfuserEx.
It allows one to clone an entire class, or what dnlib calls a TypeDef, from one Module to another. Methods, fields, everything is cloned. 
This makes life VERY easy for me, because I already have a class called ModuleInit that I put the assembly resolver in. So now I can use this class to inject, or copy over the compiled ModuleInit class into our target exe. 





















First, we find or create the static constructor, the .cctor or static <Module>().
This is the method we will be putting our event subscription to AppDomain.CurrentDomain.AssemblyResolve in.
Next, we find the "ModuleInit" class that is compiled inside the reference packer exe. Once we have the TypeDef instance for that class, we can copy it over to the target exe by calling InjectHelper.Inject with our ModuleInit TypeDef instance and the module we want to inject it into.
































From here, the code is pretty straightforward. Move all the methods inside the injected ModuleInit TypeDef into the <Module> TypeDef. Then remove the now empty ModuleInit type.
Next we find the "Run" method that we just copied to <Module> and clone its instructions into the <Module>.cctor ( static <Module>() )

After that, delete the "Run" method out of <Module> and we are all set to write all the changes back to the target exe.






































Voila! We now have an executable that depends on MessageLib.dll, but does not need that dll sitting next to the exe to run. It will load the dll from memory when it's needed.

This technique can make it harder for a reverse engineer (such as myself) to debug an application that depends on some referenced dll. dnSpy can't load the dll as a reference because it doesn't exist on disk. This will break the debugging process if you try to see where the call to MessageLib.dll goes in the main executable's code. This technique can also be the basis of an actual packer, because you can inject any prebuilt code you want, such as an entire assembly loader/decrypter.
This also makes your exe much more portable, since it no longer needs a bunch of other dlls sitting next to it to run. This kind of thing would be great for installers and the like.

Thanks for following along with this! You can find the complete source to this project here:

Stay tuned for more projects like this coming here soon!

Welcome Post

Welcome!

This blog will detail my ventures in C# programming, and reverse engineering.

The primary library I use for my reversing projects is dnlib, and I use Visual Studio 2022, 2019, and 2017.

Hope you all enjoy the content to come!
