Reputation: 9558
I'm looking for a way to add embedded resources to my solution. These resources will be folders containing a lot of files, which need to be decompressed on user demand.
I need to store such folders in the executable without involving third-party libraries (it looks rather stupid, but this is the task).
I have found that I can GZip and UnGZip them using standard libraries, but GZip handles a single file only. In such cases TAR should come onto the scene, but I haven't found a TAR implementation among the standard classes.
Is it possible to decompress TAR with bare C#?
Upvotes: 19
Views: 43469
Reputation: 4977
Since .NET 7.0, this can be done fairly simply with the standard .NET libraries. Here is the solution:
public async Task UnzipToDirectory(
Stream compressedSource,
string destinationDirectory,
CancellationToken cancellationToken = default
)
{
if (!Directory.Exists(destinationDirectory))
Directory.CreateDirectory(destinationDirectory);
await using MemoryStream memoryStream = new();
await using (GZipStream gzipStream =
new(compressedSource, CompressionMode.Decompress))
{
await gzipStream.CopyToAsync(memoryStream, cancellationToken);
}
memoryStream.Seek(0, SeekOrigin.Begin);
await TarFile.ExtractToDirectoryAsync(
memoryStream,
destinationDirectory,
overwriteFiles: true,
cancellationToken: cancellationToken
);
}
Upvotes: 1
Reputation: 11
Based on ForeverZer0's answer, with some issues fixed. It uses significantly less memory by avoiding stream copies, and handles larger archives and longer filenames (the prefix field). It still doesn't handle 100% of the USTAR tar specification.
public static void ExtractTarGz(string filename, string outputDir)
{
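void ReadExactly(Stream stream, byte[] buffer, int count)
{
var total = 0;
while (total < count)
{
int n = stream.Read(buffer, total, count - total);
// Read returns 0 at end of stream; fail instead of looping forever.
if (n == 0)
throw new EndOfStreamException("Unexpected end of tar stream.");
total += n;
}
}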
void SeekExactly(Stream stream, byte[] buffer, int count)
{
ReadExactly(stream, buffer, count);
}
using (var fs = File.OpenRead(filename))
{
using (var stream = new GZipStream(fs, CompressionMode.Decompress))
{
var buffer = new byte[1024];
while (true)
{
ReadExactly(stream, buffer, 100);
var name = Encoding.ASCII.GetString(buffer, 0, 100).Split('\0')[0];
if (String.IsNullOrWhiteSpace(name))
break;
SeekExactly(stream, buffer, 24);
ReadExactly(stream, buffer, 12);
var sizeString = Encoding.ASCII.GetString(buffer, 0, 12).Split('\0')[0].Trim();
var size = Convert.ToInt64(sizeString, 8);
SeekExactly(stream, buffer, 209);
ReadExactly(stream, buffer, 155);
var prefix = Encoding.ASCII.GetString(buffer, 0, 155).Split('\0')[0];
if (!String.IsNullOrWhiteSpace(prefix))
{
// USTAR joins the prefix and name fields with a '/'.
name = prefix + "/" + name;
}
SeekExactly(stream, buffer, 12);
var output = Path.GetFullPath(Path.Combine(outputDir, name));
if (!Directory.Exists(Path.GetDirectoryName(output)))
{
Directory.CreateDirectory(Path.GetDirectoryName(output));
}
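// Directory entries end in '/' and have size 0; the directory itself was created above.
if (!name.EndsWith("/", StringComparison.Ordinal))
{
// FileMode.Create truncates an existing file so no stale bytes remain.
using (var outfs = File.Open(output, FileMode.Create, FileAccess.Write))
{
long total = 0;
while (total < size)
{
// Compare as long so entries larger than 2 GB do not overflow.
var next = (int)Math.Min(buffer.Length, size - total);
ReadExactly(stream, buffer, next);
outfs.Write(buffer, 0, next);
total += next;
}
}
}
// Entry data is padded to the next 512-byte boundary.
var offset = (int)((512 - size % 512) % 512);
SeekExactly(stream, buffer, offset);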
}
}
}
}
Upvotes: 0
Reputation: 4545
.NET 7 added several classes to work with TAR files:
Extract to a directory:
await TarFile.ExtractToDirectoryAsync(tarFilePath, outputDir, overwriteFiles: true);
Enumerate a TAR file and manually extract its entries:
await using var tarStream = new FileStream(tarFilePath, new FileStreamOptions { Mode = FileMode.Open, Access = FileAccess.Read, Options = FileOptions.Asynchronous });
await using var tarReader = new TarReader(tarStream);
TarEntry entry;
while ((entry = await tarReader.GetNextEntryAsync()) != null)
{
if (entry.EntryType is TarEntryType.SymbolicLink or TarEntryType.HardLink or TarEntryType.GlobalExtendedAttributes)
{
continue;
}
Console.WriteLine($"Extracting {entry.Name}");
await entry.ExtractToFileAsync(Path.Join(outputDir, entry.Name), overwrite: true);
}
Upvotes: 13
Reputation: 1620
Tar-cs will do the job, but it is quite slow. I would recommend SharpCompress, which is significantly quicker. It also supports other compression types and has been updated recently.
using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers;
private static String directoryPath = @"C:\Temp";
public static void unTAR(String tarFilePath)
{
using (Stream stream = File.OpenRead(tarFilePath))
{
var reader = ReaderFactory.Open(stream);
while (reader.MoveToNextEntry())
{
if (!reader.Entry.IsDirectory)
{
ExtractionOptions opt = new ExtractionOptions {
ExtractFullPath = true,
Overwrite = true
};
reader.WriteEntryToDirectory(directoryPath, opt);
}
}
}
}
Upvotes: 9
Reputation: 2496
While looking for a quick answer to the same question, I came across this thread and was not entirely satisfied with the current answers, as they all point to third-party dependencies on much larger libraries, just to achieve simple extraction of a tar.gz file to disk.
While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it takes a bunch of files, prepends a 500-byte header (padded to 512 bytes) describing each file, and writes them all to a single archive on a 512-byte alignment. There is no compression; that is typically handled by compressing the created file to a gz archive, which .NET conveniently has built in, and which takes care of all the hard parts.
Having looked at the spec for the tar format, there are really only 2 values (especially on Windows) we need to pick out from the header in order to extract a file from a stream. The first is the name, and the second is the size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file.
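To make the header layout concrete, here is a minimal sketch of pulling those two values out of a single 512-byte header block. The class and method names (TarHeaderSketch, Parse, PaddingFor) are made up for illustration, and the offsets assume the classic layout in which the name occupies bytes 0-99 and the octal size string occupies bytes 124-135:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of any answer's code.
public static class TarHeaderSketch
{
    public const int BlockSize = 512;   // headers and data are aligned to this
    private const int NameOffset = 0;
    private const int NameLength = 100;
    private const int SizeOffset = 124; // after name (100) + mode, uid, gid (8 bytes each)
    private const int SizeLength = 12;

    // Reads the entry name and data size from one header block.
    public static (string Name, long Size) Parse(byte[] header)
    {
        if (header == null || header.Length < BlockSize)
            throw new ArgumentException("A tar header block is 512 bytes.", nameof(header));

        // Both fields are NUL-padded ASCII; the size is an octal string.
        string name = Encoding.ASCII.GetString(header, NameOffset, NameLength).Split('\0')[0];
        string octal = Encoding.ASCII.GetString(header, SizeOffset, SizeLength).Trim('\0', ' ');
        long size = octal.Length == 0 ? 0 : Convert.ToInt64(octal, 8);
        return (name, size);
    }

    // Number of padding bytes that follow `size` bytes of entry data.
    public static int PaddingFor(long size)
    {
        return (int)((BlockSize - size % BlockSize) % BlockSize);
    }
}
```

With these offsets, skipping 24 bytes after the name lands on the size field (100 + 24 = 124), and skipping 376 bytes after the size field lands on the end of the header block (136 + 376 = 512).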
I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening from a stream or filename and for decompressing the gz file first using built-in functions.
The primary method is this:
public static void ExtractTar(Stream stream, string outputDir)
{
var buffer = new byte[100];
while (true)
{
stream.Read(buffer, 0, 100);
var name = Encoding.ASCII.GetString(buffer).Trim('\0');
if (String.IsNullOrWhiteSpace(name))
break;
stream.Seek(24, SeekOrigin.Current);
stream.Read(buffer, 0, 12);
var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim('\0', ' '), 8);
stream.Seek(376L, SeekOrigin.Current);
var output = Path.Combine(outputDir, name);
if (!Directory.Exists(Path.GetDirectoryName(output)))
Directory.CreateDirectory(Path.GetDirectoryName(output));
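// Directory entries end in '/' and carry no data; the directory was created above.
if (!name.EndsWith("/", StringComparison.Ordinal))
{
// FileMode.Create truncates an existing file so no stale bytes remain.
using (var str = File.Open(output, FileMode.Create, FileAccess.Write))
{
var buf = new byte[size];
stream.Read(buf, 0, buf.Length);
str.Write(buf, 0, buf.Length);
}
}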
var pos = stream.Position;
var offset = 512 - (pos % 512);
if (offset == 512)
offset = 0;
stream.Seek(offset, SeekOrigin.Current);
}
}
And here are a few helper functions for opening from a file, and for decompressing a tar.gz file/stream before extracting.
public static void ExtractTarGz(string filename, string outputDir)
{
using (var stream = File.OpenRead(filename))
ExtractTarGz(stream, outputDir);
}
public static void ExtractTarGz(Stream stream, string outputDir)
{
// A GZipStream is not seekable, so copy it first to a MemoryStream
using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
{
const int chunk = 4096;
using (var memStr = new MemoryStream())
{
int read;
var buffer = new byte[chunk];
do
{
read = gzip.Read(buffer, 0, chunk);
memStr.Write(buffer, 0, read);
} while (read > 0); // Read may return fewer bytes than requested before the end of the stream
memStr.Seek(0, SeekOrigin.Begin);
ExtractTar(memStr, outputDir);
}
}
}
public static void ExtractTar(string filename, string outputDir)
{
using (var stream = File.OpenRead(filename))
ExtractTar(stream, outputDir);
}
Here is a gist of the full file with some comments.
Upvotes: 18
Reputation: 726987
Since you are not allowed to use outside libraries, you are not restricted to a specific format such as tar either. In fact, the data doesn't even need to be all in the same file.
You can write your own tar-like utility in C# that walks a directory tree and produces two files: a "header" file that consists of a serialized dictionary mapping file paths to offset/length pairs, and a big file containing the contents of the individual files concatenated into one giant blob. This is not a trivial task, but it's not overly complicated either.
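As a rough sketch of that idea (not a definitive implementation), the following writes a blob file plus a plain-text index with one "relativePath, offset, length" line per file, separated by tabs, and reads it back. The class name BlobPackSketch and the tab-separated index format are assumptions made for illustration; it also assumes that the blob and index files live outside the source directory and that file names contain no tab or newline characters:

```csharp
using System;
using System.IO;

// Hypothetical sketch of a two-file "index + blob" container.
public static class BlobPackSketch
{
    // Concatenates every file under sourceDir into blobPath and records
    // "relativePath<TAB>offset<TAB>length" for each one in indexPath.
    public static void Pack(string sourceDir, string blobPath, string indexPath)
    {
        using var blob = File.Create(blobPath);
        using var index = new StreamWriter(indexPath);
        foreach (var file in Directory.EnumerateFiles(sourceDir, "*", SearchOption.AllDirectories))
        {
            long offset = blob.Position;
            using (var input = File.OpenRead(file))
                input.CopyTo(blob);
            // Store forward slashes so the index is the same on every OS.
            string relative = Path.GetRelativePath(sourceDir, file).Replace('\\', '/');
            index.WriteLine($"{relative}\t{offset}\t{blob.Position - offset}");
        }
    }

    // Recreates the directory tree under outputDir from the blob and index.
    public static void Unpack(string blobPath, string indexPath, string outputDir)
    {
        using var blob = File.OpenRead(blobPath);
        var buffer = new byte[81920];
        foreach (var line in File.ReadLines(indexPath))
        {
            var parts = line.Split('\t');
            long offset = long.Parse(parts[1]);
            long remaining = long.Parse(parts[2]);
            string target = Path.Combine(outputDir, parts[0]);
            Directory.CreateDirectory(Path.GetDirectoryName(target));
            blob.Seek(offset, SeekOrigin.Begin);
            using var output = File.Create(target);
            while (remaining > 0)
            {
                int n = blob.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (n == 0)
                    throw new EndOfStreamException("Blob is shorter than the index claims.");
                output.Write(buffer, 0, n);
                remaining -= n;
            }
        }
    }
}
```

A call such as BlobPackSketch.Pack(sourceDir, blobPath, indexPath) at build time followed by BlobPackSketch.Unpack(blobPath, indexPath, outputDir) at run time would round-trip a folder. This sketch does not validate index paths, so it should only be used with an index you produced yourself.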
Upvotes: 3
Reputation: 250
There are 2 ways to compress/decompress in .NET. First, you can use the GZipStream and DeflateStream classes. Both compress data, but only GZipStream writes the .gz format, so a file compressed with GZipStream can be opened with any popular compression application such as WinZip, WinRAR, or 7-Zip, while a file compressed with DeflateStream cannot. These two classes have been available since .NET 2.
The other way is the Package class. It is essentially the same as GZipStream and DeflateStream; the only difference is that you can compress multiple files, which can then be opened with WinZip, WinRAR, or 7-Zip. That's all .NET has. But it's not even a generic .zip file; it is something Microsoft uses to compress their *x extension Office files. If you decompress any docx file with the Package class, you can see everything stored in it. So don't use the .NET libraries for compressing or even decompressing, because you can't make a generic compressed file or even decompress a generic zip file. You have to consider a third-party library such as http://www.icsharpcode.net/OpenSource/SharpZipLib/ or implement everything from the ground up.
Upvotes: -1