jeffandcyrus

Reputation: 3

Unable to extract multiple .fastq.gz files on Windows

I have a bunch of fastq.gz files in a directory that I need to unzip and concatenate. I have tried several approaches, but none of them work. I am working on a Windows machine, and since it's a work computer I can't install anything like 7-Zip.

In Windows command prompt, I've tried this:

tar -xvzf C:\Path\To\File\filename.fastq.gz

which gives the following error: tar: Error opening archive: Unrecognized archive format

Ideally, I'd like to do this with a wildcard, so it extracts all zipped files in the directory, but this also gives an error:

tar -xvzf C:\Path\To\File\*.fastq.gz

Error: tar: Error opening archive: Failed to open C:\Path\To\File*.fastq.gz.

I've also tried using Expand-Archive in Windows PowerShell, but I run into similar issues. I'm sure this is a relatively simple problem, but I would appreciate any advice or an explanation of what's happening here, as I'm relatively new to working on the command line.

Solution: Thanks to u/FiddlingAway for the comment below pointing me to additional resources. To be completely honest, I'm not well-versed enough in PowerShell to adapt the advice given there to my needs. Instead, I realized that I could use simpler commands in Git Bash. For anyone else who needs the solution spelled out in simple terms, here is what I did:

  1. Use gunzip to extract all gzipped files in the directory by giving the full file path and a wildcard to all gzipped files in that directory.
     gunzip C:/Path/To/Files/*.gz
  2. After using cd to change to the directory where the unzipped files are located, use cat to merge all files into one merged file.
     cat *.fastq > merged_reads.fastq
  3. Use awk to convert from fastq to fasta.
     awk 'NR%4 == 1 {print ">" substr($0,2)} NR%4 == 2 {print}' merged_reads.fastq > merged_reads.fasta
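
As far as I can tell, the tar attempts failed because a .fastq.gz file is a single gzip-compressed file rather than a tar archive, and the Windows command prompt does not expand the * wildcard itself. For anyone who wants the three steps in one go, here is a rough sketch that should work in Git Bash. The function name merge_fastq_gz and its two arguments are my own invention for illustration, and it assumes every FASTQ record is exactly four lines long (no wrapped sequences), same as the awk command above. Unlike plain gunzip, gzip -dc writes to standard output and leaves the original .gz files in place.

```shell
#!/usr/bin/env bash
# Sketch: merge every *.fastq.gz in a directory into one FASTQ and one FASTA.
# merge_fastq_gz is a hypothetical helper name, not part of any tool.
merge_fastq_gz() {
    local in_dir="$1"      # directory holding the .fastq.gz files
    local out_prefix="$2"  # output path without an extension

    # gzip -dc decompresses to stdout and keeps the .gz files;
    # the shell (not gzip) expands the wildcard into the file list.
    gzip -dc "$in_dir"/*.fastq.gz > "$out_prefix.fastq" || return 1

    # Assumes 4-line FASTQ records: header (@...), sequence, '+', qualities.
    # Keep line 1 (with '@' swapped for '>') and line 2 of each record.
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$out_prefix.fastq" > "$out_prefix.fasta"
}
```

After pasting the function into a Git Bash session, a call such as merge_fastq_gz C:/Path/To/Files merged_reads should write merged_reads.fastq and merged_reads.fasta to the current directory.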

Upvotes: 0

Views: 743

Answers (0)
