3. De novo assembly

3.1 metaSPades

We will perform de novo assembly of the host-filtered reads to create contigs (longer, assembled sequences) with SPades.

spades.py -t {threads} \
--meta \
-o {output} \
-1 {input.R1} \
-2 {input.R2} \
-s {input.S}

If you do not want to include singleton reads for the assembly, then simply remove the -s {input.S} argument from the command.

  • {input.R1}, {input.R2} and {input.S} are host-filtered FASTQ files (R1, R2, and S) from step 2.3.
  • {output} defines the directory where the assembly results will be stored.

3.2 Renaming contigs

We will add sample names to the beginning of each contig name. This will make sure that each sample’s contig names are unique before we start aggregating contigs. If you are only dealing with a single sample, then this step can be seen as optional.

seqkit replace -p "^" -r "{sample}_" {input} > {output}
  • {sample} is the sample name that will be placed in front of the contig name
  • {input} is your contig file from step 3.1
  • {output} is fasta file with the renamed contig

3.3 Aggregating contigs

Next we will combine all renamed contigs into a single fasta file so we can perform taxonomic annotation across all samples in one go. Once again, this step can be seen as optional if you have a single sample.

cat {input} > {output}
  • {input} are your renamed contig files from step 3.2
  • {output} a single .fasta file
Note

We have created contigs that are ready for taxonomic annotation in the next chapter.