Single nucleotide polymorphisms (SNPs) are on many scientists’ wish lists in experimental studies of genomic DNA sequences. High-throughput sequencing methods, also known as next-generation sequencing (NGS), are now widely available, so SNPs can be identified in the large amounts of DNA sequence these methods generate. Some research is also being done on SNPs in cell-free DNA (cfDNA).

Generating a high-quality SNP data set comes down to two main steps. The first, known as “variant calling”, is to determine the positions where at least one of your samples differs from the reference sequence.
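To make the first step concrete, here is a minimal Python sketch of variant-site detection. It assumes a hypothetical toy input in which each sample's aligned bases have already been collected per reference position (a simplified pileup). Real pipelines work from aligned reads with dedicated variant callers. The `min_alt_reads` cutoff is an illustrative parameter, not a recommended value, and positions are 0-based.

```python
from collections import Counter

# Hypothetical toy input: for each sample, the bases observed at each
# reference position (as if collected from an alignment pileup).
reference = "ACGTAC"
pileups = {
    "sample1": ["AAAA", "CCCC", "GGGA", "TTTT", "AAAA", "CCCC"],
    "sample2": ["AAAA", "CCTT", "GGGG", "TTTT", "GGGG", "CCCC"],
}

def call_variant_sites(reference, pileups, min_alt_reads=2):
    """Return sorted 0-based positions where at least one sample shows a
    non-reference base supported by at least min_alt_reads reads."""
    sites = set()
    for bases_per_pos in pileups.values():
        for pos, bases in enumerate(bases_per_pos):
            ref_base = reference[pos]
            counts = Counter(bases)
            # Count every read that disagrees with the reference base.
            alt_reads = sum(n for base, n in counts.items() if base != ref_base)
            if alt_reads >= min_alt_reads:
                sites.add(pos)
    return sorted(sites)

print(call_variant_sites(reference, pileups))  # [1, 4]
```

Position 2 is not reported. Its only non-reference evidence is a single read in sample1, which could be a sequencing error. Lowering `min_alt_reads` to 1 would include it.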

The second step, in which you determine which alleles each sample carries at every variant site, is known as “genotyping”.
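Genotyping can be sketched in the same toy setting. The function below is a naive, hypothetical caller. It assigns a diploid genotype from reference and alternate read counts using fixed alternate-allele-fraction cutoffs, and it writes the result in VCF-style notation (0/0, 0/1, 1/1). The 0.2 and 0.8 cutoffs are assumptions for illustration only.

```python
def genotype(ref_count, alt_count, het_low=0.2, het_high=0.8):
    """Assign a diploid genotype from read counts using fixed
    alternate-allele-fraction cutoffs (illustrative thresholds)."""
    depth = ref_count + alt_count
    if depth == 0:
        return "./."  # no reads: genotype is missing
    alt_fraction = alt_count / depth
    if alt_fraction < het_low:
        return "0/0"  # homozygous reference
    if alt_fraction > het_high:
        return "1/1"  # homozygous alternate
    return "0/1"  # heterozygous

# Hypothetical (reference reads, alternate reads) at one site.
for counts in [(4, 0), (2, 2), (0, 4)]:
    print(counts, genotype(*counts))
```

Fixed cutoffs like these are simple to apply. At low depth, however, a heterozygous site can show few or no alternate reads purely by chance, and the cutoffs then misclassify it.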

Both steps depend on high-quality NGS data, which in turn requires careful library preparation, including DNA size selection.

Sequencing coverage refers to the average number of times a single base is read during a sequencing experiment.

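Coverage is calculated as C = L × N / G, where L is the read length, N is the number of reads, and G is the haploid genome length (or the size of the targeted region). For example, 10x coverage means that each base has been read by 10 reads on average, while 100x coverage means that each base has been read by 100 reads on average.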
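The coverage calculation can be written out as a short Python sketch. The first function uses the standard estimate C = L × N / G (read length times number of reads, divided by genome size). The second averages a list of observed per-base depths. The read count, read length, genome size, and depths are all made-up numbers.

```python
def expected_coverage(num_reads, read_length, genome_size):
    """Expected mean coverage: C = L * N / G."""
    return num_reads * read_length / genome_size

def mean_depth(per_base_depths):
    """Average of observed per-base read depths."""
    return sum(per_base_depths) / len(per_base_depths)

# Hypothetical run: 1,000,000 reads of 150 bp over a 5 Mb genome.
print(expected_coverage(1_000_000, 150, 5_000_000))  # 30.0

# Hypothetical per-base depths at four positions.
print(mean_depth([8, 10, 12, 10]))  # 10.0
```

The second example shows why coverage is described as an average. These four bases average 10x even though individual bases were read anywhere from 8 to 12 times.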

The more times a base is sequenced, the higher its coverage and the more reliable the base call at that position.
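One way to see why higher coverage makes calls more reliable is to ask how often a heterozygous site shows only one of its two alleles. Assume each read samples either allele with probability 0.5 and ignore sequencing errors (a simplification). The chance that all n reads carry the same allele is then 2 × 0.5^n. A small Python sketch:

```python
def prob_het_looks_homozygous(depth):
    """Chance that every read at a heterozygous site shows the same allele.

    Assumes each read samples either allele with probability 0.5 and
    that there are no sequencing errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Either all reads show allele 1 or all show allele 2: 2 * 0.5**depth.
    return 2 * 0.5**depth

for depth in (2, 5, 10, 20):
    print(f"{depth}x: {prob_het_looks_homozygous(depth):.6f}")
```

Under these assumptions the probability is 0.5 at a depth of 2, about 0.06 at 5, and about 0.002 at 10.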

Most publications require coverage of 10x to 50x, depending on the research application.
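Rearranging the coverage formula C = L × N / G gives the number of reads needed for a target coverage: N = C × G / L. The sketch below applies it across the 10x to 50x range. The 150 bp read length and the 3.1 Gb genome size (roughly human-sized) are assumptions. The result is a lower bound, because in practice some reads are lost as duplicates or fail to map.

```python
import math

def reads_needed(target_coverage, read_length, genome_size):
    """Reads required for a target mean coverage, N = C * G / L,
    rounded up to a whole read (assumes every read is usable)."""
    return math.ceil(target_coverage * genome_size / read_length)

# Assumed design: 150 bp reads over a ~3.1 Gb genome.
for target in (10, 30, 50):
    n = reads_needed(target, read_length=150, genome_size=3_100_000_000)
    print(f"{target}x: {n:,} reads")
```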

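Raw variant calls are then filtered. The first strategy is hard filtering, in which variants that fail fixed thresholds on quality metrics, such as call quality or read depth, are filtered out. This type of filtering can be done with GATK VariantFiltration and VCFtools.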
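A hard filter can be sketched in a few lines of Python. This minimal example assumes plain-text, tab-separated VCF lines. It applies illustrative thresholds to the QUAL column and to the depth field (DP) in the INFO column. The threshold values and helper names are hypothetical. The sketch drops failing records, similar in spirit to VCFtools. GATK VariantFiltration instead labels failing records in the FILTER column.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=25;AF=0.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info

def hard_filter(vcf_lines, min_qual=30.0, min_depth=10):
    """Yield header lines plus data lines whose QUAL and INFO/DP pass
    fixed thresholds (illustrative values)."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the 6th VCF column
        info = parse_info(fields[7])  # INFO is the 8th VCF column
        if qual == "." or "DP" not in info:
            continue  # missing metrics: treat the record as failing
        if float(qual) >= min_qual and int(info["DP"]) >= min_depth:
            yield line

# Hypothetical records: one passes, one has low QUAL, one has low depth.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\t.\tA\tG\t50.0\t.\tDP=25",
    "chr1\t202\t.\tC\tT\t12.0\t.\tDP=30",
    "chr1\t303\t.\tG\tA\t60.0\t.\tDP=4",
]
for line in hard_filter(records):
    print(line)
```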

The second strategy, soft filtering, generates reliable results with high-coverage data sets.

On the other hand, if you are working with medium- or low-coverage sequencing data, you may want to use probabilistic or Bayesian genotype-calling methods to avoid undercalling heterozygous genotypes.
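The idea behind probabilistic genotyping can be sketched with a simplified Bayesian model. First, compute the binomial likelihood of the observed reference and alternate read counts under each diploid genotype, using a per-read error rate. Then combine those likelihoods with genotype priors. This is a toy model, not the method of any particular tool. The 0.01 error rate and the Hardy-Weinberg priors with an alternate allele frequency of 0.3 are assumptions.

```python
from math import comb

def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial likelihood of the read counts under each diploid genotype.

    Simplified biallelic model: a read shows the alternate allele with
    probability error_rate at 0/0, 0.5 at 0/1, and 1 - error_rate at 1/1.
    """
    depth = ref_count + alt_count
    alt_prob = {"0/0": error_rate, "0/1": 0.5, "1/1": 1 - error_rate}
    return {
        gt: comb(depth, alt_count) * p**alt_count * (1 - p) ** ref_count
        for gt, p in alt_prob.items()
    }

def genotype_posteriors(ref_count, alt_count, priors, error_rate=0.01):
    """Bayes' rule: the posterior is proportional to likelihood times prior."""
    likelihoods = genotype_likelihoods(ref_count, alt_count, error_rate)
    unnormalized = {gt: likelihoods[gt] * priors[gt] for gt in likelihoods}
    total = sum(unnormalized.values())
    return {gt: value / total for gt, value in unnormalized.items()}

# Assumed priors: Hardy-Weinberg proportions for alternate allele frequency q.
q = 0.3
priors = {"0/0": (1 - q) ** 2, "0/1": 2 * q * (1 - q), "1/1": q**2}

# Low-coverage site: 5 reference reads and 1 alternate read.
post = genotype_posteriors(5, 1, priors)
for gt, p in post.items():
    print(gt, round(p, 3))
print("most probable:", max(post, key=post.get))  # 0/1
```

With 5 reference reads and 1 alternate read, the alternate fraction is about 0.17. A fixed 20% cutoff would call this site homozygous reference. The posterior instead favors the heterozygous genotype (about 0.58 versus 0.42) because the model accounts for how easily a true heterozygote yields few alternate reads at low depth. The result depends on the priors: with a much rarer alternate allele, the same reads can favor 0/0.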

