Basic mode

Generate sample files for minimal example: Show/Hide

As a first example you can download preprocessed data from ENCODE, and filter the paired-end data to keep only R2:

wget -O aligned.prepro.bam https://www.encodeproject.org/files/ENCFF280ONP/@@download/ENCFF280ONP.bam
samtools view -hb -f 130 aligned.prepro.bam -o aligned.prepro.R2.bam
samtools index aligned.prepro.R2.bam

Additionally, we need the corresponding reference genome:

wget -O ref.hg19.fa.gz https://www.encodeproject.org/files/female.hg19/@@download/female.hg19.fasta.gz
gunzip ref.hg19.fa.gz

PureCLIP

To run PureCLIP in basic mode, it requires BAM and BAI files, the reference genome and a specified output file:

pureclip -i aligned.prepro.R2.bam -bai aligned.prepro.R2.bam.bai -g ref.hg19.fa -iv 'chr1;chr2;chr3;' -nt 10 -o PureCLIP.crosslink_sites.bed

With -iv the chromosomes (or transcripts) can be specified that will be used to learn the parameters of PureCLIPs HMM. This reduces the memory consumption and runtime. Usually, learning on a small subset of the chromosomes, e.g. Chr1-3, does not impair the results noticeable. However, in the case of very sparse data this can be adjusted. With -nt the number threads for parallelization can be specified.