PureCLIP: basic mode¶

In case something went wrong during the preprocessing, you can obtain the intermediate files as follows:

cp ~/protein-RNA-interactions/intermediate_results/RBFOX2_data/Aligned.f.duplRm.pooled.R2.bam .

PureCLIP¶

To run PureCLIP in its basic mode, i.e. without incorporating external data as covariates, it requires BAM and BAI files, the reference genome and specified output files:

mkdir PureCLIP_results
pureclip -i Aligned.f.duplRm.pooled.R2.bam -bai Aligned.f.duplRm.pooled.R2.bam.bai -g ~/protein-RNA-interactions/hg19_data/Homo_sapiens.GRCh37.75.dna.primary_assembly.chr1_2_21.fa -iv 'chr21;' -bdw 20 -nt 8 -o PureCLIP_results/crosslinkSites.basic.bed -or PureCLIP_results/bindingRegions.basic.bed > PureCLIP_results/pureclip.basic.log

With -iv the chromosomes (or transcripts) can be specified that will be used to learn the parameters of PureCLIPs HMM. This reduces the memory consumption and runtime. Usually, learning on a small subset of the chromosomes, e.g. Chr1-3, does not impair the results noticeable. However, in the case of very sparse data this can be adjusted. With -nt the number threads for parallelization can be specified. The parameter -bdw is the bandwidth used for smoothing to read start counts (default: -bdw 50).