PureCLIP: incorporating input control data and CL-motifs

Input control signal

If you have not already preprocessed the input control data, copy the intermediate BAM file and create its BAI index as follows:

cp ../intermediate_results/input_control_data/Aligned.f.duplRm.R2.bam ../input_control_data/
samtools index ../input_control_data/Aligned.f.duplRm.R2.bam
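The -ibai option passed to PureCLIP expects the index that samtools index writes next to the BAM file (by default named <file>.bam.bai). As a quick sanity check before running PureCLIP, the sketch below reports any BAM file that is missing or that lacks such an index. check_bam_index is a hypothetical helper, not part of PureCLIP or samtools, and it only tests that the files exist, not that they are valid.

```shell
# Hypothetical helper (not part of PureCLIP/samtools): for each BAM path given,
# report whether the BAM itself or its default "<file>.bam.bai" index is missing.
# Returns 1 if anything is missing, 0 otherwise. Only checks file existence.
check_bam_index() {
    local missing=0
    for bam in "$@"; do
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam"
            missing=1
        elif [ ! -f "$bam.bai" ]; then
            echo "missing index: $bam.bai"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_bam_index Aligned.f.duplRm.pooled.R2.bam ../input_control_data/Aligned.f.duplRm.R2.bam prints nothing and returns 0 when both BAM files and their indexes are present.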

CL-motifs

To address crosslinking sequence bias, PureCLIP can incorporate information about motifs that are known to be preferentially crosslinked, so-called CL-motifs (see also Haberman et al., 2017). For each CL-motif, PureCLIP learns its influence on the crosslinking probability. This can be particularly useful for proteins that bind sequence motifs distinct from such CL-motifs, e.g. RBFOX2.

This requires the positions of CL-motif occurrences (see incorporateCLmotifs.html for details on how to compute them). Here we use the provided BED file ~/protein-RNA-interactions/hg19_data/CLmotif_occurences.chr1_2_21.bed, which contains precomputed occurrences of four common CL-motifs, each with a score.
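Before handing the file to PureCLIP, it can help to see which motif IDs it contains and how many occurrences each has. The sketch below assumes a BED6-style layout with the motif ID in column 4 (name) and the score in column 5 (score); that column layout is an assumption here, so check it against incorporateCLmotifs.html. summarize_clmotifs is a hypothetical helper that prints, per motif ID, the number of occurrences and the minimum and maximum score.

```shell
# Hypothetical helper: summarise a CL-motif BED file per motif ID.
# ASSUMPTION: motif ID in column 4 (name), score in column 5 (score), tab-separated.
# Output columns (tab-separated): motif ID, occurrences, min score, max score.
summarize_clmotifs() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^(#|track|browser)/ { next }   # skip comment and header lines
         {
             id = $4; s = $5 + 0
             if (!(id in n)) { lo[id] = s; hi[id] = s }
             n[id]++
             if (s < lo[id]) lo[id] = s
             if (s > hi[id]) hi[id] = s
         }
         END { for (id in n) print id, n[id], lo[id], hi[id] }' "$1" | sort -k1,1n
}
```

Run it as summarize_clmotifs ~/protein-RNA-interactions/hg19_data/CLmotif_occurences.chr1_2_21.bed to get one row per motif ID found in the file.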

PureCLIP

To run PureCLIP with input control data, additionally pass the preprocessed BAM file from the input experiment with -ibam and its BAI index with -ibai. The precomputed CL-motif occurrences are passed with -fis, together with -nim 4, which tells PureCLIP to use the scores of motifs with IDs 1-4 (by default, only scores with motif ID 1 are used).
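Since -nim N restricts PureCLIP to motif IDs 1 to N, counting how many occurrences fall inside and outside that range can catch a mismatch between the BED file and the -nim value. This sketch again assumes integer motif IDs in column 4 of the BED file; count_used_motifs is a hypothetical helper that only mirrors the rule stated above, not PureCLIP's own parsing.

```shell
# Hypothetical helper: count CL-motif occurrences whose motif ID lies in 1..NIM
# (the IDs used with "-nim NIM") versus all other lines.
# ASSUMPTION: integer motif ID in column 4 of a tab-separated BED file.
# Usage: count_used_motifs NIM FILE.bed   -> prints "used=<n> ignored=<m>"
count_used_motifs() {
    awk -v nim="$1" 'BEGIN { FS = "\t"; used = 0; ignored = 0 }
         /^(#|track|browser)/ { next }   # skip comment and header lines
         {
             if ($4 ~ /^[0-9]+$/ && $4 + 0 >= 1 && $4 + 0 <= nim + 0) used++
             else ignored++
         }
         END { print "used=" used, "ignored=" ignored }' "$2"
}
```

For this run, count_used_motifs 4 ../hg19_data/CLmotif_occurences.chr1_2_21.bed should report ignored=0 if every occurrence carries one of the motif IDs 1-4.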

mkdir -p PureCLIP_results
pureclip -i Aligned.f.duplRm.pooled.R2.bam -bai Aligned.f.duplRm.pooled.R2.bam.bai \
    -g ../hg19_data/Homo_sapiens.GRCh37.75.dna.primary_assembly.chr1_2_21.fa \
    -iv 'chr21;' -bdw 20 -nt 8 \
    -ibam ../input_control_data/Aligned.f.duplRm.R2.bam \
    -ibai ../input_control_data/Aligned.f.duplRm.R2.bam.bai \
    -nim 4 -fis ../hg19_data/CLmotif_occurences.chr1_2_21.bed \
    -o PureCLIP_results/crosslinkSites.input_CLmotifs.bed \
    -or PureCLIP_results/bindingRegions.input_CLmotifs.bed \
    > PureCLIP_results/pureclip.input_CLmotifs.log