Post date: Sep 28, 2016 12:48:58 PM
The following script looks in each gene for the longest space (tndif) between a set of adjacent transposons insertion sites, looks if this space is bigger than a given amount of bp (mingap), that it is less than a fraction of the gene (maxlength), larger than a fraction of the gene (minlength), and that the gene in question bears a meaningful number of transposon insertion sites (mintnnumber). Then it calculates a score for each gene
score for each gene=(size of longest space)*(number of transposon mapping to that gene)/(length of the gene)
Highest scoring genes have non-essential protein domains
clear essentialdomain
for ii=1:length(genes.coordinates)
tnindex=find(tncoordinates_copy(:,2)>genes.coordinates(ii,1) & tncoordinates_copy(:,2)<genes.coordinates(ii,2));
tns(ii).data=[genes.coordinates(ii,1); tncoordinates_copy(tnindex,2); genes.coordinates(ii,2)];
tnnumber(ii)=length(tns(ii).data);
genelength(ii)=genes.coordinates(ii,2)-genes.coordinates(ii,1);
end
mingap=200
maxlength=0.9
minlength=0.10
mintnnumber=20
skiptn=5 %minimum is 1 !
for ii=1:length(genes.coordinates)
if tnnumber(ii)<skiptn+1
tndif(ii)=0;
else
tndif(ii)=max(tns(ii).data(skiptn+1:length(tns(ii).data))-tns(ii).data(1:length(tns(ii).data)-skiptn));
end
end
tndif=double(tndif);
essentialdomain=double(tndif).*tnnumber./genelength.^1.5;
essentialdomain(tndif<mingap | tndif./genelength>maxlength | tndif./genelength<minlength | tnnumber<mintnnumber)=0;
[dump index]=sort(essentialdomain,'descend');
clear dump
plot(essentialdomain(index),'.')