1. File preparation and usage

Tau (tissue-specific gene expression score): a score for how tissue-specific a gene's expression is across different tissues. The value lies between 0 and 1; the closer it is to 1, the more tissue-specific the gene.

The tau index indicates how specific or broadly expressed a gene or transcript is, within studied tissues. Genes with a tau index close to 1 are more specifically expressed in one tissue, while genes with a tau index closer to 0 are equally expressed across all tissues studied[1].

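Input file format: the first column is the gene name, and the remaining columns hold the expression matrix. Replicates of the same tissue must sit in adjacent columns, because the script averages every -r consecutive columns.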
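To make the layout concrete, here is a small sketch that parses one row the same way the script below does (splitting on whitespace and taking the first field as the gene name). The gene name and the values are made up for illustration.

```perl
use strict;
use warnings;

# One hypothetical input row: gene name, then 2 tissues x 2 replicates.
my $line = "geneA\t12.5\t11.5\t0.4\t0.6";

# Split on whitespace; the first field is the gene name,
# the remaining fields are the expression values.
my ($gene, @values) = split /\s+/, $line;

print "gene: $gene\n";                       # gene: geneA
print "columns: ", scalar(@values), "\n";    # columns: 4
```

With this row, running the script with -r 2 would treat columns 1-2 as the first tissue and columns 3-4 as the second.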

The options are:

Options:

-h, --help Display this help message

-i, --in <file> Specify an input file

-o, --out <file> Specify an output file

-r, --replicates <int> number of replicates

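-i is the name of the input file.

-o is the name of the output file; the default is "tau.txt".

-r is the number of biological replicates per tissue in the matrix; the default is 3. If the replicates in each group have already been averaged, use 1.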
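The grouping that -r controls can be sketched as follows: every run of r adjacent columns is treated as one tissue and replaced by its mean. This is a minimal illustration, not the script itself; the helper name average_replicates and the numbers are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of tau.pl): average each run of $r
# adjacent values, returning one mean per tissue.
sub average_replicates {
    my ($r, @values) = @_;
    die "column count is not a multiple of $r\n" if @values % $r;
    my @means;
    # splice removes the next $r values until the list is empty.
    while (my @group = splice(@values, 0, $r)) {
        push @means, sum(@group) / $r;
    }
    return @means;
}

# Two tissues with 3 replicates each (made-up values).
my @means = average_replicates(3, 9, 10, 11, 1, 2, 3);
print join("\t", @means), "\n";    # prints 10<TAB>2
```

With -r 1 each column is its own group, so the values pass through unchanged.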

For example, if the input file is input.txt, the output file is output.txt, and there are 4 biological replicates, the command is:

perl tau.pl -i input.txt -o output.txt -r 4

The calculation follows the formula below[1]:

$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1}$$

where N is the number of tissues being studied and x_i is the expression profile component for a given tissue, normalised by the maximal component value for that gene (i.e. the expression of that gene in the tissue it is most highly expressed in).
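The definition can be checked by hand on a few small profiles. The sketch below assumes one already-averaged expression value per tissue; the helper name tau_index and the three profiles are hypothetical and only serve to illustrate the two extremes and one intermediate case.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical helper (not part of tau.pl): tau for one gene, given one
# expression value per tissue. tau = sum(1 - x_i) / (N - 1), x_i = value / max.
sub tau_index {
    my @expr = @_;
    my $n    = scalar @expr;
    my $max  = max(@expr);
    # Tau is undefined with fewer than two tissues or with no expression.
    return undef if $n < 2 || $max <= 0;
    return sum(map { 1 - $_ / $max } @expr) / ($n - 1);
}

# Made-up profiles across four tissues.
printf "specific: %.3f\n", tau_index(10, 0, 0, 0);    # (0+1+1+1)/3          -> 1.000
printf "uniform:  %.3f\n", tau_index(5, 5, 5, 5);     # every x_i is 1       -> 0.000
printf "mixed:    %.3f\n", tau_index(8, 4, 2, 2);     # (0+0.5+0.75+0.75)/3  -> 0.667
```

A gene expressed in a single tissue reaches 1, a gene expressed equally everywhere reaches 0, and anything in between falls between the two.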

2. Code

#!/usr/bin/perl
# Tau.pl
# HYG
use strict;
use warnings;
use List::Util qw(sum max);
use Getopt::Long;

my $help_requested;
my $file;
my $out = "tau.txt";
my $r   = 3;

sub usage {
    print "Usage: $0 [options]\n";
    print "Options:\n";
    print "  -h, --help              Display this help message\n";
    print "  -i, --in <file>         Specify an input file\n";
    print "  -o, --out <file>        Specify an output file    default: tau.txt\n";
    print "  -r, --replicates <int>  Number of replicates      default: 3\n";
}

if (@ARGV == 0) {
    usage();
    exit;
}

GetOptions(
    'h|help'         => \$help_requested,
    'i|in=s'         => \$file,
    'o|out=s'        => \$out,
    'r|replicates=i' => \$r,
);

if ($help_requested) {
    usage();
    exit;
}

die "no input file given (use -i)\n" unless defined $file;
die "the number of replicates (-r) must be at least 1\n" if $r < 1;

open my $in_fh,  '<', $file or die "cannot open the file $file: $!\n";
open my $out_fh, '>', $out  or die "cannot write to $out: $!\n";

while (my $line = <$in_fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;       # skip blank lines
    my @array = split /\s+/, $line;
    my $gene  = shift @array;       # first column: gene name
    my $n     = scalar @array;      # number of expression columns
    unless ($n % $r == 0) {
        die "the number of expression columns ($n) is not divisible by the number of replicates ($r)\n";
    }
    unless ($n > $r) {
        die "the number of expression columns ($n) must be greater than the number of replicates ($r)\n";
    }

    # Average every $r adjacent columns to get one value per tissue.
    my @tpm_values;
    my @groups;
    foreach my $element (@array) {
        push @groups, $element;
        if (@groups == $r) {
            push @tpm_values, sum(@groups) / $r;
            @groups = ();
        }
    }

    # Skip genes that are not expressed in any tissue.
    my $max_expression = max(@tpm_values);
    next if $max_expression <= 0;

    # tau = sum(1 - x_i) / (N - 1), where x_i = expression / maximum expression.
    my $tau = 0;
    for my $tpm (@tpm_values) {
        my $xi = $tpm / $max_expression;
        $tau += (1 - $xi);
    }
    $tau = $tau / (@tpm_values - 1);
    print $out_fh "$gene\t$tau\n";
}

close $in_fh;
close $out_fh;

References

1. Palmer, D., Fabris, F., Doherty, A., Freitas, A.A. & de Magalhaes, J.P. Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues. Aging (Albany NY) 13, 3313-3341 (2021).