annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing
GigaScience (IF 9.2), Pub Date: 2022-12-06, DOI: 10.1093/gigascience/giac099
Carlos Farkas 1, Antonia Recabal 2, Andy Mella 3,4, Daniel Candia-Herrera 5, Maryori González Olivero 2, Jody Jonathan Haigh 6,7, Estefanía Tarifeño-Saldivia 5, Teresa Caprile 2

Background: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies, which are often annotated using hybrid sequencing transcriptomics. This leads to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms.

Results: We developed an easy-to-use genome-guided transcriptome annotation pipeline that takes assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integrating several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons of the chicken SCO-spondin gene (containing more than 105 exons) and by identifying genes missing from the chicken reference annotations through homology assignments.

Conclusions: Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker, is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas, helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes

Updated: 2022-12-06