Graduate Project

Identifying DNA sequence motifs of Pdx-1 and NeuroD1 transcription factors

Transcription is the biochemical process in which a gene's DNA is copied into RNA, the first step in producing a protein. Transcription is initiated when special proteins called transcription factors bind to specific sites in the promoters of genes. Transcription factor binding sites are short patterns of consecutive characters, hereafter called motifs. This paper discusses the design, implementation, and application of a word-based approach to detect motif pairs in a set of co-regulated promoters of a reference genome. More specifically, the approach forms motif pairs using distinct strings of 6 to 8 characters drawn from the co-regulated promoters. It then uses the hypergeometric probability model to compute a p-value for each motif pair, which indicates the motif pair’s statistical significance. The method next clusters the statistically significant motif pairs using the Tanimoto distance to eliminate possible duplicates. It also applies a phylogenetic conservation analysis, which examines the statistical significance of the motif pairs in several different genomes. Lastly, it uses randomized analysis to control the false discovery rate, that is, to limit the number of motif pairs found to be significant by chance. To demonstrate the biological relevance of the results, the approach measures the information content, which shows the conservation level of the nucleotides at each position in a motif pair. Finally, the method investigates the positional bias of the resulting motifs relative to the transcription start sites. We applied the approach to detect motif pairs of the Pdx-1 and NeuroD1 transcription factors, which regulate the production of insulin. We evaluated a total of 4465 motif pairs, formed using distinct strings obtained from the set of co-regulated promoters from the mouse genome. As a result, we detected 178 motif pairs that are statistically significant and conserved in the rat or human genomes.
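
The three quantitative steps named in the abstract (hypergeometric p-value, Tanimoto distance, and information content) can be illustrated with a minimal Python sketch. This is not the project's implementation. The function names, the parameterization of the hypergeometric test (N promoters in the genome, K of them containing the motif pair, n co-regulated promoters, k of those containing the pair), the use of promoter sets as the input to the Tanimoto distance, and the uncorrected 2-bit information content formula are all assumptions chosen for illustration.

```python
from collections import Counter
from math import comb, log2


def hypergeom_pvalue(N, K, n, k):
    """Upper-tail probability P(X >= k) under a hypergeometric model.

    Assumed meaning of the arguments (not stated in the abstract):
    N = promoters in the genome, K = promoters containing the motif pair,
    n = co-regulated promoters, k = co-regulated promoters with the pair.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two sets (here: promoters hit by each pair)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def information_content(aligned_sites):
    """Per-position information content in bits for equal-length DNA sites.

    Uses 2 - H(position) with no small-sample correction (an assumption).
    """
    ic = []
    for pos in range(len(aligned_sites[0])):
        counts = Counter(site[pos] for site in aligned_sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic
```

Under these assumptions, a small p-value marks a motif pair as over-represented in the co-regulated set, a Tanimoto distance near 0 marks two pairs as likely duplicates, and an information content near 2 bits marks a fully conserved position.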

