Similarly, a range of biological investigations can be reduced to similar statistical analyses. By introducing a generic methodology to genome analysis, we find that a range of genomic data sets can be represented by the same mathematical objects, and that a small set of such objects suffices to describe the bulk of current data sets. The abstraction of genomic data, the proposal of prototype investigations, and the careful attention given to null models simplify statistical inference for a range of possible research topics.

But as the three-dimensional organization of the genome becomes increasingly known, appropriate volume topologies will be possible, so that neighboring volumes representing three-dimensional contiguity may be used as a basis for statistical tests. We provide tools for abstraction and cataloguing of what we believe are typical questions of broad interest. Note also that our system is tightly connected to Galaxy and can make use of all the tools provided within Galaxy.

Galaxy provides a simple user interface and is rich in tools for manipulating and analyzing datasets of diverse formats, but has little support for formal statistical testing. EpiGRAPH is focused on a specific type of scenario that, according to our cataloguing, amounts to the comparison of unmarked points or segments versus categorically marked segments (with the mark being case or control). We rely on a dynamic system to infer appropriate options, aiding maintenance.

Comparing this to the EpiGRAPH and Galaxy frameworks, which we believe are the closest existing systems, we find that both require substantial technical expertise when choosing the correct analysis and options. This guided approach hides unnecessary complexities from the researcher, while confronting her with important design choices as needed.

In order to simplify the task of making choices, a step-wise approach has been implemented, displaying only the relevant options at each stage.
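The step-wise narrowing of options can be illustrated with a small sketch. This is not the system's actual implementation: the catalogue entries, the analysis names, the null-model labels, and the function `next_options` are all hypothetical, chosen only to show how each choice could restrict what is offered at the next stage.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative catalogue only (an assumption, not the system's real list):
# maps a pair of track types to the analyses that apply, and each analysis
# to the null models that make sense for it.
CATALOGUE: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("points", "segments"): {
        "located inside": [
            "preserve segments, randomize points",
            "preserve points, randomize segments",
        ],
    },
    ("segments", "segments"): {
        "overlap": ["preserve segment lengths, randomize positions"],
    },
    ("points", "marked segments"): {
        "case vs control frequency": ["permute case/control marks"],
    },
}


def next_options(
    track1_type: Optional[str] = None,
    track2_type: Optional[str] = None,
    analysis: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Return the name of the next undecided stage and its relevant options."""
    if track1_type is None:
        # Stage 1: any track type that appears first in some catalogued pair.
        return "first track type", sorted({a for a, _ in CATALOGUE})
    if track2_type is None:
        # Stage 2: only partners that form a catalogued pair with the first track.
        return "second track type", sorted(
            b for a, b in CATALOGUE if a == track1_type
        )
    analyses = CATALOGUE.get((track1_type, track2_type), {})
    if analysis is None:
        # Stage 3: only analyses defined for this pair of track types.
        return "analysis", sorted(analyses)
    # Stage 4: only null models defined for the chosen analysis.
    return "null model", list(analyses.get(analysis, []))
```

Because the options are derived from a single catalogue rather than hard-coded per page, adding a new analysis in this sketch only requires a new catalogue entry, which is one way a dynamic approach could aid maintenance.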
However, the complex interdependencies between the large body of available tracks, a number of syntactically different analyses, and a range of choices for constructing null models, all pose challenges to the concepts of simplicity and ease of use. The system provides a web-based user interface with a low entry point.

A further speedup is achieved by memoizing intermediate results to disk, automatically retrieving them when needed for the same or different analyses on the same track(s) at any subsequent time, by any user. To reduce the memory footprint of analyses on genome-wide data, an iterative divide-and-conquer algorithm is automatically carried out when applicable.
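As a rough illustration of how bin-wise processing and disk memoization can work together, the sketch below splits each chromosome into fixed-size bins, computes a simple local statistic per bin, stores each result on disk, and sums the parts. It is a minimal sketch under stated assumptions, not the system's code: the cache location, the key layout, the point-counting statistic, and all function names are hypothetical.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a shared directory would let all users reuse results.
CACHE_DIR = Path(tempfile.gettempdir()) / "track_stat_cache"


def _cache_path(track_id, stat_name, chrom, start, end, cache_dir):
    """Derive a stable file name from everything that determines the result."""
    key = f"{track_id}|{stat_name}|{chrom}|{start}|{end}".encode()
    return cache_dir / (hashlib.sha256(key).hexdigest() + ".pkl")


def count_points(points, chrom, start, end):
    """Local statistic: number of points in [start, end) on one chromosome."""
    return sum(1 for c, pos in points if c == chrom and start <= pos < end)


def memoized_bin_count(track_id, points, chrom, start, end, cache_dir=CACHE_DIR):
    """Return the per-bin count, reading it from disk if it was computed before."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = _cache_path(track_id, "count_points", chrom, start, end, cache_dir)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    value = count_points(points, chrom, start, end)
    with path.open("wb") as fh:
        pickle.dump(value, fh)
    return value


def genome_wide_count(track_id, points, chrom_sizes, bin_size, cache_dir=CACHE_DIR):
    """Iterate over bins, compute (or reload) each local result, and combine them.

    For simplicity the points are held in memory here; a real system would
    load only the data for the current bin to keep the memory footprint low.
    """
    total = 0
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            end = min(start + bin_size, size)
            total += memoized_bin_count(
                track_id, points, chrom, start, end, cache_dir
            )
    return total
```

Keying the cache on the track identifier and bin coordinates, rather than on the requesting analysis or user, is what would allow a later analysis on the same track to pick up the stored per-bin results. The sketch trusts the track identifier to uniquely determine the data, so a changed track would need a new identifier.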