April 2013

As we begin the last quarter of the year, I am hoping to begin a new project in the field of bioinformatics.

April 29th, 2013
My new project works through an exercise from Microbial Life that gives instructions for an introduction to NCBI (National Center for Biotechnology Information). I have used NCBI for a while now, since midway through my junior year, so the website is not new to me. The amount of information, sources and programs available on the NCBI website is revolutionary. It allows just about anyone to access the bioinformatics world and use readily available tools.

My two beginning goals are as follows: 1. "To show the ways in which the NCBI online database classifies and organizes information on DNA sequences, evolutionary relationships, and scientific publications." 2. "To identify an unknown nucleotide sequence from an insect endosymbiont by using the NCBI search tool BLAST."

The exercise mainly deals with the GenBank portion of NCBI, which holds nucleotide sequences.

This is a screenshot of the part of NCBI I will be working with for my first goal. It is the statistics section of the Taxonomy page.

Exercise 1
The first question of the goal is as follows: For the year 2005, how many new Bacterial species were added to the sequence database? - The answer is that 720 new bacterial species were added in 2005. Clicking the number under the species column shows how many were added in each month of 2005. To find out how many were added during a certain year, you simply put the interval into the from and to boxes.

For the year 1999, how many new Bacterial species were added to the sequence database? Wow, what a difference six years makes!

- In 1999 only 490 species were added throughout the year. In six years the number of species added per year rose by nearly half.
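As a quick check on the size of that jump, here is a small Python sketch that compares the two yearly counts reported above. The variable names are my own; only the numbers 490 and 720 come from the Taxonomy statistics page.

```python
# Yearly counts of new bacterial species, as read off the NCBI Taxonomy
# statistics page in this exercise.
species_1999 = 490
species_2005 = 720

increase = species_2005 - species_1999            # absolute change
percent_increase = 100 * increase / species_1999  # change relative to 1999
ratio = species_2005 / species_1999               # how many times larger 2005 is

print(f"Increase: {increase} species")
print(f"Percent increase: {percent_increase:.1f}%")
print(f"Ratio 2005/1999: {ratio:.2f}")
```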

My next question is to search the taxonomy home page for a certain extinct insect from about 120 Mya (million years ago). It is called **//Libanorhinus succinus//**. Once found, clicking the name of the insect takes you to a page that gives more information about the insect and links to the sequence data available for it.

I am then asked the following question: What are some other organisms that belong to this phylum of animals? Can you think of any body traits that these organisms have in common? - Similar organisms are spiders, scorpions, crabs and many-legged insects. The shared body traits are the number and length of the animals' legs along with the shape of their bodies. Most of these animals have multiple long legs and proportionally smaller bodies.

How many nucleotide sequences have been deposited into the Entrez Records for this organism? - There is exactly 1 nucleotide sequence for this organism.

What is the name of the gene that was sequenced for this organism? - The gene is the 18S ribosomal RNA gene. The title "Libanorhinus succinus 18S ribosomal RNA gene" is listed at the top of the GenBank page for this organism.

How many nucleotide base pairs does this DNA entry contain? - There are 315 bp, which is shown on the first line of the web page.

And with this, the first exercise comes to an end.

Exercise 2
My first goal is to create a random nucleotide sequence and run it through BLAST to see if there are any hits at all. My sequence was 71 nucleotides long. When I searched it, there were no hits at all.
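A random query like this one can also be generated rather than typed by hand. The following is a minimal Python sketch, not the method I actually used; the function name and the seed value are hypothetical, and it assumes each of the four bases is equally likely at every position.

```python
import random

def random_nucleotide_sequence(length, seed=None):
    """Return a random DNA string of `length` bases drawn uniformly from A, C, G, T."""
    rng = random.Random(seed)  # a seed makes the sequence reproducible
    return "".join(rng.choice("ACGT") for _ in range(length))

# 71 bases, matching the length of the random query described above.
query = random_nucleotide_sequence(71, seed=2013)
print(query)
print(len(query))
```

The printed string can be pasted straight into the BLAST query box.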

I then BLASTed the following sequence for Wolbachia:

GTTGCAGCAATGGTAGACTCAACGGTAGCAATAACTGCAGGACCTAGAGGAAAAACAGTAGGGATT AATAAGCCCTATGGAGCACCAGAAATTACAAAAGATGGTTATAAGGTGATGAAGGGTATCAAGCCT GAAAAACCATTAAACGCTGCGATAGCAAGCATCTTTGCACAGAGTTGTTCTCAATGTAACGATAAA GTTGGTGATGGTACAACAACGTGCTCAATACTAACTAGCAACATGATAATGGAAGCTTCAAAATCA ATTGCTGCTGGAAACGATCGTGTTGGTATTAAAAACGGAATACAGAAGGCAAAAGATGTAATATTA AAGGAAATTGCGTCAATGTCTCGTACAATTTCTCTAGAGAAAATAGACGAAGTGGCACAAGTTGCA ATAATCTCTGCAAATGGTGATAAGGATATAGGTAACAGTATCGCTGATTCCGTGAAAAAAGTTGGA AAAGAGGGTGTAATAACTGTTGAAGAGAGTAAAGGTTCAAAAGAGTTAGAAGTTGAGCTGACTACT GGCATGCAATTTGATCGCGGTTATCTCTCTCCGTATTTTATTACAAATAATGAAAAAATGATCGTG GAGCTTGATAATCCTTATCTATTAATTACAGAGAAAAAATTAAATATTATTCAACCTTTACTTCCT ATTCTTGAAGCTATTGTTAAATCTGGTAAACCTTTGGTTATTATTGCAGAGGATATCGAAGGTGAA GCATTAAGCACTTTAGTTATCAATAAATTGCGTGGTGGTTTAAAAGTTGCTGCAGTAAAAGCTCCA GGTTTTGGTGACAGAAGAAAGGAGATGCTCGAAGACATAGCAACTTTAACTGGTGCTAAGTACGTC ATAAAAGATGAACTT

Figure 1 shows the first thing that appears after running the BLAST. It displays the number of bp and the alignment score for each gene the sequence is matched against.

This sequence gave me a 100% match with an E-value of 0.0. The most likely name for this sequence is

Wolbachia endosymbiont of Nasonia longicornis GroEL (groEL) gene, partial cds.

The next part was to take only the first 135 bp, run a BLAST on that and compare values:

GTTGCAGCAATGGTAGACTCAACGGTAGCAATAACTGCAGGACCTAGAGGAAAAACAGTAGGGATT AATAAGCCCTATGGAGCACCAGAAATTACAAAAGATGGTTATAAGGTGATGAAGGGTATCAAGCCTGAA
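Cutting the shorter query out of the longer one is easy to get wrong by hand because the pasted sequence contains spaces. Here is a minimal Python sketch of one way to do it; the helper names `clean` and `first_bases` are my own, and to keep it short only the first three 66-base chunks of the Wolbachia query are pasted in.

```python
# First three 66-base chunks of the Wolbachia query, pasted with the spaces
# they carry on the page (the full query continues past this point).
pasted = (
    "GTTGCAGCAATGGTAGACTCAACGGTAGCAATAACTGCAGGACCTAGAGGAAAAACAGTAGGGATT "
    "AATAAGCCCTATGGAGCACCAGAAATTACAAAAGATGGTTATAAGGTGATGAAGGGTATCAAGCCT "
    "GAAAAACCATTAAACGCTGCGATAGCAAGCATCTTTGCACAGAGTTGTTCTCAATGTAACGATAAA"
)

def clean(seq):
    """Remove all whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()

def first_bases(seq, n):
    """Return the first n bases of seq after cleaning it."""
    cleaned = clean(seq)
    if n > len(cleaned):
        raise ValueError(f"sequence has only {len(cleaned)} bases, asked for {n}")
    return cleaned[:n]

short_query = first_bases(pasted, 135)
print(len(short_query))
print(short_query)
```

Counting bases only after the whitespace is removed is the important step; otherwise the spaces would be counted as part of the 135.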

Compared with the search using the entire nucleotide sequence, the E-value for the 135 bp query is very close to 0 but not exactly 0.

The search is still very accurate using only 135 bp: the same gene is detected as the top match. Both searches give great confidence; even with only 135 bp I got the same top match as when I BLASTed the entire sequence.
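A rough way to see why the shorter query gives an E-value that is tiny but not 0 is the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length and S is the raw alignment score. The Python sketch below is only an illustration under stated assumptions: the lambda and K values, the database size, the +1 per matching base scoring, and the 800-base stand-in for a longer query are all assumed rather than taken from my BLAST output, and it ignores the length corrections BLAST applies.

```python
import math

def log10_evalue(query_len, db_len, raw_score, lam, k):
    """Return log10 of E = K * m * n * exp(-lambda * S).

    Working in log10 avoids floating-point underflow for long matches.
    """
    return (
        math.log10(k)
        + math.log10(query_len)
        + math.log10(db_len)
        - lam * raw_score / math.log(10)
    )

# Assumed, illustrative values (not taken from my BLAST results):
LAMBDA = 1.28   # scoring-system parameter lambda
K = 0.46        # scoring-system parameter K
DB_LEN = 1e11   # hypothetical total database length in bases

results = {}
for length in (135, 800):
    # A perfect match at +1 per base gives a raw score S equal to the length.
    results[length] = log10_evalue(length, DB_LEN, length, LAMBDA, K)
    print(f"{length} bp perfect match: E is about 10^{results[length]:.0f}")
```

Under these assumptions the longer match has an E-value hundreds of orders of magnitude smaller than the 135 bp match, which fits with one search being displayed as 0.0 and the other as a very small nonzero number.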