My first-semester final is based on the work Francis Raycroft is doing, and it builds partly on topics I covered last year. Francis provided me with an Excel worksheet listing the genes used and their accession numbers. The card holds over 750 genes, which will be analyzed with microarray data analysis in MATLAB.

ncbi code.png

To match accession numbers to UniGene Identifiers, I tried some of the NCBI query syntax and tools. I used Batch Entrez to look up the UniGene Identifiers, which took quite a bit of work. First, I had to use the RefSeq query syntax, because accession numbers were all I had. Then I had to turn the accession numbers into queries and find their FASTA numbers, which I could then use with the GenInfo integrated database code. After I ran those through Batch Entrez, the searches came back with their UniGene Identifiers. The only problem was that some accession numbers had no UniGene Identifier because they belong to predicted proteins. These have not been fully identified yet, so it is unknown whether they can be given a specific identifier. Overall, MATLAB could not be used for this portion of the project. I did not find a MATLAB command that links accession numbers to UniGene Identifiers; the lookup touches too many parts of NCBI.
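Since MATLAB was not part of this step, the preparation work can be sketched in Python instead. The sketch below is only an illustration under a few assumptions: the accession numbers have already been copied out of the worksheet into a list, Batch Entrez is given a plain text file with one accession per line, and predicted records are recognized by the RefSeq model prefixes `XM_`, `XR_`, and `XP_`. The function names and the file path are hypothetical, and a prefix only marks a record as predicted; it does not by itself say whether a UniGene Identifier exists.

```python
# Assumption: RefSeq model (predicted) records use these accession prefixes,
# while curated records use prefixes such as NM_, NR_, and NP_.
PREDICTED_PREFIXES = ("XM_", "XR_", "XP_")


def clean_accessions(raw):
    """Trim whitespace, uppercase, and drop blanks and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for item in raw:
        accession = item.strip().upper()
        if accession and accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    return cleaned


def split_predicted(accessions):
    """Return (other, predicted), where predicted holds XM_/XR_/XP_ accessions."""
    predicted = [a for a in accessions if a.startswith(PREDICTED_PREFIXES)]
    other = [a for a in accessions if not a.startswith(PREDICTED_PREFIXES)]
    return other, predicted


def write_batch_file(accessions, path):
    """Write one accession per line to a text file for upload (hypothetical path)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(accessions) + "\n")
```

Calling `split_predicted(clean_accessions(column_values))` on the worksheet column, where `column_values` is a hypothetical list of strings, would give one list to upload and a second list of predicted records to set aside and check by hand.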

ncbi codeee.png

The figure above is an example of what the Batch Entrez query looked like. The accession number goes inside the brackets of the RefSeq command, and when the file is uploaded to Entrez, each number comes back with its FASTA form, which can then be run through Entrez again to reach the UniGene Identifiers.