MIPE
Minimal Information for PCR Experiments
An XML schema for the exchange of PCR related data

Jan Aerts

News

July 20, 2005

A manuscript describing the MIPE format is now available from the Online Journal of Bioinformatics website. Reference: Aerts J and Veenendaal T. MIPE - a XML-format to facilitate the storage and exchange of PCR-related data. OJB 6(2): 106-112 (2005).

In addition, Steffen Moeller has created a Debian package for MIPE, which is available at http://bioinformatics.pzr.uni-rostock.de/~moeller/debian/mipe. To install, run 'dpkg --install mipe_1.1-1_all.deb'.

May 3, 2005

A manuscript describing the MIPE format has been submitted.

April 5, 2005

The format has now reached version 1.0 and is defined in an XML Schema file instead of a DTD file. The scripts have undergone minor changes as well. CAUTION: I have not yet had the time to test these properly! In addition, the comments on this website still reflect older versions of MIPE. When time permits, I'll update them.

Purpose

To provide a standard format (i.e. MIPE) for the exchange and/or storage of all information associated with PCR experiments, using a flat text file. Although this format can be used for data storage, its primary focus should be data exchange. For larger repositories, relational databases are more appropriate for storing these data. The MIPE format could then be used as a standard format to import into and/or export from these databases. (For an example of using text files for data exchange, see Lincoln Stein's article How Perl Saved the Human Genome Project.)

If I have the time, I'll post an SQL schema for a relational database on this site to store PCR related data. In addition, a small script will be written to import data into and export data from this database implementation.

If the MIPE format almost-but-not-completely serves your needs...

As this is an open format, please don't hesitate to contact me if the MIPE-format almost-but-not-completely serves your needs. It has been developed to serve ours, but can easily be extended (which is what I hope to do with your help).

Download

Download from the SourceForge website, or check out the latest sources from CVS using:
cvs -d:pserver:anonymous@cvs.sf.net:/cvsroot/mipe export -D tomorrow all
Developers can check out using:
cvs -d:ext:username@cvs.sf.net:/cvsroot/mipe checkout all

A Debian package is available on http://bioinformatics.pzr.uni-rostock.de/~moeller/debian/mipe.

Rationale

The MIPE format is built on two basic parts: design and use. A schematic overview of the relationship between these parts is provided below.

Design

The design part of a MIPE record contains information on the source that was used to design the PCR primers (e.g. an accession number or DNA sequence) and information on the PCR primers.

Use

The use part of a MIPE record contains information on the results from a PCR resequencing experiment. These include the DNA sequence of the amplified fragment, whether or not this is the reverse complement of the DNA sequence as presented in the design part, any polymorphisms with associated assays and samples with associated genotypes.

Implementation

XML

XML stands for eXtensible Markup Language. It looks much like HTML (HyperText Markup Language), which uses tags to mark up text for webpages. An example of HTML text is:
This <i>word</i> is in italic.
It is displayed on webpages as:
This word is in italic.
This example shows that HTML stores information on how words should be presented to the end-user.

XML - contrary to HTML - stores information on what words mean. For example, the XML text <seq>AGGTCCACCTWGGSCC</seq> represents a so-called element, consisting of an opening tag (<seq>), the content (AGGTCCACCTWGGSCC) and a closing tag (</seq>). The closing and opening tags give information on what the thing in between them actually is. Note that spaces after the opening tag or before the closing tag are of significance and are not automatically removed. Therefore <id> some_id </id> is not the same as <id>some_id</id>.
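The point about significant whitespace can be checked with a few lines of Python. This is only a sketch using the standard library's xml.etree.ElementTree module, which is not part of MIPE; the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Parse the two variants of the <id> element discussed above.
padded = ET.fromstring("<id> some_id </id>")
exact = ET.fromstring("<id>some_id</id>")

# The parser keeps the surrounding spaces as part of the content.
print(repr(padded.text))
print(repr(exact.text))

# The two contents only match once the spaces are stripped explicitly.
same_as_is = padded.text == exact.text
same_after_strip = padded.text.strip() == exact.text
print(same_as_is, same_after_strip)
```

A script that compares IDs read from a MIPE file may therefore want to strip whitespace itself, or the file should be written without padding in the first place.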

It is possible to nest elements within other elements. For example: the different properties of a SNP can be represented as follows:

  <snp>
    <pos>591</pos>
    <amb>R</amb>
    <sbe>
      <oligo>OL04-231</oligo>
      <specific>GAATACCAGCTACT</specific>
      <tail>TTTTTTTTTTTTTTTTTTTTTTTTTTTT</tail>
    </sbe>
    <remark>this is a remark about the SNP</remark>
  </snp>
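As an illustration of how such nested elements can be read by a program, the sketch below parses the SNP example above with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for this example; the scripts that accompany MIPE are written in Perl.

```python
import xml.etree.ElementTree as ET

SNP_XML = """
<snp>
  <pos>591</pos>
  <amb>R</amb>
  <sbe>
    <oligo>OL04-231</oligo>
    <specific>GAATACCAGCTACT</specific>
    <tail>TTTTTTTTTTTTTTTTTTTTTTTTTTTT</tail>
  </sbe>
  <remark>this is a remark about the SNP</remark>
</snp>
"""

snp = ET.fromstring(SNP_XML)

# Direct children of <snp>, in document order.
child_tags = [child.tag for child in snp]

# Nested elements are reached with a path relative to <snp>.
position = int(snp.findtext("pos"))
ambiguity = snp.findtext("amb")
sbe_oligo = snp.findtext("sbe/oligo")
sbe_specific = snp.findtext("sbe/specific")

print(child_tags)
print(position, ambiguity, sbe_oligo, sbe_specific)
```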

Some guidelines should be followed as good practice:

The first line of an XML file should state the XML version:

<?xml version="1.0"?>

An XML file is called well-formed when every opening tag has a matching closing tag and elements are closed from the inside out. For example: <tag1><tag2>text</tag1></tag2> is not well-formed, while <tag1><tag2>text</tag2></tag1> is.
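Well-formedness can be tested by simply trying to parse the text. The helper below is a minimal sketch in Python using the standard xml.etree.ElementTree module; the function name is_well_formed is made up for this example and is not part of MIPE.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Tags closed from the inside out: well-formed.
nested_ok = is_well_formed("<tag1><tag2>text</tag2></tag1>")

# Tags closed in the wrong order: not well-formed.
crossed_ok = is_well_formed("<tag1><tag2>text</tag1></tag2>")

print(nested_ok, crossed_ok)
```

Note that this only checks well-formedness. Whether a file is also valid against an XSD file is a separate check.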

MIPE

To be MIPE compliant, a well-formed XML file has to adhere to a set of rules as specified in the XSD file. Such an XML file is not only well-formed, but also valid. The XSD file sets rules such as which elements must be present and in what order they appear. For MIPE, the XSD file is called mipe.xsd. The path to the corresponding XSD file is set in the second line of the XML file itself, underneath the line with the XML version (see above). So a MIPE file should start with the following two lines (although the path to the XSD file should be set appropriately):
<?xml version="1.0"?>
<mipe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="http://mipe.sourceforge.net/mipe.xsd">
...

The order of elements in a MIPE compliant file has to be the same as specified in the XSD file. A description of all elements is presented here. Sorry, that's somewhat broken for the moment. Trying to export to HTML from Excel gives a great result...

According to the XSD file for MIPE, the outermost element - and there is only one - always is a <mipe> element (see Box 1).
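Before running a full validation, a script can do a quick sanity check on the outermost element and on the schema location it points to. The sketch below uses Python's standard xml.etree.ElementTree module; the function name describe_root and the SAMPLE text are assumptions for illustration, and the check is not a substitute for validating against mipe.xsd.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi: prefix used in the opening <mipe> tag.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<mipe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
    ' xsi:noNamespaceSchemaLocation="http://mipe.sourceforge.net/mipe.xsd">\n'
    '  <version>1.0</version>\n'
    '</mipe>\n'
)


def describe_root(text):
    """Return the root tag and the XSD location declared on it (or None)."""
    root = ET.fromstring(text)
    # ElementTree stores namespaced attributes as "{namespace}name".
    schema = root.get("{%s}noNamespaceSchemaLocation" % XSI_NS)
    return root.tag, schema


root_tag, schema_location = describe_root(SAMPLE)
print(root_tag, schema_location)
```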

Important: An XML file that doesn't comply with the rules in the XSD (i.e. that is not valid) is not a MIPE file. The Linux command xsdvalid your_file.mipe checks whether the XML file complies with the corresponding XSD file. If something is wrong (most probably an element is missing or in the wrong place), the program reports the line number of the error.

An example MIPE compliant (or valid) file is shown in Box 1. The smallest possible (and not really informative) MIPE file according to the XSD is shown in Box 2. A template file is available (i.e. template.mipe; be sure to change the second line to match the location of the mipe.xsd file).

Box 1: An example MIPE compliant file.
<?xml version="1.0"?>
<mipe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="http://mipe.sourceforge.net/mipe.xsd">
  <version>1.0</version>
  <pcr id="PCR1">
    <id>PCR1</id>
    <modified>20040426</modified>
    <modified>20040428</modified>
    <researcher>Jan Aerts</researcher>
    <species>chicken</species>
    <design>
      <source>
        <file>CYP2D6.fas</file>
      </source>
      <range>125-642</range>
      <seq>ACCTACTACTACAAACTACAACAAAATTCACATCAAAACATACACCATACCTACTACTAT...</seq>
      <primer1>
        <oligo>OL04-242</oligo>
      </primer1>
      <primer2>
        <oligo>OL04-243</oligo>
      </primer2>
    </design>
    <use>
      <seq>CACCATCACAGCTCACTATCGCCTGCGGGATCTCTCATTTACACAATTCGAGCTCACATCTATCATATCTAA...</seq>
      <revcomp>1</revcomp>
      <snp id="SCW0006">
        <id>SCW0006</id>
        <pos>45</pos>
        <amb>R</amb>
        <rank>3</rank>
      </snp>
    </use>
  </pcr>
</mipe>
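To give an idea of how a file like Box 1 can be read by a program, the sketch below pulls the PCR ID, the primer oligos and the SNPs out of an abbreviated copy of Box 1. It uses Python's standard xml.etree.ElementTree module rather than the Perl scripts that accompany MIPE. The function name summarize and the excerpt (which drops the sequences and several other elements, and has not been validated against mipe.xsd) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Box 1: only the elements used below are kept.
BOX1_EXCERPT = (
    '<?xml version="1.0"?>'
    '<mipe>'
    '<version>1.0</version>'
    '<pcr id="PCR1">'
    '<id>PCR1</id>'
    '<design>'
    '<primer1><oligo>OL04-242</oligo></primer1>'
    '<primer2><oligo>OL04-243</oligo></primer2>'
    '</design>'
    '<use>'
    '<revcomp>1</revcomp>'
    '<snp id="SCW0006"><id>SCW0006</id><pos>45</pos>'
    '<amb>R</amb><rank>3</rank></snp>'
    '</use>'
    '</pcr>'
    '</mipe>'
)


def summarize(root):
    """Return one dictionary per SNP, together with its PCR ID and primers."""
    rows = []
    for pcr in root.findall("pcr"):
        primers = (
            pcr.findtext("design/primer1/oligo"),
            pcr.findtext("design/primer2/oligo"),
        )
        for snp in pcr.findall("use/snp"):
            rows.append({
                "pcr": pcr.findtext("id"),
                "primers": primers,
                "snp": snp.findtext("id"),
                "pos": int(snp.findtext("pos")),
                "amb": snp.findtext("amb"),
                "rank": int(snp.findtext("rank")),
            })
    return rows


rows = summarize(ET.fromstring(BOX1_EXCERPT))
for row in rows:
    print(row)
```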

Box 2: A minimal MIPE compliant file.
<?xml version="1.0"?>
<mipe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="http://mipe.sourceforge.net/mipe.xsd">
  <version>1.0</version>
</mipe>

Usage

To use the MIPE format, you don't have to install anything, as it is a format and not a program. You basically save the mipe.xsd file in a convenient place, get your data into a flat text file (either by hand or by writing a script that accesses a database) and check that it complies with the rules set out in the mipe.xsd file. To do this on a Linux/Unix machine, you can type xsdvalid your_filename. Make sure to change the second line of your file to reflect the location where your mipe.xsd file is saved.

Accompanying scripts

Getting things out of a MIPE file

Changing the contents of a MIPE file (not thoroughly tested yet)

Example usage: suppose you have SNPs in a MIPE file with (Polyphred) ranks from 1 to 6, and want to keep only the ones with a rank < 4:
  1. mipe2snps.pl your_mipe_file.mipe > snp_list.csv
  2. Edit snp_list.csv so that it contains only the PCR product IDs and SNP IDs of the SNPs with rank >= 4 (i.e. the SNPs to be removed).
  3. removeSnpFromMipe.pl your_mipe_file.mipe < snp_list.csv
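The same filtering can also be sketched in a single step in Python. This is not how mipe2snps.pl and removeSnpFromMipe.pl work internally; it is only an illustration with the standard xml.etree.ElementTree module, assuming the element layout shown in Box 1 (snp elements with an id and a rank inside use). The function name remove_snps_with_rank_at_least and the two-SNP example document are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt with two SNPs: one with rank 3 and one with rank 5.
EXAMPLE = (
    '<mipe>'
    '<version>1.0</version>'
    '<pcr id="PCR1">'
    '<id>PCR1</id>'
    '<use>'
    '<snp id="SNP_A"><id>SNP_A</id><pos>45</pos><amb>R</amb><rank>3</rank></snp>'
    '<snp id="SNP_B"><id>SNP_B</id><pos>80</pos><amb>Y</amb><rank>5</rank></snp>'
    '</use>'
    '</pcr>'
    '</mipe>'
)


def remove_snps_with_rank_at_least(root, threshold):
    """Remove every <snp> whose <rank> is >= threshold; return the removed IDs."""
    removed = []
    for use in root.iter("use"):
        # Copy the list first, because elements are removed while looping.
        for snp in list(use.findall("snp")):
            rank = snp.findtext("rank")
            if rank is not None and int(rank) >= threshold:
                use.remove(snp)
                removed.append(snp.findtext("id"))
    return removed


root = ET.fromstring(EXAMPLE)
removed_ids = remove_snps_with_rank_at_least(root, 4)
kept_ids = [snp.findtext("id") for snp in root.iter("snp")]
print(removed_ids, kept_ids)
```

SNPs without a rank element are left untouched in this sketch. The result should still be checked against mipe.xsd afterwards.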

Wishlist

Contact

Jan Aerts, Roslin Institute
jan$DOT$aerts$AT$bbsrc$DOT$ac$DOT$uk
Last modified: July 20, 2005