Epithelial cell adhesion molecule (EPCAM) - coding DNA reference sequence

(used for variant description)

(last modified August 22, 2014)

This file provides a coding DNA reference sequence for describing sequence variants in transcript NM_002354.2 of the EPCAM gene, following the HGVS recommendations.

The sequence was taken from NG_012352.2, covering EPCAM transcript NM_002354.2.

 (upstream sequence)
           .         .         .         .         .                g.29043
   aactgcagcgccggggctgggggaggggagcctactcactcccccaactcccgggcgg       c.-301

 .         .         .         .         .         .                g.29103
 tgactcatcaacgagcaccagcggccagaggtgagcagtcccgggaaggggccgagaggc       c.-241

 .         .         .         .         .         .                g.29163
 ggggccgccaggtcgggcaggtgtgcgctccgccccgccgcgcgcacagagcgctagtcc       c.-181

 .         .         .         .         .         .                g.29223
 ttcggcgagcgagcaccttcgacgcggtccggggaccccctcgtcgctgtcctcccgacg       c.-121

 .         .         .         .         .         .                g.29283
 cggacccgcgtgccccaggcctcgcgctgcccggccggctcctcgtgtcccactcccggc       c.-61

 .         .         .         .         .         .                g.29343
 gcacgccctcccgcgagtcccgggcccctcccgcgcccctcttctcggcgcgcgcgcagc       c.-1

          .         .         .         .         .         .       g.29403
 ATGGCGCCCCCGCAGGTCCTCGCGTTCGGGCTTCTGCTTGCCGCGGCGACGGCGACTTTT       c.60
 M  A  P  P  Q  V  L  A  F  G  L  L  L  A  A  A  T  A  T  F         p.20

          .       | 02 .         .         .         .         .    g.33344
 GCCGCAGCTCAGGAAG | AATGTGTCTGTGAAAACTACAAGCTGGCCGTAAACTGCTTTGTG    c.120
 A  A  A  Q  E  E |   C  V  C  E  N  Y  K  L  A  V  N  C  F  V      p.40

          .         .         .         .         .         .       g.33404
 AATAATAATCGTCAATGCCAGTGTACTTCAGTTGGTGCACAAAATACTGTCATTTGCTCA       c.180
 N  N  N  R  Q  C  Q  C  T  S  V  G  A  Q  N  T  V  I  C  S         p.60

      | 03   .         .         .         .         .         .    g.33701
 AAGC | TGGCTGCCAAATGTTTGGTGATGAAGGCAGAAATGAATGGCTCAAAACTTGGGAGA    c.240
 K  L |   A  A  K  C  L  V  M  K  A  E  M  N  G  S  K  L  G  R      p.80

          .         .         .         .         .         .       g.33761
 AGAGCAAAACCTGAAGGGGCCCTCCAGAACAATGATGGGCTTTATGATCCTGACTGCGAT       c.300
 R  A  K  P  E  G  A  L  Q  N  N  D  G  L  Y  D  P  D  C  D         p.100

          .         .         .         .         .         .       g.33821
 GAGAGCGGGCTCTTTAAGGCCAAGCAGTGCAACGGCACCTCCATGTGCTGGTGTGTGAAC       c.360
 E  S  G  L  F  K  A  K  Q  C  N  G  T  S  M  C  W  C  V  N         p.120

          .         .         .         .         .         .       g.33881
 ACTGCTGGGGTCAGAAGAACAGACAAGGACACTGAAATAACCTGCTCTGAGCGAGTGAGA       c.420
 T  A  G  V  R  R  T  D  K  D  T  E  I  T  C  S  E  R  V  R         p.140

       | 04  .         .         .         .         .         .    g.35126
 ACCTA | CTGGATCATCATTGAACTAAAACACAAAGCAAGAGAAAAACCTTATGATAGTAAA    c.480
 T  Y  |  W  I  I  I  E  L  K  H  K  A  R  E  K  P  Y  D  S  K      p.160

          .  | 05      .         .         .         .         .    g.36900
 AGTTTGCGGAC | TGCACTTCAGAAGGAGATCACAACGCGTTATCAACTGGATCCAAAATTT    c.540
 S  L  R  T  |  A  L  Q  K  E  I  T  T  R  Y  Q  L  D  P  K  F      p.180

          .      | 06  .         .         .         .         .    g.38835
 ATCACGAGTATTTTG | TATGAGAATAATGTTATCACTATTGATCTGGTTCAAAATTCTTCT    c.600
 I  T  S  I  L   | Y  E  N  N  V  I  T  I  D  L  V  Q  N  S  S      p.200

          .         .         .         .         .        | 07.    g.39609
 CAAAAAACTCAGAATGATGTGGACATAGCTGATGTGGCTTATTATTTTGAAAAAGAT | GTT    c.660
 Q  K  T  Q  N  D  V  D  I  A  D  V  A  Y  Y  F  E  K  D   | V      p.220

          .         .         .         .         .         .       g.39669
 AAAGGTGAATCCTTGTTTCATTCTAAGAAAATGGACCTGACAGTAAATGGGGAACAACTG       c.720
 K  G  E  S  L  F  H  S  K  K  M  D  L  T  V  N  G  E  Q  L         p.240

          .         .         .         .         .         .       g.39729
 GATCTGGATCCTGGTCAAACTTTAATTTATTATGTTGATGAAAAAGCACCTGAATTCTCA       c.780
 D  L  D  P  G  Q  T  L  I  Y  Y  V  D  E  K  A  P  E  F  S         p.260

          .         .         .         .         .         .       g.39789
 ATGCAGGGTCTAAAAGCTGGTGTTATTGCTGTTATTGTGGTTGTGGTGATAGCAGTTGTT       c.840
 M  Q  G  L  K  A  G  V  I  A  V  I  V  V  V  V  I  A  V  V         p.280

          .         | 08         .         .         .         .    g.45045
 GCTGGAATTGTTGTGCTG | GTTATTTCCAGAAAGAAGAGAATGGCAAAGTATGAGAAGGCT    c.900
 A  G  I  V  V  L   | V  I  S  R  K  K  R  M  A  K  Y  E  K  A      p.300

     | 09    .         .         .         .                        g.46466
 GAG | ATAAAGGAGATGGGTGAGATGCATAGGGAACTCAATGCATAA                   c.945
 E   | I  K  E  M  G  E  M  H  R  E  L  N  A  X                     p.314

          .         .         .         .         .         .       g.46526
 ctatataatttgaagattatagaagaagggaaatagcaaatggacacaaattacaaatgt       c.*60

          .         .         .         .         .         .       g.46586
 gtgtgcgtgggacgaagacatctttgaaggtcatgagtttgttagtttaacatcatatat       c.*120

          .         .         .         .         .         .       g.46646
 ttgtaatagtgaaacctgtactcaaaatataagcagcttgaaactggctttaccaatctt       c.*180

          .         .         .         .         .         .       g.46706
 gaaatttgaccacaagtgtcttatatatgcagatctaatgtaaaatccagaacttggact       c.*240

          .         .         .         .         .         .       g.46766
 ccatcgttaaaattatttatgtgtaacattcaaatgtgtgcattaaatatgcttccacag       c.*300

          .         .         .         .         .         .       g.46826
 taaaatctgaaaaactgatttgtgattgaaagctgcctttctatttacttgagtcttgta       c.*360

          .         .         .         .         .                 g.46881
 catacatacttttttatgagctatgaaataaaacattttaaactgaatttcttaa            c.*415

 (downstream sequence)
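
The protein line under each coding row can be checked by translating the row with the standard genetic code. The following is a minimal Python sketch, not part of the original file: the names `CODON_TABLE` and `translate` are hypothetical, and it assumes a row is pasted as shown in the listing, with any spaces and `|` intron markers stripped before translation. Stop codons are written as `X` to match the listing.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
# "X" marks a stop codon, matching the symbol used in the listing.
BASES = "TCAG"
AMINO = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(row: str) -> str:
    """Translate one coding row; the row must start on a codon boundary."""
    # Drop layout characters (spaces and intron markers) copied from the listing.
    seq = row.replace(" ", "").replace("|", "").upper()
    if len(seq) % 3:
        raise ValueError("row does not hold a whole number of codons")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Row c.1 to c.60 and the final coding row (c.901 to c.945) from the listing.
first_row = "ATGGCGCCCCCGCAGGTCCTCGCGTTCGGGCTTCTGCTTGCCGCGGCGACGGCGACTTTT"
last_row = "GAG | ATAAAGGAGATGGGTGAGATGCATAGGGAACTCAATGCATAA"

print(translate(first_row))  # compare with the row labelled p.20
print(translate(last_row))   # compare with the row labelled p.314
```

Each 60-nucleotide row in the listing begins on a codon boundary, so rows can be translated one at a time in this way.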

Legend:
Nucleotide numbering (following the rules of the HGVS for a 'Coding DNA Reference Sequence') is indicated at the right of the sequence, counting the A of the ATG translation initiating Methionine as 1. Every 10th nucleotide is indicated by a "." above the sequence. The epithelial cell adhesion molecule protein sequence is shown below the coding DNA sequence, with numbering indicated at the right starting with 1 for the translation initiating Methionine. Every 10th amino acid is shown in bold. The position of each intron is indicated by a vertical line separating the two flanking exons. The start of the first exon (transcription initiation site) is indicated by a '\', the end of the last exon (poly-A addition site) by a '/'. The exon number is indicated above the first nucleotide(s) of the exon. To aid the description of frameshift variants, all stop codons in the +1 frame are shown in bold while all stop codons in the +2 frame are underlined.
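
The numbering rules in the legend can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not part of the original file, and all function names are hypothetical. It assumes that c.1 lies at g.29344 (read off the listing, where c.60 is labelled g.29403 and c.-1 is labelled g.29343), that the first intron follows c.76 (the position of the first vertical line), and that "+1 frame" and "+2 frame" mean reading codons starting one or two nucleotides downstream of a codon boundary.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
G_OF_C1 = 29344       # assumption read from the listing: c.60 <-> g.29403
LAST_C_EXON1 = 76     # assumption: first intron marker follows c.76
CDS_LENGTH = 945      # last coding position shown in the listing


def c_to_g_before_intron1(c_pos: int) -> int:
    """Map a c. position to g. for the contiguous stretch before intron 1."""
    if c_pos == 0:
        raise ValueError("HGVS numbering has no position c.0")
    if c_pos > LAST_C_EXON1:
        raise ValueError("past the first exon; intron lengths would be needed")
    # Upstream positions are negative and skip zero: c.-1 directly precedes c.1.
    return G_OF_C1 + c_pos - 1 if c_pos > 0 else G_OF_C1 + c_pos


def c_to_codon(c_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within the codon 1..3) for a coding position."""
    if not 1 <= c_pos <= CDS_LENGTH:
        raise ValueError("position is outside the coding sequence")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def shifted_frame_stops(segment: str, shift: int, first_c: int = 1) -> List[int]:
    """c. positions of the first base of each stop codon read in frame +shift.

    `segment` must start on a codon boundary; `first_c` is the c. number
    of its first nucleotide.
    """
    seq = segment.upper()
    return [first_c + i
            for i in range(shift, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


# Row c.121 to c.180 from the listing.
row_121 = "AATAATAATCGTCAATGCCAGTGTACTTCAGTTGGTGCACAAAATACTGTCATTTGCTCA"
print(c_to_g_before_intron1(-61))                  # compare with g.29283
print(c_to_codon(60))                              # last base of codon 20
print(shifted_frame_stops(row_121, 2, first_c=121))
```

Positions beyond the first exon are deliberately rejected by `c_to_g_before_intron1`, because mapping them would require the g. labels on either side of each intron rather than a single offset.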