Letter Frequency Analysis

The results below come from an analysis of 14,442,711 words in the Open American National Corpus (OANC), a "subset of the ANC Second Release that is unrestricted in terms of usage and redistribution." The corpus consists of both spoken and written data from a variety of sources.

Characters

Note: # and & stand for the digit and punctuation totals, respectively; the lists below label these totals digit and punct.

Relative Frequency

All Characters

    a: 6.724%
    b: 1.251%
    c: 2.701%
    d: 3.042%
    e: 9.898%
    f: 1.759%
    g: 1.666%
    h: 3.965%
    i: 6.193%
    j: 0.147%
    k: 0.600%
    l: 3.423%
    m: 2.107%
    n: 5.861%
    o: 6.172%
    p: 1.771%
    q: 0.103%
    r: 4.934%
    s: 5.444%
    t: 7.652%
    u: 2.495%
    v: 0.870%
    w: 1.409%
    x: 0.202%
    y: 1.591%
    z: 0.095%
    space: 13.882%
    punct: 3.018%
    digit: 1.027%
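
A breakdown like the one above could be computed roughly as follows. This is a minimal Python sketch, not the code used for this analysis: the function name char_frequencies is hypothetical, and the bucketing rules (folding case, counting every whitespace character as a space, and lumping everything that is not an ASCII letter, digit, or whitespace into punct) are assumptions.

```python
from collections import Counter
import string


def char_frequencies(text):
    """Return the relative frequency (in percent) of each character class.

    Assumed bucketing: a-z (case-folded), "digit", " " (any whitespace),
    and "punct" (everything else).
    """
    counts = Counter()
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            counts[ch] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        elif ch.isspace():
            counts[" "] += 1
        else:
            counts["punct"] += 1
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {key: 100.0 * n / total for key, n in counts.items()}


freqs = char_frequencies("The cat sat on 2 mats.")
# Print the classes from most to least frequent.
for key, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{key!r}: {pct:.3f}%")
```

Because every character falls into exactly one bucket, the percentages sum to 100, as they do in the list above.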

Letters Only

    a: 8.193%
    b: 1.524%
    c: 3.291%
    d: 3.706%
    e: 12.059%
    f: 2.143%
    g: 2.030%
    h: 4.832%
    i: 7.546%
    j: 0.179%
    k: 0.732%
    l: 4.170%
    m: 2.567%
    n: 7.142%
    o: 7.520%
    p: 2.158%
    q: 0.125%
    r: 6.012%
    s: 6.633%
    t: 9.324%
    u: 3.040%
    v: 1.061%
    w: 1.717%
    x: 0.246%
    y: 1.938%
    z: 0.115%
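
The two lists appear to be related by a simple renormalization: dividing an all-characters percentage by the share of characters that are letters gives the letters-only percentage. The Python sketch below checks this for three letters. It assumes that the three non-letter entries in the first list (13.882%, 3.018%, and 1.027%) account for every non-letter character; the variable names are illustrative only.

```python
# All-characters percentages copied from the first list.
all_chars = {"a": 6.724, "e": 9.898, "t": 7.652}

# Assumption: these three entries are the only non-letter classes.
non_letter_share = 13.882 + 3.018 + 1.027
letter_share = 100.0 - non_letter_share

# Renormalize so that the letters alone sum to 100%.
letters_only = {
    letter: 100.0 * pct / letter_share for letter, pct in all_chars.items()
}

for letter, pct in letters_only.items():
    print(f"{letter}: {pct:.3f}%")
```

The values obtained this way agree with the letters-only list to within a few thousandths of a percentage point; the small differences are what one would expect from working with already-rounded inputs.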

Punctuation

Digits

N-grams

Bigrams

    th: 1,640,686
    he: 1,364,267
    in: 1,289,051
    an: 1,019,992
    re: 1,009,134
    er: 1,005,070
    on: 889,389
    at: 865,851
    en: 749,905
    es: 732,170

Trigrams

    the: 1,011,206
    and: 467,688
    ing: 436,634
    ion: 350,254
    ent: 287,032
    tio: 284,023
    hat: 262,531
    tha: 248,246
    ati: 215,976
    for: 195,839
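
Counts like the bigram and trigram totals above can be gathered with a sliding window over each word. The Python sketch below is one way to do it, under assumptions the source does not confirm: n-grams are counted within words only (never across spaces or punctuation), case is folded, and only the letters a-z are considered. The function name ngram_counts is hypothetical.

```python
from collections import Counter
import re


def ngram_counts(text, n):
    """Count letter n-grams inside each word of text.

    Assumption: words are maximal runs of a-z after lowercasing, and
    n-grams never span a word boundary.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


sample = "the other thing that they gathered"
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Counting across word boundaries instead (for example, treating the space as a character) would change the totals, so figures produced by a sketch like this are only comparable to the lists above if the same convention was used.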