Letter Frequency Analysis
The results below come from an analysis of 14,442,711 words in the Open American National Corpus, a "subset of the ANC Second Release that is unrestricted in terms of usage and redistribution." The corpus includes both spoken and written data from a variety of sources.
Characters
Note: # and & represent the totals for digits and punctuation, respectively. In the lists below, these totals appear as "digit" and "punct".
Relative Frequency
All Characters
a: 6.724%
b: 1.251%
c: 2.701%
d: 3.042%
e: 9.898%
f: 1.759%
g: 1.666%
h: 3.965%
i: 6.193%
j: 0.147%
k: 0.600%
l: 3.423%
m: 2.107%
n: 5.861%
o: 6.172%
p: 1.771%
q: 0.103%
r: 4.934%
s: 5.444%
t: 7.652%
u: 2.495%
v: 0.870%
w: 1.409%
x: 0.202%
y: 1.591%
z: 0.095%
space: 13.882%
punct: 3.018%
digit: 1.027%
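As a rough illustration of how a list like the one above could be produced, the following sketch tallies each letter plus three catch-all classes and converts the counts to percentages. The function name, the use of Python's string.punctuation as the punctuation set, and the treatment of the 13.882% entry as the space character are all assumptions; the source does not say how characters were classified.

```python
from collections import Counter
import string


def char_class_frequencies(text):
    """Return the percentage share of each letter and of three catch-all classes.

    Hypothetical helper: the class names "space", "punct" and "digit" and the
    classification rules are assumptions, not the original analysis method.
    """
    counts = Counter()
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            counts[ch] += 1
        elif ch in string.digits:
            counts["digit"] += 1
        elif ch in string.punctuation:
            counts["punct"] += 1
        elif ch == " ":
            counts["space"] += 1
        # Any other character (tabs, newlines, non-ASCII) is ignored here.
    total = sum(counts.values())
    return {key: 100.0 * value / total for key, value in counts.items()}


if __name__ == "__main__":
    sample = "The 2 cats sat."
    for key, pct in sorted(char_class_frequencies(sample).items()):
        print(f"{key}: {pct:.3f}%")
```

Letters are folded to lowercase before counting, which matches the case-insensitive keys in the list but is still a guess about the original procedure.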
Letters Only
a: 8.193%
b: 1.524%
c: 3.291%
d: 3.706%
e: 12.059%
f: 2.143%
g: 2.030%
h: 4.832%
i: 7.546%
j: 0.179%
k: 0.732%
l: 4.170%
m: 2.567%
n: 7.142%
o: 7.520%
p: 2.158%
q: 0.125%
r: 6.012%
s: 6.633%
t: 9.324%
u: 3.040%
v: 1.061%
w: 1.717%
x: 0.246%
y: 1.938%
z: 0.115%
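The letters-only figures appear to be the all-character figures rescaled so that the 26 letters sum to 100%. The sketch below checks this, assuming the three non-letter entries in the all-characters list (13.882%, 3.018% and 1.027%) account for everything that is not a letter. The function name is hypothetical.

```python
# Non-letter shares taken from the all-characters list (in percent).
NON_LETTER_PCTS = (13.882, 3.018, 1.027)


def letters_only_pct(all_chars_pct, non_letter_pcts=NON_LETTER_PCTS):
    """Rescale an all-characters percentage to a letters-only percentage.

    Assumes letters make up whatever share is left after removing the
    non-letter classes; this is an inference from the data, not a
    documented method.
    """
    letter_share = 100.0 - sum(non_letter_pcts)
    return 100.0 * all_chars_pct / letter_share


if __name__ == "__main__":
    for letter, pct in (("a", 6.724), ("e", 9.898), ("t", 7.652)):
        print(f"{letter}: {letters_only_pct(pct):.3f}%")
```

Under this assumption, e comes out at roughly 12.06%, which agrees with the listed 12.059% to within rounding of the published figures.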
Punctuation
Digits
N-grams
Bigrams
th: 1,640,686
he: 1,364,267
in: 1,289,051
an: 1,019,992
re: 1,009,134
er: 1,005,070
on: 889,389
at: 865,851
en: 749,905
es: 732,170
Trigrams
the: 1,011,206
and: 467,688
ing: 436,634
ion: 350,254
ent: 287,032
tio: 284,023
hat: 262,531
tha: 248,246
ati: 215,976
for: 195,839
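Counts like the bigram and trigram totals above can be gathered with a sliding window over each word. This is a minimal sketch, assuming n-grams are counted within words only (never across spaces or punctuation) and that case is ignored. The source does not state either rule, and the function name is hypothetical.

```python
from collections import Counter
import re


def count_ngrams(text, n):
    """Count letter n-grams inside each word of the text.

    Assumption: a "word" is a maximal run of the letters a-z after
    lowercasing, so n-grams never span spaces, digits or punctuation.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # Slide a window of width n across the word.
        for start in range(len(word) - n + 1):
            counts[word[start:start + n]] += 1
    return counts


if __name__ == "__main__":
    sample = "The other thing then happened."
    print(count_ngrams(sample, 2).most_common(3))
    print(count_ngrams(sample, 3).most_common(3))
```

If the original analysis counted n-grams across word boundaries, the totals would differ, so this sketch should be read as one plausible counting rule and not as the one used for the figures above.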