ነፃ ሓሳብ: Most frequent Fidel

The most frequent letter in Tigrinya corpora is “ን”.
Not surprisingly, the most frequent one in Amharic is also “ን”.
Here is a suitable Python script:

# -*- coding: utf-8 -*-
import codecs

o = codecs.open("o.txt", "w", encoding="utf-8")
f = codecs.open("test_text.txt", "r", encoding="utf-8")
# one counter per fidel; the remaining fidels are elided (…)
fidels = {u"ሀ": 0, u"ሁ": 0, u"ሂ": 0, u"ሃ": 0}  # …

for i in f:
    for j in i:
        if j in fidels:
            fidels[j] = fidels[j] + 1  # count the occurrences of each fidel in the text
f.close()

highest = 0
highest_fidel = u""
for i in fidels:
    if fidels[i] > highest:
        highest = fidels[i]  # keep the highest count; overwrite it when a higher one turns up
        highest_fidel = i
o.write(u"most frequent Fidel: ")
o.write(highest_fidel)
o.write(u" " + str(highest))

second_highest = 0
second_highest_fidel = u""
fidels.pop(highest_fidel, None)  # drop the winner, then search again
for i in fidels:
    if fidels[i] > second_highest:
        second_highest = fidels[i]
        second_highest_fidel = i
o.write(u"\n2nd most frequent Fidel: ")
o.write(second_highest_fidel)
o.write(u" " + str(second_highest))

third_highest = 0
third_highest_fidel = u""
fidels.pop(second_highest_fidel, None)
for i in fidels:
    if fidels[i] > third_highest:
        third_highest = fidels[i]
        third_highest_fidel = i
o.write(u"\n3rd most frequent Fidel: ")
o.write(third_highest_fidel)
o.write(u" " + str(third_highest))
o.close()

Sample output from a larger text (each letter followed by its number of occurrences):
most frequent Fidel: ን 118
2nd most frequent Fidel: ኣ 116
3rd most frequent Fidel: ብ 113
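
The script above makes three nearly identical passes to find the top three fidels. A shorter way to get the same ranking might look like the sketch below. It assumes Python 3, and it assumes that a "fidel" is any code point from U+1200 to U+135A (the syllable range of the Unicode Ethiopic block) instead of a hand-written dictionary. The names `count_fidels` and `top_fidels` are made up for this illustration and are not part of the original script.

```python
from collections import Counter

# Assumption: a fidel is any code point in U+1200..U+135A, the syllable
# range of the Unicode Ethiopic block. Ethiopic punctuation and numerals
# lie above this range and are therefore not counted.
FIRST_FIDEL = "\u1200"
LAST_FIDEL = "\u135a"


def count_fidels(text):
    """Return a Counter mapping each fidel in text to its number of occurrences."""
    return Counter(ch for ch in text if FIRST_FIDEL <= ch <= LAST_FIDEL)


def top_fidels(text, n=3):
    """Return the n most frequent fidels as (fidel, count) pairs, highest first."""
    return count_fidels(text).most_common(n)


# Made-up sample string, not real corpus data.
sample = "ንንን ኣኣ ብ abc"
for rank, (fidel, count) in enumerate(top_fidels(sample), start=1):
    print(rank, fidel, count)
```

For a real corpus the text would be read from a file first, for example with `open("test_text.txt", encoding="utf-8").read()`. Note that the range check also counts extended Ethiopic syllables that the hand-written dictionary may not list, so the two approaches can differ on unusual characters.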
