PyTorch in Practice: Taiwan High Speed Rail CAPTCHA Recognition
Introduction
A CAPTCHA (/kæp.tʃə/, an acronym for “completely automated public Turing test to tell computers and humans apart”) is a type of challenge–response test used in computing to determine whether or not the user is human.
The term was coined in 2003 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. The most common type of CAPTCHA (displayed as Version 1.0) was first invented in 1997 by two groups working in parallel. This form of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test.
Source: Wikipedia, "CAPTCHA"
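The main purpose of a CAPTCHA is to tell humans and computers apart. The most common form today is the text-based image CAPTCHA. The site shows an alphanumeric image with a noisy background and distorted characters and asks the user to type the characters it contains. The user may proceed (for example, to post a comment or complete a transaction) only after the answer has been checked. In the past, the degraded images made it hard for computers to pick the characters out of the background noise. Thanks to faster hardware and the rise of deep learning, however, computers can now learn to recognize CAPTCHA images with high accuracy. Popular machine learning libraries include PyTorch and TensorFlow (with Keras). In this article we use PyTorch to build a neural network, train it on CAPTCHAs from the Taiwan High Speed Rail website, and use it to recognize them.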