Announcements#

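Final exam:
- Date: Friday, July 26, 1st period
- Make-up exam: Friday, August 9, 1st period
Aidemy: a paid prep course for the G-Kentei (G検定) exam. If you are interested, it may be possible to make use of author privileges.
Statistics Everyone Can Understand (全人類がわかる統計学): free content on statistics, deep learning, and more.
Whole Brain Architecture young researchers' group (全脳アーキテクチャ若手の会): an ambitious club activity for those considering an off-campus club. It is a stimulating environment.
Make-up lectures
- Thursday, July 4, 4th period (16:30-18:00): Prof. Inoue's G-Kentei prep lecture. Shugetsukan (種月館, Building 3), 8th floor, room 3-806. Attendance is not mandatory, but attendees will receive bonus points.
- One more day, at a time convenient for everyone, still needs to be agreed on. Perhaps Thursday 5th period, right after Prof. Inoue's lecture?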

Recurrent neural networks#

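Sequential information processing (serial information processing): predicting the next event from the evidence observed so far is relevant to the following, among others:

data processing
survival strategies of living organisms
optimal control, weather forecasting, and trajectory control of rockets and the like
forecasting the future, with its science-fictional, psychological, philosophical, and historical implications. Everyday examples include horse-race prediction (who is this company's technical advisor?), certain kinds of fortune-telling, economic forecasting, and problem prediction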

Excerpts from material presented elsewhere#

June 28 (today): recurrent neural networks. July 5: computational semantics (planned).

June 28#

Jordan, Elman
BPTT, BiRNN
LSTM, GRU
NLP の歴史
Language model, n-gram, NETtalk
Other modeling, AR, SSM, HMM

A simple demo of the Elman network
- 2019komazawa_SRN_simulator
- 2019komazawa_keras_addtion_rnn
- __utils.py.zip https://drive.google.com/open?id=1vY4jcHe2JfqGICdwwDAI0gM2KJt8dFTL
- Vanishing gradient and exploding gradient problems
Character-based or word-based?
1. Pros/Cons
2. OOV (out-of-vocabulary) problems: unavoidable when working with sources such as social media
Natural Language Processing (NLP)
1. Statistical language model
  - Manning and Schütze (1999), Foundations of Statistical Natural Language Processing
  - Jurafsky and Martin, Speech and Language Processing (a revised edition has been published)
Language model
1. n-gram language model (LM)
2. Metrics: BLEU, BPE, perplexity
3. Tasks: NER, POS, COL, Summary, QA, Translation
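As a rough illustration of the n-gram language model and the perplexity metric listed above, here is a minimal bigram sketch. The toy corpus, the function names, and the choice of add-one (Laplace) smoothing are assumptions made for this example, not part of the lecture material.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count contexts (every token that has a successor) and bigrams."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams

def bigram_prob(w_prev, w, contexts, bigrams, vocab_size):
    """P(w | w_prev) with add-one smoothing (an assumed, very simple choice)."""
    return (bigrams[(w_prev, w)] + 1) / (contexts[w_prev] + vocab_size)

def perplexity(tokens, contexts, bigrams, vocab_size):
    """exp of the average negative log-probability per predicted token."""
    log_prob = 0.0
    for w_prev, w in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(w_prev, w, contexts, bigrams, vocab_size))
    return math.exp(-log_prob / (len(tokens) - 1))

# Toy corpus (hypothetical), in the spirit of Elman's "boy chases boy" sentences.
corpus = "boy chases boy . boy sees boy . boy walks .".split()
contexts, bigrams = train_bigram(corpus)
vocab_size = len(set(corpus))

seen = perplexity("boy sees boy .".split(), contexts, bigrams, vocab_size)
unseen = perplexity("walks boy chases .".split(), contexts, bigrams, vocab_size)
print(f"perplexity (familiar word order):   {seen:.2f}")
print(f"perplexity (unfamiliar word order): {unseen:.2f}")
```

A lower perplexity means the model found the sequence less surprising, so the sentence whose bigrams occur in the corpus should score lower than the scrambled one.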
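Many models for sequential information processing have been tried in various fields. For example:
1. State space models (SSM), hidden Markov models (HMM)
2. Autoregressive models (AR, ARMA, ARIMA, Box-Jenkins)
3. Filtering theory: Kalman filters, particle filters (a tutorial paper on particle filters by Prof. Koichi Yano of the Faculty of Economics)
4. Neural networks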

July 5#

word2vec, semantic differential, LSA, LDA
seq2seq
attention
Transformer
BERT, ELMo
Multi-task learning

Concept of the LSTM (adapted from Schmidhuber et al. 2015)
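For reference, one commonly used LSTM formulation (without peephole connections) can be written as follows. The symbols $W$, $U$, $b$, $\sigma$, and $\odot$ are notation assumed here rather than taken from the figure, and the variant shown in the figure may differ, for example by including peephole connections.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$

Because $c_t$ is updated additively, the gradient along the cell state is scaled by $f_t$ at each step instead of passing through a squashing nonlinearity, which is the usual explanation for why LSTMs suffer less from vanishing gradients.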

Bidirectional RNN (BiRNN)#

Schuster (1997) Fig. 1, Tab. 2
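The idea of a bidirectional RNN can be sketched with a scalar hidden state: one simple RNN reads the sequence left to right, another reads it right to left, and the two states at each position are paired. The function names and fixed weights below are hypothetical, chosen only to keep the sketch short, and nothing here reproduces the architecture details of the cited figure.

```python
import math

def rnn_states(xs, w_in, w_rec):
    """Scalar simple RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1}), h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def birnn_states(xs, fwd=(1.0, 0.5), bwd=(1.0, 0.5)):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    forward = rnn_states(xs, *fwd)
    # Run the second RNN on the reversed sequence, then restore the original order.
    backward = rnn_states(xs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))

xs = [1.0, 0.0, 0.0, -1.0]
states = birnn_states(xs)
for t, (f, b) in enumerate(states):
    print(f"t={t}: forward={f:+.4f} backward={b:+.4f}")
```

Changing only the last input leaves the forward state at the first position untouched but changes its backward state, which is the sense in which a BiRNN sees both past and future context.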

Schematic of BERT (Devlin et al. 2018) Fig. 1

Achievements of recurrent neural networks#

Handwriting recognition (Graves, 2009)
Speech recognition (Graves 2013, 2014)
Handwriting generation (Graves, 2013)
Sequence learning (Sutskever, 2014)
Machine translation (Luong, 2015; Bahdanau, 2014)
Image captioning (Vinyals et al., 2014; Kiros et al., 2014)
Syntactic parsing (Vinyals et al., 2014)
Program code generation (Zaremba, 2015)
Machine generated TED Talks

AI problems as of 1955#

The seven AI problems listed in the 1955 proposal for the Dartmouth conference (held in 1956)

Automatic Computers
How Can a Computer be Programmed to Use a Language
Neuron Nets
Theory of the Size of a Calculation
Self-Improvement
Abstractions
Randomness and Creativity

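Prehistory of natural language processing#

First boom, extreme optimism: apparently people believed that translation would be possible just by copying a dictionary wholesale...
Statistical natural language processing: 1980s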
Foundations of Statistical Natural Language Processing
by Christopher Manning and Hinrich Schütze.
https://nlp.stanford.edu/fsnlp/promo/
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition
by Dan Jurafsky and James H. Martin
https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

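The Mikolov revolution#

Adapted from Elman (1991) Fig. 1

Classical recurrent neural networks#

Figure: the Jordan network, proposed by Michael Jordan~\citep{1986Jordan}

Figure: the Elman network, proposed by Jeff Elman~\citep{Elman1990,Elman1993}

My mentor, Jeff Elman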
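The difference between the two classical networks can be sketched in a few lines: an Elman network copies the previous hidden layer into its context units, whereas a Jordan network feeds back the previous output. This is a simplified, untrained forward pass with random weights. The function names, layer sizes, and the omission of biases and of the Jordan network's self-recurrent state units are assumptions made for brevity.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def run_srn(xs, W_in, W_ctx, W_out, feedback="hidden"):
    """Forward pass of a simple recurrent network.

    feedback="hidden": context = previous hidden layer (Elman).
    feedback="output": context = previous output layer (Jordan, simplified).
    """
    context = [0.0] * len(W_ctx[0])
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_ctx, context))]
        hidden = [math.tanh(p) for p in pre]
        y = matvec(W_out, hidden)  # linear readout, assumed for simplicity
        context = hidden if feedback == "hidden" else y
        outputs.append(y)
    return outputs

rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 3
W_in = rand_matrix(n_hidden, n_in, rng)
W_out = rand_matrix(n_out, n_hidden, rng)
W_elman = rand_matrix(n_hidden, n_hidden, rng)  # hidden -> hidden feedback
W_jordan = rand_matrix(n_hidden, n_out, rng)    # output -> hidden feedback

# The same one-hot input appears at t=0 and t=2, in different contexts.
xs = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
elman_out = run_srn(xs, W_in, W_elman, W_out, feedback="hidden")
jordan_out = run_srn(xs, W_in, W_jordan, W_out, feedback="output")
print("Elman  t=0:", elman_out[0], "t=2:", elman_out[2])
print("Jordan t=0:", jordan_out[0], "t=2:", jordan_out[2])
```

The identical input at t=0 and t=2 produces different outputs because the context units carry information about what came before. With the context weights set to zero, the two outputs coincide.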

Unfolding recurrent neural networks in time#

Time unfoldings of recurrent neural networks

Trajectories through state space for sentences boy chases boy, boy sees boy, boy walks. Principal component 1 is plotted along the abscissa; principal component 3 is plotted along the ordinate. These two PC’s together encode differences in verb-argument expectations.

Movement through state space for sentences with relative clauses. Principal component 1 is displayed along the abscissa; principal component 11 is displayed along the ordinate. These two PC’s encode depth of embedding in relative clauses.

Long-distance dependencies#

Schematic description of a long term dependency
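Long-term dependencies are hard for a simple RNN to learn because the gradient that links a late state to an early one is a product of one factor per time step. The scalar sketch below, with hypothetical names and weights, computes $\partial h_T / \partial h_0$ for $h_t = f(w\,h_{t-1})$ and shows the vanishing and exploding gradient problems. It illustrates the mechanism only and is not a full BPTT implementation.

```python
import math

def gradient_through_time(w, steps, h0=0.5, nonlinear=True):
    """Return d h_T / d h_0 for h_t = f(w * h_{t-1}), with f = tanh or identity."""
    h, grad = h0, 1.0
    for _ in range(steps):
        a = w * h
        h = math.tanh(a) if nonlinear else a
        # Chain rule: each step contributes w * f'(a); tanh'(a) = 1 - tanh(a)^2 <= 1.
        local = (1.0 - h * h) if nonlinear else 1.0
        grad *= w * local
    return grad

vanish = gradient_through_time(0.5, steps=50)                    # |w| < 1
explode = gradient_through_time(1.5, steps=50, nonlinear=False)  # |w| > 1, linear units
print(f"|w| < 1: d h_50 / d h_0 = {vanish:.3e}")
print(f"|w| > 1: d h_50 / d h_0 = {explode:.3e}")
```

With $|w| < 1$ the product shrinks geometrically, so an error at step 50 barely affects what happened at step 0. With $|w| > 1$ and linear units it grows geometrically instead.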

Various input/output configurations of recurrent neural networks#

one-to-one: vanilla, one-to-many: image caption, many-to-one: sentiment analysis, many-to-many: machine translation, many-to-many: video classification

1 to 1 : $x_i \rightarrow y_i$ , vanilla RNN
many to 1: $x_1, x_2, \cdots, x_n \rightarrow y_j$ , sentiment analysis
1 to many: $x_1 \rightarrow y_1, y_2, \cdots, y_n$ , image captioning
many to many: $x_i \rightarrow y_i, x_{i+1}\rightarrow y_{i+1}$ , machine translation
many to many: $x_i, x_{i+1},\cdots,x_{i+k} \rightarrow y_{i+d}, y_{i+1+d},\cdots,y_{i+d+k}$ , video classification
many to many: $x_1\rightarrow y_1, x_2\rightarrow y_2, \cdots$
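The input/output forms above differ only in when inputs are fed in and when outputs are read out. The sketch below shows this with a scalar RNN step. The function names and weights are hypothetical, the hidden state doubles as the output, and feeding zero input during decoding is an assumption made for simplicity.

```python
import math

def step(x, h, w_in=1.0, w_rec=0.5):
    """One scalar RNN step; the hidden state is also used as the output."""
    return math.tanh(w_in * x + w_rec * h)

def many_to_one(xs):
    """Read the whole sequence, emit one value at the end."""
    h = 0.0
    for x in xs:
        h = step(x, h)
    return h

def many_to_many_synced(xs):
    """Emit one output per input, at the same time step."""
    h, ys = 0.0, []
    for x in xs:
        h = step(x, h)
        ys.append(h)
    return ys

def decode(h, n_steps):
    """Unroll from a given state with no further input (assumed zero input)."""
    ys = []
    for _ in range(n_steps):
        ys.append(h)
        h = step(0.0, h)
    return ys

def one_to_many(x, n_steps):
    """A single input, then a generated sequence."""
    return decode(step(x, 0.0), n_steps)

def many_to_many_encdec(xs, n_out):
    """Encode the whole input first, then generate the output sequence."""
    return decode(many_to_one(xs), n_out)

xs = [1.0, -0.5, 0.25]
print("many to one:           ", many_to_one(xs))
print("many to many (synced): ", many_to_many_synced(xs))
print("one to many:           ", one_to_many(1.0, 3))
print("many to many (enc-dec):", many_to_many_encdec(xs, 2))
```

The last synchronized output equals the many-to-one result, since both read the same final state.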

\cite{2010Mikolov2010}

Mikolov Extension

Boden's BPTT

NETtalk#

Sejnowski (1986) Fig. 2