からっぽのしょこ

Read, then write! Write, then read! 読書読読書読書♪ I don't (want to) look up the same thing twice

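Text-classifying the lyrics of こぶつば (Kobushi Factory and Tsubaki Factory) songs, part 1: shaping and checking the analysis data

〇Introduction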

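 In an earlier article I ran text classification with almost no preprocessing of the data, and the results were about what you would expect (for the record, it is this article). A month has passed and I have learned a little more, so I tried again.
 I will split the work over several articles: data shaping, feature-term selection, clustering, categorization, random forests, and so on. This first article covers shaping and checking the text data.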

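 Comparing raw lyrics directly is difficult, so I first run morphological analysis with MeCab. This time I split the text at two levels: characters and words (morphemes).
 Next I check how often each character or word occurs in each document. The raw count is called the observed frequency. The count rescaled so that documents of different lengths can be compared is called the relative frequency (here it is rescaled as if every text contained 1,000 tokens in total). There is also TF-IDF, a value that focuses on how strongly a term characterizes a document.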
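As a minimal sketch of the rescaling described above, the snippet below converts observed counts into relative frequencies per 1,000 tokens. The data frame `toy` and its counts are made up for illustration and are not taken from the lyrics.

```r
# Toy observed frequencies for one document (hypothetical values)
toy <- data.frame(
  TERM = c("a", "b", "c"),
  FREQ = c(30, 15, 5),
  stringsAsFactors = FALSE
)

tokens <- sum(toy$FREQ)                 # total tokens in the document
toy$R_FREQ <- toy$FREQ / tokens * 1000  # occurrences per 1,000 tokens

print(toy)
```

Within one document the relative frequencies always sum to 1,000, which is what makes long and short songs comparable.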
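 Characters and words that occur with high frequency are used as feature terms in the later analyses. This time I took the top 50, 100, and 200 types from each group and combined them into three feature-term sets.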

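 By classifying the lyrics of the two groups with machine-learning methods, I aim to show the differences between the groups objectively and to find an effective way of extracting feature terms for lyric classification.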

Analysis data

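I use the lyrics of こぶしファクトリー (Kobushi Factory) and つばきファクトリー (Tsubaki Factory), two Hello! Project groups formed around the same time.
During the analysis, song titles are abbreviated to codes like the following.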

  • kbs_s1a: こぶしファクトリー (kbs), 1st single (s1), A-side (a)
  • tbk_a1f: つばきファクトリー (tbk), 1st album (a1), 6th track (f)
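The naming scheme above can be decoded mechanically. The following is only a sketch: `parse_song_code()` is a hypothetical helper that does not appear in the original analysis, and it assumes every code follows the pattern group, underscore, release type (`a` or `s`), release number, track letter.

```r
# Hypothetical helper: split codes such as "kbs_s1a" into their parts
parse_song_code <- function(code) {
  pattern <- "^([a-z]+)_([as])([0-9]+)([a-z])$"
  data.frame(
    code         = code,
    group        = sub(pattern, "\\1", code),
    release_type = ifelse(sub(pattern, "\\2", code) == "a", "album", "single"),
    release_no   = as.integer(sub(pattern, "\\3", code)),
    track_no     = match(sub(pattern, "\\4", code), letters),  # a = 1, b = 2, ...
    stringsAsFactors = FALSE
  )
}

parsed <- parse_song_code(c("kbs_s1a", "tbk_a1f"))
print(parsed)
```

Columns like `group` could later serve as class labels for the classification step.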

List of song titles

Code Title Group Release type
kbs_a1a 急がば回れ こぶしファクトリー アルバム
kbs_a1e 未熟半熟トロトロ こぶしファクトリー アルバム
kbs_a1g 懸命ブルース こぶしファクトリー アルバム
kbs_a1i 残心 こぶしファクトリー アルバム
kbs_a1l TEKI こぶしファクトリー アルバム
kbs_a1o GO TO THE TOP!! こぶしファクトリー アルバム
kbs_a1q 辛夷の花 こぶしファクトリー アルバム
kbs_s0a 念には念 こぶしファクトリー シングル
kbs_s0b サバイバー こぶしファクトリー シングル
kbs_s1a ドスコイ!ケンキョにダイタン こぶしファクトリー シングル
kbs_s2a 桜ナイトフィーバー こぶしファクトリー シングル
kbs_s2b チョット愚直に!猪突猛進 こぶしファクトリー シングル
kbs_s2c 押忍!こぶし魂 こぶしファクトリー シングル
kbs_s3a サンバ!こぶしジャネイロ こぶしファクトリー シングル
kbs_s3b バッチ来い青春! こぶしファクトリー シングル
kbs_s4a シャララ!やれるはずさ こぶしファクトリー シングル
kbs_s5a これからだ! こぶしファクトリー シングル
kbs_s5b 明日テンキになあれ こぶしファクトリー シングル
kbs_s6a きっと私は こぶしファクトリー シングル
kbs_s6b ナセバナル こぶしファクトリー シングル
tbk_a1b 表面張力~Surface Tension~ つばきファクトリー アルバム
tbk_a1f 可能性のコンチェルト つばきファクトリー アルバム
tbk_a1i 帰ろう レッツゴー! つばきファクトリー アルバム
tbk_a1m 雪のプラネタリウム つばきファクトリー アルバム
tbk_a1q ハッピークラッカー つばきファクトリー アルバム
tbk_a1r ふりさけみれば… つばきファクトリー アルバム
tbk_s0a 青春まんまんなか! つばきファクトリー シングル
tbk_s0b 気高く咲き誇れ! つばきファクトリー シングル
tbk_s0c 独り占め つばきファクトリー シングル
tbk_s1a 初恋サンライズ つばきファクトリー シングル
tbk_s1b Just Try! つばきファクトリー シングル
tbk_s1c うるわしのカメリア つばきファクトリー シングル
tbk_s2a 就活センセーション つばきファクトリー シングル
tbk_s2b 笑って つばきファクトリー シングル
tbk_s2c ハナモヨウ つばきファクトリー シングル
tbk_s3a 低温火傷 つばきファクトリー シングル
tbk_s3b 春恋歌 つばきファクトリー シングル
tbk_s3c I Need You ~夜空の観覧車~ つばきファクトリー シングル
tbk_s4a デートの日は二度くらいシャワーして出かけたい つばきファクトリー シングル
tbk_s4b 純情cm つばきファクトリー シングル
tbk_s4c 今夜だけ浮かれたかった つばきファクトリー シングル


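 I cannot post the lyrics themselves, so I am posting frequency tables instead. They are lightly edited, but they are basically the freqency_table (not freqency_table2) produced by the processing code below. The sheets labelled "original" are the RMeCab results with a file-name column added, stacked one under another. "tfidf_original" is the result exactly as returned. The earlier sheets are meant for feature-term selection, and the later "original" sheets are intended as a stand-in for the lyric texts.
freqency_table_kbtb - Google Sheets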

Main reference book

『Rによるやさしいテキストマイニング[機械学習編]』小林雄一郎,オーム社

〇Shaping and checking the analysis data

・Extracting feature terms

Packages used

library(RMeCab) #docDF(), docMatrix()
library(magrittr) #%>%
library(dplyr) #filter(), select(), group_by(), summarise(), arrange(), full_join()

Preparation

#specify the files
folder_name1 <- "folder name"
file_name1   <- list.files(folder_name1)
file_path1   <- paste(folder_name1, file_name1, sep = "/")

Processing

tokens_types   <- data.frame() #total tokens and distinct types per document
freqency_table <- data.frame() #frequency table
for(i in seq_along(file_path1)) {
  #morphological analysis (docDF() works at the character level by default)
  tmp_freqency_table <- docDF(file_path1[i])
  colnames(tmp_freqency_table) <- c("TERM", "FREQ")
  tmp2 <- tmp_freqency_table %>% filter(!grepl("\\s", TERM)) #drop the whitespace entries that docDF() lets through
  
  #count total tokens and distinct types per document
  tokens <- sum(tmp2$FREQ) #total tokens
  types  <- nrow(tmp2)     #distinct types
  tmp_tt <- data.frame(TOKENS = tokens, TYPES = types)
  rownames(tmp_tt) <- file_name1[i] #use the text file name as the row name
  tokens_types <- rbind(tokens_types, tmp_tt) #record tokens and types for each document
  
  #build the frequency table
  tmp2$R_FREQ <- tmp2$FREQ / tokens * 1000 #add the relative frequency within each document
  freqency_table <- rbind(freqency_table, tmp2) #append to the frequency table
}
freqency_table2 <- freqency_table %>% 
                   group_by(TERM) %>% #merge terms that occur in several documents
                   summarise(FREQ = sum(FREQ), R_FREQ = sum(R_FREQ)) %>% #sum the frequencies when merging
                   arrange(desc(R_FREQ)) #sort in descending order so the top terms can be extracted
  
#total tokens and distinct types for the whole group
tokens_all <- sum(freqency_table2$FREQ)
types_all  <- nrow(freqency_table2)
tokens_types_all <- data.frame(TOKENS = tokens_all, TYPES = types_all)

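This code handles the text at the character level. To work at the word level, pass type = 1 to docDF(). To restrict the parts of speech, also pass pos = c("名詞", "動詞", "形容詞") (noun, verb, adjective), or whichever parts of speech you want, to docDF().
As written, the top terms are chosen by relative frequency. To use observed frequency instead, change the sort column in arrange(desc(R_FREQ)) from R_FREQ to FREQ. If you do not need to keep both frequencies, you can change tmp2$R_FREQ <- tmp2$FREQ / tokens * 1000 so that it assigns to tmp2$FREQ, which overwrites the FREQ column with the relative frequency (the references to R_FREQ in summarise() and arrange() then have to be dropped as well). When you want observed frequencies, simply comment that line out. Also, at the word level the result comes back with part-of-speech information, so modify the processing as follows.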

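#morphological analysis (type = 1 switches docDF() to word-level units)
tmp_freqency_table <- docDF(file_path1[i], type = 1)
colnames(tmp_freqency_table) <- c("TERM", "POS1", "POS2", "FREQ")
tmp2 <- tmp_freqency_table %>% 
        filter(!grepl("\\s", TERM)) %>%  #drop the whitespace entries that docDF() lets through
        select(TERM, FREQ) %>%           #part-of-speech columns are not used
        group_by(TERM) %>%               #merge identical words that were counted under different parts of speech
        summarise(FREQ = sum(FREQ))      #sum the frequencies when merging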
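To make the merging step concrete, here is a small sketch on a hand-made table shaped like word-level docDF() output. The rows and counts in `toy_pos` are invented for illustration. The same surface form appears under two part-of-speech subcategories and is collapsed into one row.

```r
library(dplyr)

# Toy word-level result: one word listed under two POS subcategories (hypothetical values)
toy_pos <- data.frame(
  TERM = c("花", "花", "咲く"),
  POS1 = c("名詞", "名詞", "動詞"),
  POS2 = c("一般", "固有名詞", "自立"),
  FREQ = c(3, 1, 2),
  stringsAsFactors = FALSE
)

merged <- toy_pos %>%
  select(TERM, FREQ) %>%       # discard the part-of-speech columns
  group_by(TERM) %>%           # one group per surface form
  summarise(FREQ = sum(FREQ))  # add up the counts of the merged rows

print(merged)
```

Whether merging across parts of speech is desirable depends on the analysis. Keeping POS1 in group_by() would instead treat a noun and a verb with the same spelling as separate terms.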


With TF-IDF the processing changes substantially.

#morphological analysis
tmp <- docMatrix(folder_name1, weight = "tf*idf*norm")

#build the frequency table
freqency_table <- data.frame()
for (i in seq_along(file_name1)) {
  tmp1 <- data.frame(rownames(tmp), tmp[, i]) #pair the row names (terms) with column i to build the data frame for one document
  rownames(tmp1) <- NULL #the row names come along too, so remove them
  colnames(tmp1) <- c("TERM", "FREQ") #unify the column names
  freqency_table <- rbind(freqency_table, tmp1) #append to the frequency table
}
freqency_table2 <- freqency_table %>% 
                   group_by(TERM) %>% 
                   summarise(FREQ = sum(FREQ)) %>% 
                   arrange(desc(FREQ))

Setting the argument weight = "tf*idf*norm" in docMatrix() makes it return TF-IDF values.
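For intuition about what such a weighting does, the sketch below computes a textbook TF-IDF (term frequency times log(N / document frequency)) followed by length normalization of each document vector. This is an assumption-laden illustration on a made-up matrix: the exact formula RMeCab applies for "tf*idf*norm" may differ in details such as the logarithm base or added constants.

```r
# Toy term-by-document count matrix (hypothetical values)
tf <- matrix(
  c(2, 0, 1,
    1, 1, 0,
    0, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("t1", "t2", "t3"), c("d1", "d2", "d3"))
)

n_docs <- ncol(tf)
doc_freq <- rowSums(tf > 0)      # number of documents containing each term
idf <- log(n_docs / doc_freq)    # rarer terms receive larger weights
tfidf <- tf * idf                # idf[i] is applied to every entry of row i

# Normalize each document (column) to unit Euclidean length
norms <- sqrt(colSums(tfidf^2))
tfidf_norm <- sweep(tfidf, 2, norms, "/")

print(round(tfidf_norm, 3))
```

The normalization keeps long documents from dominating simply because they contain more tokens.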

The for() part is a stopgap, because I could not convert the table-class object to a data frame directly. If anyone knows a smoother way, please tell me…
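One possible way around the loop, sketched here on a made-up matrix rather than on real docMatrix() output: base R can reshape a matrix with row and column names into a long data frame via as.table(), and rowSums() gives the per-term totals directly. This assumes the object behaves like a numeric matrix whose row names are the terms.

```r
# Toy stand-in for a term-by-document weight matrix (hypothetical values)
m <- matrix(
  c(0.5, 0.0, 0.2,
    0.7, 0.1, 0.3),
  nrow = 3,
  dimnames = list(c("夢", "恋", "花"), c("a.txt", "b.txt"))
)

# Long format: one row per (term, document) pair
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("TERM", "DOC", "FREQ")

# Per-term totals without any loop, sorted in descending order
totals <- data.frame(TERM = rownames(m), FREQ = rowSums(m),
                     row.names = NULL, stringsAsFactors = FALSE)
totals <- totals[order(-totals$FREQ), ]

print(long)
print(totals)
```

The `totals` table plays the same role as freqency_table2 in the loop version, while `long` additionally keeps the document each value came from.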

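Move the finished frequency table (freqency_table2) into another object, then process the second group (I named the two objects kbs and tbk). Next, decide the criterion for selecting feature terms.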

#select the feature terms
term_5050 <- full_join(head(kbs, 50), head(tbk, 50), by = "TERM")
term_100100 <- full_join(head(kbs, 100), head(tbk, 100), by = "TERM")
term_200200 <- full_join(head(kbs, 200), head(tbk, 200), by = "TERM")

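full_join() merges the two data frames by matching the terms in the TERM column, which is set with the argument by = "TERM". To extract only the terms common to both groups, use inner_join() instead.
(Names such as FREQ.x and R_FREQ.y that appear below are left over from this step. They seem to be created when the two data frames being joined have columns with the same name.)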
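A small sketch of the difference between the two joins, using made-up tables in place of kbs and tbk: full_join() keeps every term from either side and fills the missing side with NA, inner_join() keeps only the shared terms, and the clashing FREQ columns receive the .x and .y suffixes.

```r
library(dplyr)

# Toy frequency tables standing in for the two groups (hypothetical values)
kbs_toy <- data.frame(TERM = c("a", "b", "c"), FREQ = c(5, 3, 1),
                      stringsAsFactors = FALSE)
tbk_toy <- data.frame(TERM = c("a", "c", "d"), FREQ = c(4, 2, 2),
                      stringsAsFactors = FALSE)

joined_full  <- full_join(kbs_toy, tbk_toy, by = "TERM")   # union of terms
joined_inner <- inner_join(kbs_toy, tbk_toy, by = "TERM")  # shared terms only

print(joined_full)
print(joined_inner)
```

The NA entries need attention before modelling, for example by replacing them with 0 when a term simply did not make one group's top list.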

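There is no real basis for these cut-offs, so set them however you like. It might be better to take equal numbers from both groups so that the combined sets come to exactly 100 or 200 terms.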

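That is all for extracting the feature terms. From here on I check them.

・Checking the feature terms

Total number of characters and number of character types per group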
> tokens_types_all
##     TOKENS TYPES
## kbs  10854   833
## tbk  11399   891

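In the processing above, TOKENS is the total token count and TYPES is the number of distinct types.
(Note: こぶしファクトリー has 20 songs and つばきファクトリー has 21.)

Total number of characters and number of character types per song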
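As a quick illustration of the two counts at the character level, the sketch below splits a short made-up string into characters with base R instead of RMeCab, drops whitespace as in the processing code, and counts tokens and types.

```r
# Toy text (hypothetical); split into single characters
chars <- strsplit("la la love", "")[[1]]
chars <- chars[!grepl("\\s", chars)]  # drop whitespace, as in the main code

freq <- table(chars)                  # observed frequency per character
tokens <- sum(freq)                   # total number of characters
types  <- length(freq)                # number of distinct characters

print(freq)
print(c(TOKENS = tokens, TYPES = types))
```

Here the eight characters l, a, l, a, l, o, v, e give 8 tokens but only 5 types.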


> tokens_types
##             TOKENS TYPES
## kbs_a1a.txt    430   124
## kbs_a1e.txt    617   154
## kbs_a1g.txt    607   134
## kbs_a1i.txt    508   175
## kbs_a1l.txt    313   114
## kbs_a1o.txt    583   210
## kbs_a1q.txt    621   156
## kbs_s0a.txt    430   144
## kbs_s0b.txt    339    88
## kbs_s1a.txt    605   190
## kbs_s2a.txt    615   156
## kbs_s2b.txt    567   167
## kbs_s2c.txt    602   131
## kbs_s3a.txt    669   193
## kbs_s3b.txt    647   174
## kbs_s4a.txt    487   124
## kbs_s5a.txt    714   211
## kbs_s5b.txt    434   105
## kbs_s6a.txt    717   150
## kbs_s6b.txt    349   121
## tbk_a1b.txt    625   160
## tbk_a1f.txt    488   153
## tbk_a1i.txt    573   140
## tbk_a1m.txt    430   164
## tbk_a1q.txt    612   190
## tbk_a1r.txt    303   116
## tbk_s0a.txt    359   109
## tbk_s0b.txt    724   190
## tbk_s0c.txt    265    95
## tbk_s1a.txt    772   202
## tbk_s1b.txt    771   145
## tbk_s1c.txt    666   199
## tbk_s2a.txt    896   180
## tbk_s2b.txt    418   110
## tbk_s2c.txt    333   128
## tbk_s3a.txt    508   162
## tbk_s3b.txt    467   173
## tbk_s3c.txt    530   177
## tbk_s4a.txt    488   138
## tbk_s4b.txt    543   186
## tbk_s4c.txt    628   133


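Feature terms by relative frequency

The characters used are counted per document as relative frequencies, and these are then summed within each group. The feature terms are the union of each group's top 50, 100, or 200 characters. Because some characters appear in both groups' lists, the three feature sets contained 56, 129, and 258 types respectively.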

> term_200200$TERM
##   [1] "い" "な" "て" "る" "の" "っ" "に" "か" "れ" "も" "た" "は" "し" "が"
##  [15] "う" "!" "と" "ら" "こ" "ん" "だ" "を" "け" "で" "a"  "く" "バ" "ま"
##  [29] "ー" "り" "ゃ" "き" "え" "さ" "e"  "あ" "つ" "l"  "チ" "ち" "よ" "そ"
##  [43] "ッ" "す" "み" "じ" "("  ")"  "イ" "ど" "ン" "め" "来" "や" "わ" "一"
##  [57] "人" "ば" "ャ" "ト" "h"  "ル" "何" "T"  "?" "生" "ろ" "ナ" "回" "出"
##  [71] "n"  "時" "ず" "ブ" "サ" "今" "ね" "S"  "ョ" "見" "ラ" "ぶ" "心" "ィ"
##  [85] "私" "ス" "せ" "O"  "げ" "ぐ" "ギ" "前" "ひ" "上" "大" "自" "お" "分"
##  [99] "g"  "敵" "気" "u"  "誰" "B"  "ア" "×" "カ" "涙" "プ" "…" "セ" "立"
## [113] "行" "突" "不" "s"  "目" "ド" "中" "ょ" "フ" "間" "リ" "春" "切" "ぬ"
## [127] "日" "事" "び" "E"  "命" "「" "」" "空" "全" "o"  "無" "y"  "向" "世"
## [141] "コ" "笑" "道" "ロ" "."  "力" "負" "青" "真" "ゆ" "信" "キ" "残" "Y" 
## [155] "ム" "拳" "ぜ" "君" "D"  "、" "体" "夢" "絶" "マ" "2"  "熱" "思" "度"
## [169] "む" "ジ" "続" "直" "良" "I"  "進" "雨" "知" "ニ" "未" "後" "テ" "タ"
## [183] "手" "独" "安" "へ" "懸" "ウ" "明" "シ" "H"  "対" "c"  "流" "N"  "勝"
## [197] "C"  "動" "r"  "子" "言" "恋" "t"  "夜" "愛" "咲" "花" "ク" "ぎ" "メ"
## [211] "帰" "会" "デ" "L"  "好" "ほ" "持" "界" "彼" "感" "女" "J"  "ご" "ダ"
## [225] "づ" "楽" "ハ" "想" "願" "レ" "A"  "ふ" "少" "浮" "ホ" "情" "ぁ" "着"
## [239] "強" "傷" "ズ" "抱" "雪" "i"  "べ" "d"  "声" "純" "色" "ぇ" "泣" "飛"
## [253] "御" "ネ" "白" "風" "様" "遠"

Note that the first 56 entries of this vector are not the 56 terms obtained by taking 50 from each group. See the data-frame creation code below for details.
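The reason can be seen on a toy example. full_join() returns the rows of the first table in their original order and then appends the rows found only in the second table, so the head of a larger union is not the same as a smaller union. The tables `x` and `y` below are invented for illustration.

```r
library(dplyr)

# Two toy rankings, already sorted by frequency (hypothetical values)
x <- data.frame(TERM = c("a", "b", "c", "d"), FREQ = c(9, 7, 5, 3),
                stringsAsFactors = FALSE)
y <- data.frame(TERM = c("a", "e", "b", "f"), FREQ = c(8, 6, 4, 2),
                stringsAsFactors = FALSE)

top2 <- full_join(head(x, 2), head(y, 2), by = "TERM")$TERM  # union of the top 2
top4 <- full_join(head(x, 4), head(y, 4), by = "TERM")$TERM  # union of the top 4

print(top2)                      # terms chosen with the smaller cut-off
print(head(top4, length(top2)))  # the first entries of the larger union differ
```

In this toy case the smaller union contains "e" from the second table, while the first three entries of the larger union are simply the first table's top three.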

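Data-frame creation code, posted just in case. (The listing as posted is ordered by FREQ.x and contains characters such as "1", "W", and "叫", so it appears to correspond to the observed-frequency sets described in the next section rather than to the vector printed above.)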

#feature-term frequency table for the top 50 terms of each group
term_5050 <- data.frame(TERM = c("い", "な", "て", "っ", "の", "る", "に", "か", "も", "た", "れ", "は", "!", "う", "し", "が", "ん", "こ", "だ", "ら", "と", "で", "け", "を", "a", "く", "ー", "ま", "り", "ゃ", "バ", "e", "き", "チ", "さ", "あ", "え", "ち", "l", "ッ", "つ", "よ", "(", ")", "そ", "じ", "み", "ン", "す", "ど", "や", "ラ", "せ", "気", "ろ", "め"), 
                        FREQ.x = c(393, 374, 306, 237, 227, 219, 216, 185, 182, 175, 175, 164, 158, 146, 146, 143, 143, 141, 141, 138, 136, 129, 119, 113, 103, 92, 87, 86, 86, 79, 75, 73, 70, 70, 67, 63, 63, 61, 59, 57, 57, 57, 56, 56, 53, 51, 51, 50, 49, 49, NA, NA, NA, NA, NA, NA), 
                        R_FREQ.x = c(727.6745376, 690.9920123, 576.5707954, 419.7825924, 423.562852, 428.8700829, 411.660544, 339.5317148, 338.1273434, 333.7439907, 338.4626999, 314.5341785, 255.9005746, 271.0253703, 273.1255917, 271.7235119, 246.0539469, 246.9632657, 237.1258061, 249.6641673, 250.7956977, 224.0733653, 226.19518, 226.4636443, 193.7464854, 187.5505461, 160.2809411, 160.6885188, 158.8098862, 141.4903291, 161.1693703, 119.0379504, 138.6981468, 114.0384758, 122.1793524, 118.2781935, 125.7352916, 113.1653064, 114.7276252, 101.8826734, 117.9781282, 104.5763907, 92.86071893, 92.86071893, 102.9844117, 96.53973191, 99.34320887, 87.54881097, 100.9339805, 88.98135692, NA, NA, NA, NA, NA, NA), 
                        FREQ.y = c(525, 397, 307, 237, 279, 166, 200, 211, 158, 284, 141, 131, 78, 181, 243, 115, 150, 112, 139, 156, 188, 127, 117, 88, NA, 149, 75, 140, 113, 69, NA, NA, 122, NA, 57, 62, 76, 87, NA, NA, 57, 110, 84, 84, 64, 49, 53, 51, 81, 70, 75, 73, 62, 56, 55, 50), 
                        R_FREQ.y = c(996.7266096, 762.2023538, 591.1954213, 437.5380262, 536.8398087, 337.0049053, 385.1594682, 383.0992771, 294.0506284, 527.1088671, 260.1495909, 249.9262806, 107.2814669, 344.1765222, 442.7453531, 223.3775302, 293.6849509, 219.2374114, 274.0807757, 290.2692043, 363.0938111, 232.2384673, 232.3865119, 164.7457363, NA, 279.8723329, 137.8297712, 243.603045, 211.8492118, 110.290679, NA, NA, 220.231329, NA, 99.98104422, 116.336576, 147.5951279, 149.4541752, NA, NA, 105.9870654, 203.4486775, 138.1547282, 138.1547282, 127.1691369, 99.02104333, 109.9503964, 87.23540047, 142.1085004, 152.35272, 123.8192361, 138.7491174, 106.8245668, 95.3081965, 87.72956774, 90.29319389))
#feature-term frequency table for the top 100 terms of each group
term_100100 <- data.frame(TERM = c("い", "な", "て", "っ", "の", "る", "に", "か", "も", "た", "れ", "は", "!", "う", "し", "が", "ん", "こ", "だ", "ら", "と", "で", "け", "を", "a", "く", "ー", "ま", "り", "ゃ", "バ", "e", "き", "チ", "さ", "あ", "え", "ち", "l", "ッ", "つ", "よ", "(", ")", "そ", "じ", "み", "ン", "す", "ど", "来", "ャ", "や", "め", "イ", "人", "一", "ト", "わ", "h", "T", "ば", "出", "ろ", "?", "n", "ル", "何", "今", "時", "生", "ね", "ョ", "ず", "回", "S", "ィ", "せ", "O", "ぶ", "ラ", "ス", "ナ", "見", "げ", "サ", "私", "×", "g", "ブ", "心", "大", "B", "u", "カ", "ぐ", "気", "前", "s", "ひ", "、", "…", "君", "日", "t", "言", "r", "笑", "「", "」", "お", "y", "夜", "咲", "恋", "シ", "び", "キ", "ク", "メ", "分", "o", "花", "帰", "世", "へ"), 
                          FREQ.x = c(393, 374, 306, 237, 227, 219, 216, 185, 182, 175, 175, 164, 158, 146, 146, 143, 143, 141, 141, 138, 136, 129, 119, 113, 103, 92, 87, 86, 86, 79, 75, 73, 70, 70, 67, 63, 63, 61, 59, 57, 57, 57, 56, 56, 53, 51, 51, 50, 49, 49, 47, 45, 45, 44, 42, 42, 39, 37, 37, 36, 36, 34, 33, 32, 31, 31, 31, 31, 29, 28, 28, 27, 27, 26, 26, 25, 25, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 19, 19, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(727.6745376, 690.9920123, 576.5707954, 419.7825924, 423.562852, 428.8700829, 411.660544, 339.5317148, 338.1273434, 333.7439907, 338.4626999, 314.5341785, 255.9005746, 271.0253703, 273.1255917, 271.7235119, 246.0539469, 246.9632657, 237.1258061, 249.6641673, 250.7956977, 224.0733653, 226.19518, 226.4636443, 193.7464854, 187.5505461, 160.2809411, 160.6885188, 158.8098862, 141.4903291, 161.1693703, 119.0379504, 138.6981468, 114.0384758, 122.1793524, 118.2781935, 125.7352916, 113.1653064, 114.7276252, 101.8826734, 117.9781282, 104.5763907, 92.86071893, 92.86071893, 102.9844117, 96.53973191, 99.34320887, 87.54881097, 100.9339805, 88.98135692, 79.12743283, 71.27035288, 77.59978007, 85.28838092, 90.43662217, 71.59751976, 72.40261105, 68.148304, 74.91351201, 65.4789059, 58.954233, 71.31782931, 53.86222166, 58.08224249, 58.41325732, 52.02600074, 62.42502266, 62.22936646, 50.14195053, 51.85096739, 58.14074433, 49.4422164, 46.59013625, 51.72511044, 57.23894735, 48.91947427, 42.2997285, 40.33799598, 39.40557098, 42.95740526, 43.62146719, 41.15721413, 57.55381584, 45.58799803, 39.10887374, 51.10256855, 41.97116128, 32.71677966, 34.97811628, 51.12209201, 42.74551725, 35.69509045, 32.88802204, 33.5081267, 32.56692798, 38.80977734, 33.92929232, 37.64030544, 30.37174304, 36.91866163, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(525, 397, 307, 237, 279, 166, 200, 211, 158, 284, 141, 131, 78, 181, 243, 115, 150, 112, 139, 156, 188, 127, 117, 88, 31, 149, 75, 140, 113, 69, NA, 43, 122, NA, 57, 62, 76, 87, NA, 40, 57, 110, 84, 84, 64, 49, 53, 51, 81, 70, NA, NA, 75, 50, 46, 25, 27, 19, 43, NA, 21, 34, 41, 55, 46, 20, 30, 21, 43, NA, NA, 41, NA, 26, NA, NA, NA, 62, NA, NA, 73, 22, NA, 35, NA, NA, 34, NA, NA, NA, NA, NA, NA, 37, 31, NA, 56, NA, 28, NA, 47, 44, 44, 40, 39, 37, 36, 33, 31, 31, 31, 30, 30, 27, 26, 25, 25, 21, 21, 21, 21, 20, 20, 20, 20, 19), 
                          R_FREQ.y = c(996.7266096, 762.2023538, 591.1954213, 437.5380262, 536.8398087, 337.0049053, 385.1594682, 383.0992771, 294.0506284, 527.1088671, 260.1495909, 249.9262806, 107.2814669, 344.1765222, 442.7453531, 223.3775302, 293.6849509, 219.2374114, 274.0807757, 290.2692043, 363.0938111, 232.2384673, 232.3865119, 164.7457363, 49.61476049, 279.8723329, 137.8297712, 243.603045, 211.8492118, 110.290679, NA, 66.25204263, 220.231329, NA, 99.98104422, 116.336576, 147.5951279, 149.4541752, NA, 69.84779183, 105.9870654, 203.4486775, 138.1547282, 138.1547282, 127.1691369, 99.02104333, 109.9503964, 87.23540047, 142.1085004, 152.35272, NA, NA, 123.8192361, 90.29319389, 89.34048558, 50.09046122, 41.1881607, 40.47853289, 75.01121244, NA, 28.89940202, 59.20804398, 70.70177059, 87.72956774, 88.39993293, 31.90247055, 54.81879517, 33.44177935, 73.57396495, NA, NA, 82.53927874, NA, 46.67282583, NA, NA, NA, 106.8245668, NA, NA, 138.7491174, 49.98051328, NA, 61.78987069, NA, NA, 74.06740201, NA, NA, NA, NA, NA, NA, 53.53230157, 56.50844779, NA, 95.3081965, NA, 37.30962014, NA, 89.65227764, 83.32614634, 86.86689295, 65.29408289, 52.60530236, 68.04817262, 49.62320119, 75.77836711, 57.43548019, 57.43548019, 48.11733014, 42.52828371, 50.35225142, 39.53670609, 53.56515511, 49.51437239, 44.25181635, 45.8886498, 39.03673535, 35.59385895, 41.1169574, 30.30335625, 39.365075, 35.21153654, 38.04655178, 30.14668171))
#feature-term frequency table for the top 200 terms of each group
term_200200 <- data.frame(TERM = c("い", "な", "て", "っ", "の", "る", "に", "か", "も", "た", "れ", "は", "!", "う", "し", "が", "ん", "こ", "だ", "ら", "と", "で", "け", "を", "a", "く", "ー", "ま", "り", "ゃ", "バ", "e", "き", "チ", "さ", "あ", "え", "ち", "l", "ッ", "つ", "よ", "(", ")", "そ", "じ", "み", "ン", "す", "ど", "来", "ャ", "や", "め", "イ", "人", "一", "ト", "わ", "h", "T", "ば", "出", "ろ", "?", "n", "ル", "何", "今", "時", "生", "ね", "ョ", "ず", "回", "S", "ィ", "せ", "O", "ぶ", "ラ", "ス", "ナ", "見", "げ", "サ", "私", "×", "g", "ブ", "心", "大", "B", "u", "カ", "ぐ", "気", "前", "s", "ひ", "お", "フ", "間", "春", "上", "…", "E", "ド", "ょ", "行", "自", "突", "分", "目", "立", "涙", "o", "y", "ア", "プ", "リ", "誰", "中", "日", "Y", "ギ", "コ", "向", "事", "世", "切", "不", "命", "「", "」", "D", "び", "ロ", "空", "笑", "真", "青", "全", "無", ".", "2", "I", "キ", "セ", "ぜ", "ぬ", "君", "度", "熱", "力", "、", "ジ", "ニ", "マ", "ゆ", "信", "続", "直", "敵", "道", "負", "夢", "H", "N", "ウ", "シ", "雨", "懸", "拳", "後", "思", "勝", "体", "未", "1", "c", "C", "r", "W", "タ", "テ", "ム", "む", "叫", "手", "進", "絶", "張", "年", "派", "明", "流", "3", "エ", "へ", "t", "言", "夜", "咲", "恋", "ク", "メ", "花", "帰", "J", "L", "デ", "愛", "会", "好", "子", "ぎ", "ほ", "願", "持", "界", "御", "女", "i", "ぁ", "ダ", "楽", "社", "杯", "A", "ズ", "づ", "レ", "感", "少", "致", "浮", "ご", "ハ", "ホ", "彼", "本", "d", "強", "情", "色", "想", "着", "飛", "方", "ふ", "べ", "泣", "純", "傷", "尽", "知"), 
                          FREQ.x = c(393, 374, 306, 237, 227, 219, 216, 185, 182, 175, 175, 164, 158, 146, 146, 143, 143, 141, 141, 138, 136, 129, 119, 113, 103, 92, 87, 86, 86, 79, 75, 73, 70, 70, 67, 63, 63, 61, 59, 57, 57, 57, 56, 56, 53, 51, 51, 50, 49, 49, 47, 45, 45, 44, 42, 42, 39, 37, 37, 36, 36, 34, 33, 32, 31, 31, 31, 31, 29, 28, 28, 27, 27, 26, 26, 25, 25, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(727.6745376, 690.9920123, 576.5707954, 419.7825924, 423.562852, 428.8700829, 411.660544, 339.5317148, 338.1273434, 333.7439907, 338.4626999, 314.5341785, 255.9005746, 271.0253703, 273.1255917, 271.7235119, 246.0539469, 246.9632657, 237.1258061, 249.6641673, 250.7956977, 224.0733653, 226.19518, 226.4636443, 193.7464854, 187.5505461, 160.2809411, 160.6885188, 158.8098862, 141.4903291, 161.1693703, 119.0379504, 138.6981468, 114.0384758, 122.1793524, 118.2781935, 125.7352916, 113.1653064, 114.7276252, 101.8826734, 117.9781282, 104.5763907, 92.86071893, 92.86071893, 102.9844117, 96.53973191, 99.34320887, 87.54881097, 100.9339805, 88.98135692, 79.12743283, 71.27035288, 77.59978007, 85.28838092, 90.43662217, 71.59751976, 72.40261105, 68.148304, 74.91351201, 65.4789059, 58.954233, 71.31782931, 53.86222166, 58.08224249, 58.41325732, 52.02600074, 62.42502266, 62.22936646, 50.14195053, 51.85096739, 58.14074433, 49.4422164, 46.59013625, 51.72511044, 57.23894735, 48.91947427, 42.2997285, 40.33799598, 39.40557098, 42.95740526, 43.62146719, 41.15721413, 57.55381584, 45.58799803, 39.10887374, 51.10256855, 41.97116128, 32.71677966, 34.97811628, 51.12209201, 42.74551725, 35.69509045, 32.88802204, 33.5081267, 32.56692798, 38.80977734, 33.92929232, 37.64030544, 30.37174304, 36.91866163, 35.17124973, 29.26829268, 29.2070625, 28.89332215, 36.30902057, 31.48001077, 27.20942434, 29.89536531, 29.63507009, 30.98903071, 35.23350821, 30.85836796, 35.07709338, 30.05135353, 31.38857152, 32.04007054, 25.78315529, 25.30295899, 32.79290975, 31.99693475, 29.10656338, 33.18610323, 29.68850407, 28.338832, 22.74781468, 38.57528357, 25.02189689, 25.21905962, 27.96276671, 25.09285605, 28.85070929, 30.83723019, 26.63097615, 26.42798253, 26.42798253, 21.40054876, 27.65823401, 23.65236133, 26.20664696, 24.59332185, 22.98289282, 23.22642077, 26.18334865, 25.34389794, 23.59554051, 20.75863317, 19.79094587, 22.79783758, 31.45874182, 21.7672411, 28.49427575, 21.63150352, 
20.26931816, 20.73205753, 23.36888284, 21.39867445, 20.18762184, 18.92615707, 20.87915473, 22.97095614, 22.8181508, 19.95830157, 19.88540613, 34.74799574, 24.26058893, 23.26756503, 21.11719581, 17.74241662, 17.20718289, 17.99389177, 17.81292802, 19.46757166, 18.0370523, 21.77806826, 18.71800875, 20.41904207, 17.18991417, 21.3336497, 18.73630542, 15.15099008, 17.39189691, 17.12250832, 16.75039885, 15.85126293, 18.51894182, 18.53409239, 22.0144909, 20.26354731, 14.85456228, 18.49625214, 19.49906311, 20.91321372, 16.43034437, 15.84330783, 14.80942467, 17.92277547, 17.3208933, 13.4529148, 14.58685948, 18.0561854, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(525, 397, 307, 237, 279, 166, 200, 211, 158, 284, 141, 131, 78, 181, 243, 115, 150, 112, 139, 156, 188, 127, 117, 88, 31, 149, 75, 140, 113, 69, NA, 43, 122, 13, 57, 62, 76, 87, NA, 40, 57, 110, 84, 84, 64, 49, 53, 51, 81, 70, NA, NA, 75, 50, 46, 25, 27, 19, 43, 16, 21, 34, 41, 55, 46, 20, 30, 21, 43, 12, 14, 41, 13, 26, 9, NA, NA, 62, 14, 9, 73, 22, NA, 35, 9, 12, 34, NA, NA, NA, 18, 10, NA, 37, 31, 10, 56, NA, 28, 13, 31, NA, 18, 12, 14, 44, NA, 13, 18, 15, 18, NA, 21, 12, NA, 9, 20, 30, 14, 16, 19, 18, 18, 40, 9, NA, 9, NA, NA, 20, 9, 16, NA, 31, 31, NA, 25, 11, 17, 33, 12, NA, NA, 12, 9, NA, NA, 21, 14, NA, NA, 44, NA, 12, NA, 47, 11, NA, 13, 10, NA, NA, NA, NA, NA, NA, 10, NA, NA, 15, 25, NA, NA, NA, NA, 17, NA, NA, NA, NA, 13, NA, 36, 12, 16, NA, 11, 9, NA, 9, NA, NA, NA, NA, NA, 9, NA, NA, NA, 19, 39, 37, 30, 27, 26, 21, 21, 20, 20, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15, 14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9), 
                          R_FREQ.y = c(996.7266096, 762.2023538, 591.1954213, 437.5380262, 536.8398087, 337.0049053, 385.1594682, 383.0992771, 294.0506284, 527.1088671, 260.1495909, 249.9262806, 107.2814669, 344.1765222, 442.7453531, 223.3775302, 293.6849509, 219.2374114, 274.0807757, 290.2692043, 363.0938111, 232.2384673, 232.3865119, 164.7457363, 49.61476049, 279.8723329, 137.8297712, 243.603045, 211.8492118, 110.290679, NA, 66.25204263, 220.231329, 21.38817475, 99.98104422, 116.336576, 147.5951279, 149.4541752, NA, 69.84779183, 105.9870654, 203.4486775, 138.1547282, 138.1547282, 127.1691369, 99.02104333, 109.9503964, 87.23540047, 142.1085004, 152.35272, NA, NA, 123.8192361, 90.29319389, 89.34048558, 50.09046122, 41.1881607, 40.47853289, 75.01121244, 27.27791026, 28.89940202, 59.20804398, 70.70177059, 87.72956774, 88.39993293, 31.90247055, 54.81879517, 33.44177935, 73.57396495, 19.70680306, 28.23678726, 82.53927874, 22.14856453, 46.67282583, 16.73268513, NA, NA, 106.8245668, 22.33684281, 13.86154919, 138.7491174, 49.98051328, NA, 61.78987069, 18.47293261, 17.69320471, 74.06740201, NA, NA, NA, 36.93369398, 19.23772754, NA, 53.53230157, 56.50844779, 17.69576844, 95.3081965, NA, 37.30962014, 22.39973072, 48.11733014, NA, 31.43143997, 25.97398706, 27.42808658, 83.32614634, NA, 25.86598964, 33.97302186, 28.27469531, 36.80087323, NA, 41.1169574, 17.50699066, NA, 19.33020786, 30.30335625, 42.52828371, 24.75061888, 33.36514322, 35.88857917, 32.18667783, 33.3769214, 65.29408289, 12.68406195, NA, 18.34452326, NA, NA, 38.04655178, 19.37271354, 27.17906441, NA, 57.43548019, 57.43548019, NA, 44.25181635, 22.79982695, 28.68254176, 75.77836711, 22.07915931, NA, NA, 22.00220013, 14.668969, NA, NA, 45.8886498, 17.52218787, NA, NA, 86.86689295, NA, 21.02437746, NA, 89.65227764, 22.47824471, NA, 27.84245052, 20.10594474, NA, NA, NA, NA, NA, NA, 17.77976648, NA, NA, 26.73676136, 49.51437239, NA, NA, NA, NA, 30.03601035, NA, NA, NA, NA, 20.36789664, NA, 49.62320119, 15.56420233, 
32.76920521, NA, 23.36252633, 15.86648011, NA, 23.24425468, NA, NA, NA, NA, NA, 12.51967783, NA, NA, NA, 30.14668171, 52.60530236, 68.04817262, 50.35225142, 39.53670609, 53.56515511, 39.03673535, 35.59385895, 39.365075, 35.21153654, 23.3463035, 29.13920437, 30.51780623, 41.21203249, 31.84555849, 29.11519579, 26.86341784, 36.025014, 28.808304, 21.10646836, 27.47966736, 27.24360243, 16.73597927, 23.37112253, 18.0968826, 19.07913631, 22.48014632, 21.85136622, 14.50892857, 14.98521383, 20.0175329, 18.55183505, 22.13372861, 20.67162289, 25.96286253, 19.80437309, 13.39285714, 19.566396, 22.69302018, 21.44326646, 19.46363916, 27.0262534, 14.79984624, 17.6038969, 18.62540665, 19.07982673, 17.15487639, 21.30014836, 18.96311923, 17.02361874, 14.13219733, 19.93943516, 18.06218348, 17.13958809, 17.30532297, 18.60175505, 13.96041487, 17.2992565))


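Feature terms by observed frequency

With observed frequency, the three feature sets contained 56, 126, and 257 types respectively. Compared with relative frequency, the characters that reached the top ranks only under observed frequency were "1", "W", "叫", "張", "年", "派", "3", "エ", "社", "杯", "致", "本", "方", "尽". Conversely, "残", "良", "独", "安", "対", "動", "抱", "雪", "声", "ぇ", "ネ", "白", "風", "様", "遠" did not become feature terms under observed frequency.

Code to create the feature-term data frames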

#feature-term frequency table for the top 50 terms of each group
term_5050 <- data.frame(TERM = c("い", "な", "て", "っ", "の", "る", "に", "か", "も", "た", "れ", "は", "!", "う", "し", "が", "ん", "こ", "だ", "ら", "と", "で", "け", "を", "a", "く", "ー", "ま", "り", "ゃ", "バ", "e", "き", "チ", "さ", "あ", "え", "ち", "l", "ッ", "つ", "よ", "(", ")", "そ", "じ", "み", "ン", "す", "ど", "や", "ラ", "せ", "気", "ろ", "め"), 
                        FREQ.x = c(393, 374, 306, 237, 227, 219, 216, 185, 182, 175, 175, 164, 158, 146, 146, 143, 143, 141, 141, 138, 136, 129, 119, 113, 103, 92, 87, 86, 86, 79, 75, 73, 70, 70, 67, 63, 63, 61, 59, 57, 57, 57, 56, 56, 53, 51, 51, 50, 49, 49, NA, NA, NA, NA, NA, NA), 
                        R_FREQ.x = c(727.6745376, 690.9920123, 576.5707954, 419.7825924, 423.562852, 428.8700829, 411.660544, 339.5317148, 338.1273434, 333.7439907, 338.4626999, 314.5341785, 255.9005746, 271.0253703, 273.1255917, 271.7235119, 246.0539469, 246.9632657, 237.1258061, 249.6641673, 250.7956977, 224.0733653, 226.19518, 226.4636443, 193.7464854, 187.5505461, 160.2809411, 160.6885188, 158.8098862, 141.4903291, 161.1693703, 119.0379504, 138.6981468, 114.0384758, 122.1793524, 118.2781935, 125.7352916, 113.1653064, 114.7276252, 101.8826734, 117.9781282, 104.5763907, 92.86071893, 92.86071893, 102.9844117, 96.53973191, 99.34320887, 87.54881097, 100.9339805, 88.98135692, NA, NA, NA, NA, NA, NA), 
                        FREQ.y = c(525, 397, 307, 237, 279, 166, 200, 211, 158, 284, 141, 131, 78, 181, 243, 115, 150, 112, 139, 156, 188, 127, 117, 88, NA, 149, 75, 140, 113, 69, NA, NA, 122, NA, 57, 62, 76, 87, NA, NA, 57, 110, 84, 84, 64, 49, 53, 51, 81, 70, 75, 73, 62, 56, 55, 50), 
                        R_FREQ.y = c(996.7266096, 762.2023538, 591.1954213, 437.5380262, 536.8398087, 337.0049053, 385.1594682, 383.0992771, 294.0506284, 527.1088671, 260.1495909, 249.9262806, 107.2814669, 344.1765222, 442.7453531, 223.3775302, 293.6849509, 219.2374114, 274.0807757, 290.2692043, 363.0938111, 232.2384673, 232.3865119, 164.7457363, NA, 279.8723329, 137.8297712, 243.603045, 211.8492118, 110.290679, NA, NA, 220.231329, NA, 99.98104422, 116.336576, 147.5951279, 149.4541752, NA, NA, 105.9870654, 203.4486775, 138.1547282, 138.1547282, 127.1691369, 99.02104333, 109.9503964, 87.23540047, 142.1085004, 152.35272, 123.8192361, 138.7491174, 106.8245668, 95.3081965, 87.72956774, 90.29319389))
#feature-term frequency table for the top 100 terms of each group
term_100100 <- data.frame(TERM = c("い", "な", "て", "っ", "の", "る", "に", "か", "も", "た", "れ", "は", "!", "う", "し", "が", "ん", "こ", "だ", "ら", "と", "で", "け", "を", "a", "く", "ー", "ま", "り", "ゃ", "バ", "e", "き", "チ", "さ", "あ", "え", "ち", "l", "ッ", "つ", "よ", "(", ")", "そ", "じ", "み", "ン", "す", "ど", "来", "ャ", "や", "め", "イ", "人", "一", "ト", "わ", "h", "T", "ば", "出", "ろ", "?", "n", "ル", "何", "今", "時", "生", "ね", "ョ", "ず", "回", "S", "ィ", "せ", "O", "ぶ", "ラ", "ス", "ナ", "見", "げ", "サ", "私", "×", "g", "ブ", "心", "大", "B", "u", "カ", "ぐ", "気", "前", "s", "ひ", "、", "…", "君", "日", "t", "言", "r", "笑", "「", "」", "お", "y", "夜", "咲", "恋", "シ", "び", "キ", "ク", "メ", "分", "o", "花", "帰", "世", "へ"), 
                          FREQ.x = c(393, 374, 306, 237, 227, 219, 216, 185, 182, 175, 175, 164, 158, 146, 146, 143, 143, 141, 141, 138, 136, 129, 119, 113, 103, 92, 87, 86, 86, 79, 75, 73, 70, 70, 67, 63, 63, 61, 59, 57, 57, 57, 56, 56, 53, 51, 51, 50, 49, 49, 47, 45, 45, 44, 42, 42, 39, 37, 37, 36, 36, 34, 33, 32, 31, 31, 31, 31, 29, 28, 28, 27, 27, 26, 26, 25, 25, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 19, 19, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(727.6745376, 690.9920123, 576.5707954, 419.7825924, 423.562852, 428.8700829, 411.660544, 339.5317148, 338.1273434, 333.7439907, 338.4626999, 314.5341785, 255.9005746, 271.0253703, 273.1255917, 271.7235119, 246.0539469, 246.9632657, 237.1258061, 249.6641673, 250.7956977, 224.0733653, 226.19518, 226.4636443, 193.7464854, 187.5505461, 160.2809411, 160.6885188, 158.8098862, 141.4903291, 161.1693703, 119.0379504, 138.6981468, 114.0384758, 122.1793524, 118.2781935, 125.7352916, 113.1653064, 114.7276252, 101.8826734, 117.9781282, 104.5763907, 92.86071893, 92.86071893, 102.9844117, 96.53973191, 99.34320887, 87.54881097, 100.9339805, 88.98135692, 79.12743283, 71.27035288, 77.59978007, 85.28838092, 90.43662217, 71.59751976, 72.40261105, 68.148304, 74.91351201, 65.4789059, 58.954233, 71.31782931, 53.86222166, 58.08224249, 58.41325732, 52.02600074, 62.42502266, 62.22936646, 50.14195053, 51.85096739, 58.14074433, 49.4422164, 46.59013625, 51.72511044, 57.23894735, 48.91947427, 42.2997285, 40.33799598, 39.40557098, 42.95740526, 43.62146719, 41.15721413, 57.55381584, 45.58799803, 39.10887374, 51.10256855, 41.97116128, 32.71677966, 34.97811628, 51.12209201, 42.74551725, 35.69509045, 32.88802204, 33.5081267, 32.56692798, 38.80977734, 33.92929232, 37.64030544, 30.37174304, 36.91866163, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(525, 397, 307, 237, 279, 166, 200, 211, 158, 284, 141, 131, 78, 181, 243, 115, 150, 112, 139, 156, 188, 127, 117, 88, 31, 149, 75, 140, 113, 69, NA, 43, 122, NA, 57, 62, 76, 87, NA, 40, 57, 110, 84, 84, 64, 49, 53, 51, 81, 70, NA, NA, 75, 50, 46, 25, 27, 19, 43, NA, 21, 34, 41, 55, 46, 20, 30, 21, 43, NA, NA, 41, NA, 26, NA, NA, NA, 62, NA, NA, 73, 22, NA, 35, NA, NA, 34, NA, NA, NA, NA, NA, NA, 37, 31, NA, 56, NA, 28, NA, 47, 44, 44, 40, 39, 37, 36, 33, 31, 31, 31, 30, 30, 27, 26, 25, 25, 21, 21, 21, 21, 20, 20, 20, 20, 19), 
                          R_FREQ.y = c(996.7266096, 762.2023538, 591.1954213, 437.5380262, 536.8398087, 337.0049053, 385.1594682, 383.0992771, 294.0506284, 527.1088671, 260.1495909, 249.9262806, 107.2814669, 344.1765222, 442.7453531, 223.3775302, 293.6849509, 219.2374114, 274.0807757, 290.2692043, 363.0938111, 232.2384673, 232.3865119, 164.7457363, 49.61476049, 279.8723329, 137.8297712, 243.603045, 211.8492118, 110.290679, NA, 66.25204263, 220.231329, NA, 99.98104422, 116.336576, 147.5951279, 149.4541752, NA, 69.84779183, 105.9870654, 203.4486775, 138.1547282, 138.1547282, 127.1691369, 99.02104333, 109.9503964, 87.23540047, 142.1085004, 152.35272, NA, NA, 123.8192361, 90.29319389, 89.34048558, 50.09046122, 41.1881607, 40.47853289, 75.01121244, NA, 28.89940202, 59.20804398, 70.70177059, 87.72956774, 88.39993293, 31.90247055, 54.81879517, 33.44177935, 73.57396495, NA, NA, 82.53927874, NA, 46.67282583, NA, NA, NA, 106.8245668, NA, NA, 138.7491174, 49.98051328, NA, 61.78987069, NA, NA, 74.06740201, NA, NA, NA, NA, NA, NA, 53.53230157, 56.50844779, NA, 95.3081965, NA, 37.30962014, NA, 89.65227764, 83.32614634, 86.86689295, 65.29408289, 52.60530236, 68.04817262, 49.62320119, 75.77836711, 57.43548019, 57.43548019, 48.11733014, 42.52828371, 50.35225142, 39.53670609, 53.56515511, 49.51437239, 44.25181635, 45.8886498, 39.03673535, 35.59385895, 41.1169574, 30.30335625, 39.365075, 35.21153654, 38.04655178, 30.14668171))
# Feature-term frequency table for the top 200 terms from each group
term_200200 <- data.frame(TERM = c("い", "な", "て", "っ", "の", "る", "に", "か", "も", "た", "れ", "は", "!", "う", "し", "が", "ん", "こ", "だ", "ら", "と", "で", "け", "を", "a", "く", "ー", "ま", "り", "ゃ", "バ", "e", "き", "チ", "さ", "あ", "え", "ち", "l", "ッ", "つ", "よ", "(", ")", "そ", "じ", "み", "ン", "す", "ど", "来", "ャ", "や", "め", "イ", "人", "一", "ト", "わ", "h", "T", "ば", "出", "ろ", "?", "n", "ル", "何", "今", "時", "生", "ね", "ョ", "ず", "回", "S", "ィ", "せ", "O", "ぶ", "ラ", "ス", "ナ", "見", "げ", "サ", "私", "×", "g", "ブ", "心", "大", "B", "u", "カ", "ぐ", "気", "前", "s", "ひ", "お", "フ", "間", "春", "上", "…", "E", "ド", "ょ", "行", "自", "突", "分", "目", "立", "涙", "o", "y", "ア", "プ", "リ", "誰", "中", "日", "Y", "ギ", "コ", "向", "事", "世", "切", "不", "命", "「", "」", "D", "び", "ロ", "空", "笑", "真", "青", "全", "無", ".", "2", "I", "キ", "セ", "ぜ", "ぬ", "君", "度", "熱", "力", "、", "ジ", "ニ", "マ", "ゆ", "信", "続", "直", "敵", "道", "負", "夢", "H", "N", "ウ", "シ", "雨", "懸", "拳", "後", "思", "勝", "体", "未", "1", "c", "C", "r", "W", "タ", "テ", "ム", "む", "叫", "手", "進", "絶", "張", "年", "派", "明", "流", "3", "エ", "へ", "t", "言", "夜", "咲", "恋", "ク", "メ", "花", "帰", "J", "L", "デ", "愛", "会", "好", "子", "ぎ", "ほ", "願", "持", "界", "御", "女", "i", "ぁ", "ダ", "楽", "社", "杯", "A", "ズ", "づ", "レ", "感", "少", "致", "浮", "ご", "ハ", "ホ", "彼", "本", "d", "強", "情", "色", "想", "着", "飛", "方", "ふ", "べ", "泣", "純", "傷", "尽", "知"), 
                          FREQ.x = c(393, 374, 306, 237, 227, 219, 216, 185, 182, 175, 175, 164, 158, 146, 146, 143, 143, 141, 141, 138, 136, 129, 119, 113, 103, 92, 87, 86, 86, 79, 75, 73, 70, 70, 67, 63, 63, 61, 59, 57, 57, 57, 56, 56, 53, 51, 51, 50, 49, 49, 47, 45, 45, 44, 42, 42, 39, 37, 37, 36, 36, 34, 33, 32, 31, 31, 31, 31, 29, 28, 28, 27, 27, 26, 26, 25, 25, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(727.6745376, 690.9920123, 576.5707954, 419.7825924, 423.562852, 428.8700829, 411.660544, 339.5317148, 338.1273434, 333.7439907, 338.4626999, 314.5341785, 255.9005746, 271.0253703, 273.1255917, 271.7235119, 246.0539469, 246.9632657, 237.1258061, 249.6641673, 250.7956977, 224.0733653, 226.19518, 226.4636443, 193.7464854, 187.5505461, 160.2809411, 160.6885188, 158.8098862, 141.4903291, 161.1693703, 119.0379504, 138.6981468, 114.0384758, 122.1793524, 118.2781935, 125.7352916, 113.1653064, 114.7276252, 101.8826734, 117.9781282, 104.5763907, 92.86071893, 92.86071893, 102.9844117, 96.53973191, 99.34320887, 87.54881097, 100.9339805, 88.98135692, 79.12743283, 71.27035288, 77.59978007, 85.28838092, 90.43662217, 71.59751976, 72.40261105, 68.148304, 74.91351201, 65.4789059, 58.954233, 71.31782931, 53.86222166, 58.08224249, 58.41325732, 52.02600074, 62.42502266, 62.22936646, 50.14195053, 51.85096739, 58.14074433, 49.4422164, 46.59013625, 51.72511044, 57.23894735, 48.91947427, 42.2997285, 40.33799598, 39.40557098, 42.95740526, 43.62146719, 41.15721413, 57.55381584, 45.58799803, 39.10887374, 51.10256855, 41.97116128, 32.71677966, 34.97811628, 51.12209201, 42.74551725, 35.69509045, 32.88802204, 33.5081267, 32.56692798, 38.80977734, 33.92929232, 37.64030544, 30.37174304, 36.91866163, 35.17124973, 29.26829268, 29.2070625, 28.89332215, 36.30902057, 31.48001077, 27.20942434, 29.89536531, 29.63507009, 30.98903071, 35.23350821, 30.85836796, 35.07709338, 30.05135353, 31.38857152, 32.04007054, 25.78315529, 25.30295899, 32.79290975, 31.99693475, 29.10656338, 33.18610323, 29.68850407, 28.338832, 22.74781468, 38.57528357, 25.02189689, 25.21905962, 27.96276671, 25.09285605, 28.85070929, 30.83723019, 26.63097615, 26.42798253, 26.42798253, 21.40054876, 27.65823401, 23.65236133, 26.20664696, 24.59332185, 22.98289282, 23.22642077, 26.18334865, 25.34389794, 23.59554051, 20.75863317, 19.79094587, 22.79783758, 31.45874182, 21.7672411, 28.49427575, 21.63150352, 
20.26931816, 20.73205753, 23.36888284, 21.39867445, 20.18762184, 18.92615707, 20.87915473, 22.97095614, 22.8181508, 19.95830157, 19.88540613, 34.74799574, 24.26058893, 23.26756503, 21.11719581, 17.74241662, 17.20718289, 17.99389177, 17.81292802, 19.46757166, 18.0370523, 21.77806826, 18.71800875, 20.41904207, 17.18991417, 21.3336497, 18.73630542, 15.15099008, 17.39189691, 17.12250832, 16.75039885, 15.85126293, 18.51894182, 18.53409239, 22.0144909, 20.26354731, 14.85456228, 18.49625214, 19.49906311, 20.91321372, 16.43034437, 15.84330783, 14.80942467, 17.92277547, 17.3208933, 13.4529148, 14.58685948, 18.0561854, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(525, 397, 307, 237, 279, 166, 200, 211, 158, 284, 141, 131, 78, 181, 243, 115, 150, 112, 139, 156, 188, 127, 117, 88, 31, 149, 75, 140, 113, 69, NA, 43, 122, 13, 57, 62, 76, 87, NA, 40, 57, 110, 84, 84, 64, 49, 53, 51, 81, 70, NA, NA, 75, 50, 46, 25, 27, 19, 43, 16, 21, 34, 41, 55, 46, 20, 30, 21, 43, 12, 14, 41, 13, 26, 9, NA, NA, 62, 14, 9, 73, 22, NA, 35, 9, 12, 34, NA, NA, NA, 18, 10, NA, 37, 31, 10, 56, NA, 28, 13, 31, NA, 18, 12, 14, 44, NA, 13, 18, 15, 18, NA, 21, 12, NA, 9, 20, 30, 14, 16, 19, 18, 18, 40, 9, NA, 9, NA, NA, 20, 9, 16, NA, 31, 31, NA, 25, 11, 17, 33, 12, NA, NA, 12, 9, NA, NA, 21, 14, NA, NA, 44, NA, 12, NA, 47, 11, NA, 13, 10, NA, NA, NA, NA, NA, NA, 10, NA, NA, 15, 25, NA, NA, NA, NA, 17, NA, NA, NA, NA, 13, NA, 36, 12, 16, NA, 11, 9, NA, 9, NA, NA, NA, NA, NA, 9, NA, NA, NA, 19, 39, 37, 30, 27, 26, 21, 21, 20, 20, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15, 14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9), 
                          R_FREQ.y = c(996.7266096, 762.2023538, 591.1954213, 437.5380262, 536.8398087, 337.0049053, 385.1594682, 383.0992771, 294.0506284, 527.1088671, 260.1495909, 249.9262806, 107.2814669, 344.1765222, 442.7453531, 223.3775302, 293.6849509, 219.2374114, 274.0807757, 290.2692043, 363.0938111, 232.2384673, 232.3865119, 164.7457363, 49.61476049, 279.8723329, 137.8297712, 243.603045, 211.8492118, 110.290679, NA, 66.25204263, 220.231329, 21.38817475, 99.98104422, 116.336576, 147.5951279, 149.4541752, NA, 69.84779183, 105.9870654, 203.4486775, 138.1547282, 138.1547282, 127.1691369, 99.02104333, 109.9503964, 87.23540047, 142.1085004, 152.35272, NA, NA, 123.8192361, 90.29319389, 89.34048558, 50.09046122, 41.1881607, 40.47853289, 75.01121244, 27.27791026, 28.89940202, 59.20804398, 70.70177059, 87.72956774, 88.39993293, 31.90247055, 54.81879517, 33.44177935, 73.57396495, 19.70680306, 28.23678726, 82.53927874, 22.14856453, 46.67282583, 16.73268513, NA, NA, 106.8245668, 22.33684281, 13.86154919, 138.7491174, 49.98051328, NA, 61.78987069, 18.47293261, 17.69320471, 74.06740201, NA, NA, NA, 36.93369398, 19.23772754, NA, 53.53230157, 56.50844779, 17.69576844, 95.3081965, NA, 37.30962014, 22.39973072, 48.11733014, NA, 31.43143997, 25.97398706, 27.42808658, 83.32614634, NA, 25.86598964, 33.97302186, 28.27469531, 36.80087323, NA, 41.1169574, 17.50699066, NA, 19.33020786, 30.30335625, 42.52828371, 24.75061888, 33.36514322, 35.88857917, 32.18667783, 33.3769214, 65.29408289, 12.68406195, NA, 18.34452326, NA, NA, 38.04655178, 19.37271354, 27.17906441, NA, 57.43548019, 57.43548019, NA, 44.25181635, 22.79982695, 28.68254176, 75.77836711, 22.07915931, NA, NA, 22.00220013, 14.668969, NA, NA, 45.8886498, 17.52218787, NA, NA, 86.86689295, NA, 21.02437746, NA, 89.65227764, 22.47824471, NA, 27.84245052, 20.10594474, NA, NA, NA, NA, NA, NA, 17.77976648, NA, NA, 26.73676136, 49.51437239, NA, NA, NA, NA, 30.03601035, NA, NA, NA, NA, 20.36789664, NA, 49.62320119, 15.56420233, 
32.76920521, NA, 23.36252633, 15.86648011, NA, 23.24425468, NA, NA, NA, NA, NA, 12.51967783, NA, NA, NA, 30.14668171, 52.60530236, 68.04817262, 50.35225142, 39.53670609, 53.56515511, 39.03673535, 35.59385895, 39.365075, 35.21153654, 23.3463035, 29.13920437, 30.51780623, 41.21203249, 31.84555849, 29.11519579, 26.86341784, 36.025014, 28.808304, 21.10646836, 27.47966736, 27.24360243, 16.73597927, 23.37112253, 18.0968826, 19.07913631, 22.48014632, 21.85136622, 14.50892857, 14.98521383, 20.0175329, 18.55183505, 22.13372861, 20.67162289, 25.96286253, 19.80437309, 13.39285714, 19.566396, 22.69302018, 21.44326646, 19.46363916, 27.0262534, 14.79984624, 17.6038969, 18.62540665, 19.07982673, 17.15487639, 21.30014836, 18.96311923, 17.02361874, 14.13219733, 19.93943516, 18.06218348, 17.13958809, 17.30532297, 18.60175505, 13.96041487, 17.2992565))


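Next, the word level. For words I use three patterns: relative frequency, TF-IDF, and a variant restricted to nouns, verbs, adjectives, and adjectival nouns. Restricting the parts of speech puts the focus on the content of the lyrics.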
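As a rough illustration of the part-of-speech restriction, here is a minimal sketch in base R. The data frame `morphemes`, its columns `TERM` / `POS1` / `FREQ`, and the counts are all made up for this example; they are not the actual objects used in the analysis.

```r
# Toy morpheme table. The column names and counts are assumptions
# made for this sketch, not the article's real data.
morphemes <- data.frame(
  TERM = c("て", "する", "私", "熱い", "!", "懸命"),
  POS1 = c("助詞", "動詞", "名詞", "形容詞", "記号", "名詞"),
  FREQ = c(10, 5, 4, 2, 8, 1),
  stringsAsFactors = FALSE
)

# Parts of speech kept for the content-oriented pattern. How adjectival
# nouns are tagged depends on the dictionary, so adjust this vector as needed.
content_pos <- c("名詞", "動詞", "形容詞", "形容動詞")

# Keep only the rows whose part of speech is in the allowed set.
content_terms <- morphemes[morphemes$POS1 %in% content_pos, ]
print(content_terms$TERM)
```

Particles and symbols drop out, leaving only terms tagged with one of the listed parts of speech.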

Total words (tokens) and distinct words (types) per group
> tokens_types_all2
##     TOKENS TYPES
## kbs   5845  1323
## tbk   6240  1435
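For reference, a table like this can be derived from a term-by-group count matrix. The sketch below uses a tiny hypothetical matrix (`counts`, with invented values) and is only one possible way to compute it.

```r
# Toy term-by-group count matrix (hypothetical values, not the lyrics data).
counts <- matrix(
  c(3, 0,
    1, 2,
    0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("term_a", "term_b", "term_c"), c("kbs", "tbk"))
)

# TOKENS: total number of occurrences in each group.
# TYPES:  number of distinct terms that occur at least once.
tokens_types_sketch <- data.frame(
  TOKENS = colSums(counts),
  TYPES  = colSums(counts > 0)
)
print(tokens_types_sketch)
```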


Total words (tokens) and distinct words (types) per song


> tokens_types
##             TOKENS TYPES
## kbs_a1a.txt    253   105
## kbs_a1e.txt    344   126
## kbs_a1g.txt    261    98
## kbs_a1i.txt    312   162
## kbs_a1l.txt    168    88
## kbs_a1o.txt    301   146
## kbs_a1q.txt    374   140
## kbs_s0a.txt    232    98
## kbs_s0b.txt    163    63
## kbs_s1a.txt    365   152
## kbs_s2a.txt    301   130
## kbs_s2b.txt    272   126
## kbs_s2c.txt    329   108
## kbs_s3a.txt    325   143
## kbs_s3b.txt    362   141
## kbs_s4a.txt    247    87
## kbs_s5a.txt    413   171
## kbs_s5b.txt    254   104
## kbs_s6a.txt    376   122
## kbs_s6b.txt    193   103
## tbk_a1b.txt    337   135
## tbk_a1f.txt    274   118
## tbk_a1i.txt    312   127
## tbk_a1m.txt    245   136
## tbk_a1q.txt    279   139
## tbk_a1r.txt    171   103
## tbk_s0a.txt    207   110
## tbk_s0b.txt    379   158
## tbk_s0c.txt    154    77
## tbk_s1a.txt    410   167
## tbk_s1b.txt    388   129
## tbk_s1c.txt    433   200
## tbk_s2a.txt    467   177
## tbk_s2b.txt    218    89
## tbk_s2c.txt    184    94
## tbk_s3a.txt    279   133
## tbk_s3b.txt    268   134
## tbk_s3c.txt    311   140
## tbk_s4a.txt    277   123
## tbk_s4b.txt    305   156
## tbk_s4c.txt    342   127


Feature words by relative frequency

As with characters, I combined the top 50, 100, and 200 words by relative frequency from each group, which yielded 66, 139, and 298 feature words respectively.
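The relative frequency used for the ranking rescales each text to a total of 1,000 words. A minimal sketch of that scaling, assuming a term-by-song count matrix (the matrix `counts` and its values are hypothetical):

```r
# Toy term-by-song count matrix (hypothetical values).
# Columns are filled in order: song1 = 2, 3, 5 and song2 = 10, 4, 6.
counts <- matrix(
  c(2, 3, 5, 10, 4, 6),
  nrow = 3,
  dimnames = list(c("t1", "t2", "t3"), c("song1", "song2"))
)

# Divide each column by its total, then scale to 1,000 words per text.
rel_freq <- sweep(counts, 2, colSums(counts), "/") * 1000
print(rel_freq)
```

After scaling, every column sums to 1,000, so songs of different lengths can be compared directly.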

> term_200200$TERM
##   [1] "て"             "に"             "の"             "は"             "だ"             "!"             "ない"           "が"            
##    [9] "を"             "も"             "た"             "で"             "la"             ")"              "("              "する"          
##   [17] "う"             "てる"           "いる"           "から"           "さ"             "?"             "たい"           "か"            
##   [25] "じゃ"           "ば"             "何"             "なる"           "と"             "な"             "来る"           "こと"          
##   [33] "ん"             "私"             "今"             "ぬ"             "回"             "よう"           "Sha"            "時"            
##   [41] "なんて"         "一"             "もう"           "だけ"           "前"             "…"             "やる"           "れる"          
##   [49] "よ"             "チョット"       "心"             "あなた"         "ね"             "敵"             "でも"           "ある"          
##   [57] "サバイバー"     "フィーバー"     "みる"           "誰"             "自分"           "って"           "」"             "いつか"        
##   [65] "2"              "Teenage"        "きっと"         "こぶし"         "「"             "です"           "この"           "人"            
##   [73] "どう"           "涙"             "拳"             "だって"         "ちゃう"         "君"             "それ"           "いま"          
##   [81] "Blues×"        "、"             "まで"           "わたし"         "ナセバナル"     "やれる"         "くる"           "ぜ"            
##   [89] "人生"           "いい"           "目"             "これ"           "みんな"         "続ける"         "でっかい"       "出す"          
##   [97] "信じる"         "絶対"           "けれど"         "はず"           "まだ"           "にゃ"           "良い"           "わかる"        
## [105] "困難"           "へ"             "なんか"         "人間"           "不安"           "夢"             "出来る"         "行く"          
## [113] "こんな"         "見える"         "たって"         "3"              "バッチ"         "そう"           "もの"           "孤独"          
## [121] "たり"           "負け"           "青春"           "春"             "ちゃ"           "いく"           "どんな"         "ゆく"          
## [129] "忘れる"         "生き残る"       "選ぶ"           "未来"           "かも"           "年"             "明日"           "夜"            
## [137] "者"             "お"             "道"             "見る"           "く"             "懸命"           "輝く"           "空"            
## [145] "度"             "Bang"           "Chance"         "これから"       "サバイバル"     "熱い"           "カーニバル"     "チャンス"      
## [153] "のに"           "嗚呼"           "待つ"           "そんな"         "どこ"           "風"             "降る"           "という"        
## [161] "'"              "進む"           "無駄"           "冬"             "歩く"           "咲く"           "叫ぶ"           "一緒"          
## [169] "回る"           "愛"             "胸"             "必ず"           "ところ"         "また"           "雨"             "かける"        
## [177] "逃げる"         "花"             "ます"           "失敗"           "Ya"             "もん"           "ゲーム"         "経つ"          
## [185] "知れる"         "得体"           "ため"           "流す"           "ここ"           "くれる"         "入れる"         "精一杯"        
## [193] "朝"             "念"             "切る"           "変える"         "名"             "知る"           "向かう"         "もっと"        
## [201] "笑う"           "けど"           "言う"           "今夜"           "帰る"           "恋"             "Just"           "や"            
## [209] "いや"           "クラッカー"     "気"             "好き"           "日"             "気持ち"         "会う"           "られる"        
## [217] "とき"           "まま"           "世界"           "咲かせる"       "なれる"         "子"             "し"             "ねえ"          
## [225] "感じる"         "強い"           "Try"            "ねぇ"           "浮かれる"       "La"             "とか"           "泣く"          
## [233] "気づく"         "鮮やか"         "みたい"         "すぎる"         "きみ"           "街"             "ずっと"         "彼"            
## [241] "お願い"         "よろしく"       "御社"           "致す"           "いつも"         "純情"           "より"           "Ah"            
## [249] "今日"           "笑い"           "疲れる"         "出る"           "少し"           "あの"           "舞う"           "スキ"          
## [257] "ダメ"           "らしい"         "悪い"           "ちい"           "火傷"           "低温"           "熱る"           "勇気"          
## [265] "イヤ"           "ひとり"         "ほしい"         "ちょっと"       "次"             "その"           "き"             "ら"            
## [273] "なぁ"           "着替える"       "いちど"         "産声"           "そっと"         "やかん"         "想い"           "なか"          
## [281] "思う"           "恥ずかしい"     "。"             "だけど"         "始まる"         "たび"           "見つめる"       "しまう"        
## [289] "でる"           "さぁ"           "プラネタリウム" "雪"             "*"             "ラブ"           "模様"           "恋心"          
## [297] "赤い"           "瞬間"    
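The `.x` / `.y` column suffixes in the data frames are what base R's `merge()` produces for clashing column names, so one plausible way to build such a combined table is a full outer join of each group's top-n list. The sketch below is an assumption about the procedure, not the exact code used; the tables `kbs` and `tbk`, the helper `top_n_terms`, and all values are hypothetical.

```r
# Hypothetical per-group frequency tables (toy values).
kbs <- data.frame(TERM = c("a", "b", "c", "d"),
                  FREQ = c(40, 30, 20, 10),
                  R_FREQ = c(400, 300, 200, 100),
                  stringsAsFactors = FALSE)
tbk <- data.frame(TERM = c("a", "c", "e", "f"),
                  FREQ = c(35, 25, 15, 5),
                  R_FREQ = c(350, 250, 150, 50),
                  stringsAsFactors = FALSE)

# Keep the n terms with the highest relative frequency in one group.
top_n_terms <- function(df, n) {
  head(df[order(df$R_FREQ, decreasing = TRUE), ], n)
}

n <- 3
# Full outer join on TERM: a term missing from one group's top n gets NA there.
features <- merge(top_n_terms(kbs, n), top_n_terms(tbk, n),
                  by = "TERM", all = TRUE)
print(features)
```

Because the two top-n lists overlap, the merged table has fewer than 2n rows, which is why the top 50, 100, and 200 lists give 66, 139, and 298 feature words rather than 100, 200, and 400.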


Data frame construction code

# Feature-word frequency table for the top 50 words from each group
term_5050 <- data.frame(TERM = c("て", "に", "の", "は", "だ", "!", "ない", "が", "を", "も", "た", "で", "la", ")", "(", "する", "う", "てる", "いる", "から", "さ", "?", "たい", "か", "じゃ", "ば", "何", "なる", "と", "な", "来る", "こと", "ん", "私", "今", "ぬ", "回", "よう", "Sha", "時", "なんて", "一", "もう", "だけ", "前", "…", "やる", "れる", "よ", "チョット", "君", "、", "笑う", "そう", "「", "」", "ます", "けど", "言う", "この", "って", "あなた", "今夜", "花", "誰", "ちゃう"), 
                        FREQ.x = c(186, 189, 178, 151, 163, 158, 138, 116, 113, 106, 73, 71, 43, 54, 52, 43, 46, 39, 39, 40, 38, 31, 34, 27, 29, 25, 27, 29, 25, 24, 26, 28, 25, 22, 24, 19, 20, 22, 17, 21, 18, 17, 17, 18, 17, 17, 20, 16, 19, 16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                        R_FREQ.x = c(651.1302329, 648.5350727, 631.2721467, 531.7899037, 506.2891879, 489.9926784, 482.5602241, 411.1183309, 410.7166121, 335.3770379, 246.4295924, 211.5330709, 174.0890688, 166.843447, 162.830269, 154.2368357, 146.419353, 134.3334959, 133.4116762, 130.9158246, 127.2014335, 107.8706614, 104.6649686, 102.743471, 98.7997942, 98.48468165, 93.88215027, 89.53944471, 87.81026458, 83.16791798, 82.73551228, 82.29646116, 80.62616087, 76.55029639, 74.09802141, 73.9876383, 73.84187653, 71.17975266, 68.82591093, 68.46128846, 65.91712686, 65.88649938, 65.47590139, 64.67603488, 64.44681866, 63.41038693, 62.4744145, 61.12735454, 60.57756147, 58.82352941, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                        FREQ.y = c(210, 185, 221, 119, 163, 78, 128, 101, 88, 85, 115, 78, NA, 69, 81, 71, 61, 42, 34, 19, NA, 40, 68, 34, 20, 21, NA, 35, 62, 27, NA, 28, 33, 34, NA, NA, NA, NA, NA, NA, NA, NA, NA, 43, NA, 44, NA, 20, 58, NA, 44, 46, 24, 27, 30, 29, 35, 20, 27, 23, 20, 20, 22, 17, 18, 22), 
                        R_FREQ.y = c(735.3189572, 654.8073453, 784.1555936, 412.8586758, 575.4673778, 203.3765716, 450.1869182, 352.9138901, 299.3765078, 284.9416882, 395.1515455, 251.1571023, NA, 189.2322916, 217.395012, 248.8508277, 200.5006837, 150.5570437, 120.4080769, 74.47938863, NA, 140.7530373, 228.8212781, 120.4003694, 65.64348884, 67.82108186, NA, 119.6099521, 204.0951094, 93.52264227, NA, 116.5464561, 131.4890753, 130.3383043, NA, NA, NA, NA, NA, NA, NA, NA, NA, 143.2736358, NA, 150.0656594, NA, 69.49845852, 200.8445601, NA, 161.4996045, 156.9819531, 105.998778, 102.089977, 99.06904423, 94.94764489, 87.23889026, 86.92898666, 84.54133059, 81.44611391, 69.10230169, 64.73133729, 64.32748538, 59.37275898, 58.63298502, 57.41786763))
# Feature-word frequency table for the top 100 words from each group
term_100100 <- data.frame(TERM = c("て", "に", "の", "は", "だ", "!", "ない", "が", "を", "も", "た", "で", "la", ")", "(", "する", "う", "てる", "いる", "から", "さ", "?", "たい", "か", "じゃ", "ば", "何", "なる", "と", "な", "来る", "こと", "ん", "私", "今", "ぬ", "回", "よう", "Sha", "時", "なんて", "一", "もう", "だけ", "前", "…", "やる", "れる", "よ", "チョット", "心", "あなた", "ね", "敵", "でも", "ある", "サバイバー", "フィーバー", "みる", "誰", "自分", "って", "」", "いつか", "2", "Teenage", "きっと", "こぶし", "「", "です", "この", "人", "どう", "涙", "拳", "だって", "ちゃう", "君", "それ", "いま", "Blues×", "、", "まで", "わたし", "ナセバナル", "やれる", "くる", "ぜ", "人生", "いい", "目", "これ", "みんな", "続ける", "でっかい", "出す", "信じる", "絶対", "けれど", "はず", "笑う", "そう", "ます", "けど", "言う", "今夜", "花", "帰る", "恋", "へ", "Just", "や", "いや", "どこ", "クラッカー", "く", "気", "好き", "そんな", "日", "気持ち", "会う", "られる", "とき", "まま", "世界", "咲かせる", "なれる", "子", "し", "見る", "夢", "ねえ", "感じる", "強い", "Try", "ねぇ", "もっと", "浮かれる"), 
                          FREQ.x = c(186, 189, 178, 151, 163, 158, 138, 116, 113, 106, 73, 71, 43, 54, 52, 43, 46, 39, 39, 40, 38, 31, 34, 27, 29, 25, 27, 29, 25, 24, 26, 28, 25, 22, 24, 19, 20, 22, 17, 21, 18, 17, 17, 18, 17, 17, 20, 16, 19, 16, 16, 12, 18, 10, 17, 18, 9, 16, 13, 13, 12, 15, 13, 12, 12, 11, 13, 12, 12, 11, 12, 12, 11, 10, 11, 12, 12, 13, 10, 11, 10, 12, 10, 7, 7, 10, 7, 11, 10, 9, 10, 8, 11, 11, 10, 10, 9, 9, 11, 8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(651.1302329, 648.5350727, 631.2721467, 531.7899037, 506.2891879, 489.9926784, 482.5602241, 411.1183309, 410.7166121, 335.3770379, 246.4295924, 211.5330709, 174.0890688, 166.843447, 162.830269, 154.2368357, 146.419353, 134.3334959, 133.4116762, 130.9158246, 127.2014335, 107.8706614, 104.6649686, 102.743471, 98.7997942, 98.48468165, 93.88215027, 89.53944471, 87.81026458, 83.16791798, 82.73551228, 82.29646116, 80.62616087, 76.55029639, 74.09802141, 73.9876383, 73.84187653, 71.17975266, 68.82591093, 68.46128846, 65.91712686, 65.88649938, 65.47590139, 64.67603488, 64.44681866, 63.41038693, 62.4744145, 61.12735454, 60.57756147, 58.82352941, 57.52261177, 57.03476697, 56.49038333, 56.12265391, 55.79275352, 55.76054316, 55.21472393, 53.15614618, 52.75467649, 48.66755616, 47.00972276, 46.06044529, 45.65625804, 43.80683066, 43.63332518, 42.14559387, 42.08164598, 41.83869438, 41.82484041, 41.55975807, 41.46877079, 40.80200316, 40.55698137, 40.44688561, 40.3482458, 40.25679996, 39.03113247, 38.7272147, 38.54131839, 38.5319978, 38.31417625, 37.17297102, 37.10863531, 36.90408251, 36.26943005, 36.0838358, 35.66723132, 35.60682333, 35.05728299, 34.41704321, 34.3427875, 34.24012742, 33.23623416, 33.23609045, 33.18135765, 33.13551977, 33.08492701, 32.98820115, 32.85203488, 32.53885068, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(210, 185, 221, 119, 163, 78, 128, 101, 88, 85, 115, 78, NA, 69, 81, 71, 61, 42, 34, 19, 14, 40, 68, 34, 20, 21, 11, 35, 62, 27, NA, 28, 33, 34, 12, NA, NA, 13, NA, NA, 9, NA, 15, 43, NA, 44, NA, 20, 58, NA, 12, 20, 16, NA, 12, 15, NA, NA, 8, 18, 15, 20, 29, NA, NA, NA, NA, NA, 30, 8, 23, 13, NA, 9, NA, NA, 22, 44, NA, NA, NA, 46, NA, NA, NA, NA, NA, NA, NA, 15, NA, NA, 7, NA, NA, 11, NA, NA, NA, NA, 24, 27, 35, 20, 27, 22, 17, 17, 16, 18, 18, 13, 15, 12, 12, 12, 12, 13, 8, 10, 12, 11, 11, 11, 11, 11, 15, 10, 11, 8, 11, 10, 10, 7, 9, 12, 8, 8, 10), 
                          R_FREQ.y = c(735.3189572, 654.8073453, 784.1555936, 412.8586758, 575.4673778, 203.3765716, 450.1869182, 352.9138901, 299.3765078, 284.9416882, 395.1515455, 251.1571023, NA, 189.2322916, 217.395012, 248.8508277, 200.5006837, 150.5570437, 120.4080769, 74.47938863, 50.7169691, 140.7530373, 228.8212781, 120.4003694, 65.64348884, 67.82108186, 32.13015821, 119.6099521, 204.0951094, 93.52264227, NA, 116.5464561, 131.4890753, 130.3383043, 44.17200056, NA, NA, 44.64910686, NA, NA, 35.18829417, NA, 53.70098937, 143.2736358, NA, 150.0656594, NA, 69.49845852, 200.8445601, NA, 42.47620091, 64.73133729, 55.93237986, NA, 43.98876445, 54.8089542, NA, NA, 32.23350059, 58.63298502, 56.12848849, 69.10230169, 94.94764489, NA, NA, NA, NA, NA, 99.06904423, 29.12172952, 81.44611391, 52.01021712, NA, 36.4298325, NA, NA, 57.41786763, 161.4996045, NA, NA, NA, 156.9819531, NA, NA, NA, NA, NA, NA, NA, 56.94117965, NA, NA, 32.55584126, NA, NA, 36.4877487, NA, NA, NA, NA, 105.998778, 102.089977, 87.23889026, 86.92898666, 84.54133059, 64.32748538, 59.37275898, 54.45368489, 52.92573451, 51.39956006, 46.39175258, 45.23792308, 43.59220087, 43.22698999, 43.01075269, 40.67089336, 40.31042723, 40.1468857, 38.88249867, 38.0004921, 37.68955123, 37.68629085, 37.362251, 37.16504222, 36.7177993, 36.69774141, 35.69847671, 34.22312614, 33.96159145, 33.34596959, 33.19930445, 32.39432118, 31.98429286, 31.28689182, 31.04145666, 30.92783505, 30.41518748, 30.02352313, 29.23976608))
# Feature-word frequency table for the top 200 words from each group
term_200200 <- data.frame(TERM = c("て", "に", "の", "は", "だ", "!", "ない", "が", "を", "も", "た", "で", "la", ")", "(", "する", "う", "てる", "いる", "から", "さ", "?", "たい", "か", "じゃ", "ば", "何", "なる", "と", "な", "来る", "こと", "ん", "私", "今", "ぬ", "回", "よう", "Sha", "時", "なんて", "一", "もう", "だけ", "前", "…", "やる", "れる", "よ", "チョット", "心", "あなた", "ね", "敵", "でも", "ある", "サバイバー", "フィーバー", "みる", "誰", "自分", "って", "」", "いつか", "2", "Teenage", "きっと", "こぶし", "「", "です", "この", "人", "どう", "涙", "拳", "だって", "ちゃう", "君", "それ", "いま", "Blues×", "、", "まで", "わたし", "ナセバナル", "やれる", "くる", "ぜ", "人生", "いい", "目", "これ", "みんな", "続ける", "でっかい", "出す", "信じる", "絶対", "けれど", "はず", "まだ", "にゃ", "良い", "わかる", "困難", "へ", "なんか", "人間", "不安", "夢", "出来る", "行く", "こんな", "見える", "たって", "3", "バッチ", "そう", "もの", "孤独", "たり", "負け", "青春", "春", "ちゃ", "いく", "どんな", "ゆく", "忘れる", "生き残る", "選ぶ", "未来", "かも", "年", "明日", "夜", "者", "お", "道", "見る", "く", "懸命", "輝く", "空", "度", "Bang", "Chance", "これから", "サバイバル", "熱い", "カーニバル", "チャンス", "のに", "嗚呼", "待つ", "そんな", "どこ", "風", "降る", "という", "'", "進む", "無駄", "冬", "歩く", "咲く", "叫ぶ", "一緒", "回る", "愛", "胸", "必ず", "ところ", "また", "雨", "かける", "逃げる", "花", "ます", "失敗", "Ya", "もん", "ゲーム", "経つ", "知れる", "得体", "ため", "流す", "ここ", "くれる", "入れる", "精一杯", "朝", "念", "切る", "変える", "名", "知る", "向かう", "もっと", "笑う", "けど", "言う", "今夜", "帰る", "恋", "Just", "や", "いや", "クラッカー", "気", "好き", "日", "気持ち", "会う", "られる", "とき", "まま", "世界", "咲かせる", "なれる", "子", "し", "ねえ", "感じる", "強い", "Try", "ねぇ", "浮かれる", "La", "とか", "泣く", "気づく", "鮮やか", "みたい", "すぎる", "きみ", "街", "ずっと", "彼", "お願い", "よろしく", "御社", "致す", "いつも", "純情", "より", "Ah", "今日", "笑い", "疲れる", "出る", "少し", "あの", "舞う", "スキ", "ダメ", "らしい", "悪い", "ちい", "火傷", "低温", "熱る", "勇気", "イヤ", "ひとり", "ほしい", "ちょっと", "次", "その", "き", "ら", "なぁ", "着替える", "いちど", "産声", "そっと", "やかん", "想い", "なか", "思う", "恥ずかしい", "。", "だけど", "始まる", "たび", "見つめる", "しまう", "でる", "さぁ", "プラネタリウム", "雪", "*", "ラブ", "模様", "恋心", "赤い", "瞬間"), 
                          FREQ.x = c(186, 189, 178, 151, 163, 158, 138, 116, 113, 106, 73, 71, 43, 54, 52, 43, 46, 39, 39, 40, 38, 31, 34, 27, 29, 25, 27, 29, 25, 24, 26, 28, 25, 22, 24, 19, 20, 22, 17, 21, 18, 17, 17, 18, 17, 17, 20, 16, 19, 16, 16, 12, 18, 10, 17, 18, 9, 16, 13, 13, 12, 15, 13, 12, 12, 11, 13, 12, 12, 11, 12, 12, 11, 10, 11, 12, 12, 13, 10, 11, 10, 12, 10, 7, 7, 10, 7, 11, 10, 9, 10, 8, 11, 11, 10, 10, 9, 9, 11, 8, 10, 10, 7, 10, 9, 8, 10, 11, 7, 9, 10, 8, 8, 7, 8, 9, 10, 7, 9, 7, 7, 6, 9, 9, 8, 8, 8, 9, 8, 4, 4, 8, 7, 8, 7, 7, 5, 5, 6, 7, 8, 6, 7, 7, 8, 6, 6, 9, 4, 7, 7, 6, 6, 6, 6, 7, 5, 6, 7, 5, 7, 5, 5, 7, 5, 7, 7, 7, 5, 6, 6, 5, 6, 6, 6, 5, 4, 7, 6, 6, 6, 6, 3, 6, 3, 3, 3, 6, 6, 5, 5, 6, 5, 4, 4, 6, 4, 5, 6, 5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(651.1302329, 648.5350727, 631.2721467, 531.7899037, 506.2891879, 489.9926784, 482.5602241, 411.1183309, 410.7166121, 335.3770379, 246.4295924, 211.5330709, 174.0890688, 166.843447, 162.830269, 154.2368357, 146.419353, 134.3334959, 133.4116762, 130.9158246, 127.2014335, 107.8706614, 104.6649686, 102.743471, 98.7997942, 98.48468165, 93.88215027, 89.53944471, 87.81026458, 83.16791798, 82.73551228, 82.29646116, 80.62616087, 76.55029639, 74.09802141, 73.9876383, 73.84187653, 71.17975266, 68.82591093, 68.46128846, 65.91712686, 65.88649938, 65.47590139, 64.67603488, 64.44681866, 63.41038693, 62.4744145, 61.12735454, 60.57756147, 58.82352941, 57.52261177, 57.03476697, 56.49038333, 56.12265391, 55.79275352, 55.76054316, 55.21472393, 53.15614618, 52.75467649, 48.66755616, 47.00972276, 46.06044529, 45.65625804, 43.80683066, 43.63332518, 42.14559387, 42.08164598, 41.83869438, 41.82484041, 41.55975807, 41.46877079, 40.80200316, 40.55698137, 40.44688561, 40.3482458, 40.25679996, 39.03113247, 38.7272147, 38.54131839, 38.5319978, 38.31417625, 37.17297102, 37.10863531, 36.90408251, 36.26943005, 36.0838358, 35.66723132, 35.60682333, 35.05728299, 34.41704321, 34.3427875, 34.24012742, 33.23623416, 33.23609045, 33.18135765, 33.13551977, 33.08492701, 32.98820115, 32.85203488, 32.53885068, 32.48301252, 32.12570749, 31.98137865, 31.90379062, 31.20768383, 30.86209877, 30.66402888, 30.50736498, 30.36065827, 30.16532427, 30.14732545, 29.51463309, 29.28344635, 29.04075927, 27.90608621, 27.69230769, 27.62430939, 27.34240946, 27.20032722, 26.8452444, 26.34152739, 26.20104301, 26.14803051, 26.09819231, 25.62698783, 25.28551232, 25.08337359, 24.82240363, 24.78726291, 24.5398773, 24.5398773, 24.16015197, 23.80690646, 23.5573423, 23.51994412, 22.96254196, 22.94513896, 22.59092751, 22.54835524, 22.52649035, 22.49656493, 22.47934726, 22.46923183, 22.44601098, 22.25515293, 22.05882353, 22.05882353, 21.79176755, 21.72716711, 21.68823215, 21.53846154, 21.334743, 
21.17602257, 21.11613876, 21.04113978, 20.93018002, 20.84032765, 20.80640236, 20.74870099, 20.68368316, 20.60507528, 20.43942667, 20.33660053, 20.17129246, 20.03950452, 20.01350223, 19.94239061, 19.8273259, 19.76284585, 19.70568727, 19.32305691, 19.28130186, 19.2810176, 19.23056163, 19.0560701, 18.81174154, 18.79656335, 18.71657754, 18.63446631, 18.53058425, 18.46153846, 18.30917746, 18.2223196, 17.92025222, 17.85714286, 17.85714286, 17.4512858, 17.44186047, 17.37147362, 17.35490843, 17.34161989, 17.33970544, 17.29603046, 17.24137931, 17.22391084, 17.0414298, 17.00721257, 16.80925413, 16.71732523, 16.70032252, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(210, 185, 221, 119, 163, 78, 128, 101, 88, 85, 115, 78, NA, 69, 81, 71, 61, 42, 34, 19, 14, 40, 68, 34, 20, 21, 11, 35, 62, 27, NA, 28, 33, 34, 12, 7, NA, 13, NA, NA, 9, 9, 15, 43, NA, 44, NA, 20, 58, NA, 12, 20, 16, NA, 12, 15, NA, NA, 8, 18, 15, 20, 29, 4, NA, NA, 7, NA, 30, 8, 23, 13, NA, 9, NA, NA, 22, 44, NA, NA, NA, 46, NA, 6, NA, NA, NA, NA, NA, 15, 10, NA, 7, NA, NA, 11, 4, NA, 8, 7, NA, NA, 6, NA, NA, 18, 8, NA, 6, 10, NA, 7, 7, NA, NA, NA, NA, 27, 7, NA, 5, NA, 4, NA, 5, 7, 6, NA, 6, NA, NA, NA, NA, NA, 9, NA, NA, 6, NA, 11, 12, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 8, 12, 5, NA, NA, NA, NA, 5, NA, NA, 7, NA, NA, NA, 6, NA, NA, NA, 8, NA, NA, NA, 17, 35, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, 7, NA, 8, 24, 20, 27, 22, 17, 16, 18, 13, 15, 12, 12, 13, 10, 12, 11, 11, 11, 11, 11, 15, 10, 11, 8, 10, 7, 9, 12, 8, 10, 9, 7, 8, 7, 8, 8, 7, 7, 8, 7, 4, 12, 12, 12, 12, 7, 7, 7, 7, 8, 5, 5, 6, 7, 6, 6, 4, 7, 6, 4, 6, 6, 6, 6, 7, 7, 6, 6, 6, 6, 7, 7, 9, 6, 5, 4, 4, 5, 6, 5, 4, 5, 5, 5, 3, 6, 5, 6, 5, 5, 7, 4, 4, 3, 3, 3, 3, 5, 6), 
                          R_FREQ.y = c(735.3189572, 654.8073453, 784.1555936, 412.8586758, 575.4673778, 203.3765716, 450.1869182, 352.9138901, 299.3765078, 284.9416882, 395.1515455, 251.1571023, NA, 189.2322916, 217.395012, 248.8508277, 200.5006837, 150.5570437, 120.4080769, 74.47938863, 50.7169691, 140.7530373, 228.8212781, 120.4003694, 65.64348884, 67.82108186, 32.13015821, 119.6099521, 204.0951094, 93.52264227, NA, 116.5464561, 131.4890753, 130.3383043, 44.17200056, 18.90248485, NA, 44.64910686, NA, NA, 35.18829417, 25.47708604, 53.70098937, 143.2736358, NA, 150.0656594, NA, 69.49845852, 200.8445601, NA, 42.47620091, 64.73133729, 55.93237986, NA, 43.98876445, 54.8089542, NA, NA, 32.23350059, 58.63298502, 56.12848849, 69.10230169, 94.94764489, 16.96110582, NA, NA, 23.11150743, NA, 99.06904423, 29.12172952, 81.44611391, 52.01021712, NA, 36.4298325, NA, NA, 57.41786763, 161.4996045, NA, NA, NA, 156.9819531, NA, 18.24749487, NA, NA, NA, NA, NA, 56.94117965, 27.80169584, NA, 32.55584126, NA, NA, 36.4877487, 17.04281407, NA, 26.73730561, 21.46973234, NA, NA, 19.53583066, NA, NA, 51.39956006, 26.89459979, NA, 19.19275145, 32.39432118, NA, 22.61012688, 17.36172549, NA, NA, NA, NA, 102.089977, 18.4641117, NA, 16.96272597, NA, 18.07698301, NA, 16.12261757, 24.2774844, 23.34203442, NA, 26.64254061, NA, NA, NA, NA, NA, 24.72134527, NA, NA, 18.18296011, NA, 33.19930445, 40.67089336, NA, NA, 29.10494916, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 38.88249867, 43.22698999, 21.43580399, NA, NA, NA, NA, 17.06534246, NA, NA, 26.34968223, NA, NA, NA, 23.98459302, NA, NA, NA, 25.1463324, NA, NA, NA, 59.37275898, 87.23889026, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16.39999032, NA, NA, NA, NA, NA, NA, NA, NA, 27.00644506, NA, 30.02352313, 105.998778, 86.92898666, 84.54133059, 64.32748538, 54.45368489, 52.92573451, 46.39175258, 45.23792308, 43.59220087, 43.01075269, 40.31042723, 40.1468857, 38.0004921, 37.68955123, 37.68629085, 37.362251, 37.16504222, 36.7177993, 36.69774141, 
35.69847671, 34.22312614, 33.96159145, 33.34596959, 31.98429286, 31.28689182, 31.04145666, 30.92783505, 30.41518748, 29.23976608, 28.84615385, 28.52766596, 27.80474585, 26.70602458, 26.53053902, 26.43713011, 26.26251667, 26.11940299, 26.11795653, 25.99337782, 25.97402597, 25.69593148, 25.69593148, 25.69593148, 25.69593148, 25.59759073, 25.16459663, 24.85457548, 23.81792519, 23.02755343, 22.93577982, 22.93577982, 22.6312791, 22.57726784, 22.34422241, 22.00277961, 21.73913043, 21.68083362, 21.64913544, 21.6218471, 21.50537634, 21.50537634, 21.50537634, 21.50537634, 21.36024477, 20.77151335, 20.67335689, 20.5395827, 20.1769138, 20.04072943, 20.03148582, 19.73202542, 19.60823101, 19.23076923, 18.41424646, 18.34862385, 18.34862385, 17.98036037, 17.8041543, 17.77685843, 17.77144215, 17.72534445, 17.6558673, 17.62489802, 17.57416895, 17.33195803, 17.27818068, 17.15687016, 17.06696233, 17.02196002, 16.48087777, 16.32653061, 16.32653061, 16.30434783, 16.30434783, 16.30434783, 16.30434783, 16.14267684, 16.14224706))


Feature words by relative frequency (nouns, verbs, adjectives, adjectival nouns)

Taking the top 50, 100, and 200 words from each group yielded 76, 156, and 324 distinct feature words, respectively.
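The totals fall short of 100, 200, and 400 because words ranked highly in both groups are counted once. The `.x`/`.y` column suffixes in the tables below are what base R's `merge()` produces for a full outer join, so the combination step can be sketched as follows. This is only a minimal sketch under that assumption: the helper `top_n_terms()`, the toy tokens `w1` to `w6`, and their counts are hypothetical and are not taken from the lyric data.

```r
# Toy per-group frequency tables (hypothetical tokens and counts)
kbs <- data.frame(TERM = c("w1", "w2", "w3", "w4"),
                  FREQ = c(9, 7, 5, 3),
                  stringsAsFactors = FALSE)
tbk <- data.frame(TERM = c("w2", "w5", "w1", "w6"),
                  FREQ = c(8, 6, 4, 2),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep the n most frequent terms of one group
top_n_terms <- function(df, n) {
  df <- df[order(df$FREQ, decreasing = TRUE), ]
  head(df, n)
}

n <- 3
# Full outer join on TERM: a term missing from one group's top n gets NA there
merged <- merge(top_n_terms(kbs, n), top_n_terms(tbk, n),
                by = "TERM", all = TRUE)

# Number of distinct feature words = 2 * n minus the shared terms
shared <- sum(!is.na(merged$FREQ.x) & !is.na(merged$FREQ.y))
print(merged)
print(c(total = nrow(merged), shared = shared))
```

Read this way, 76 feature words from 50 + 50 means that 24 words appear in both groups' top 50.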

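Hmm, a few symbols have slipped in, and the numbers and English words are tagged with the wrong part of speech too. For practical reasons I'm ignoring that this time... I had been thinking that "(" and ")" showing up as feature words could be read as reflecting a trait of these lyrics, namely the heavy use of ruby (glosses for unusual readings or difficult kanji), but letting them through even with the part-of-speech filter in place is a mistake.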
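For reference, one possible way to drop such tokens after the fact is a regular-expression filter on the TERM column. This is a minimal sketch and not something done in this analysis: the pattern is an assumption, and it only catches tokens made up entirely of punctuation or digits, so mis-tagged English words such as "la" stay in.

```r
# A few tokens taken from the TERM columns in the tables below
terms <- c("する", "(", ")", "la", "2", "心", "'", ",")

# TRUE for tokens consisting only of punctuation and/or digits
is_symbol_or_number <- grepl("^[[:punct:][:digit:]]+$", terms)

kept <- terms[!is_symbol_or_number]
print(kept)
```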

Data frame creation code

# Feature-word frequency table: top 50 words from each group
term_5050 <- data.frame(TERM = c("する", ")", "(", "la", "てる", "いる", "何", "ない", "こと", "なる", "来る", "回", "私", "今", "よう", "ん", "の", "時", "一", "前", "やる", "れる", "あなた", "Sha", "さ", "心", "フィーバー", "ある", "敵", "サバイバー", "みる", "自分", "誰", "いつか", "人", "いま", "こぶし", "ちゃう", "拳", "君", "涙", "それ", "2", "いい", "人生", "ナセバナル", "Teenage", "わたし", "くる", "信じる", "笑う", "言う", "今夜", "そう", "花", "帰る", "恋", "Just", "どこ", "気", "好き", "く", "日", "出す", "会う", "咲かせる", "られる", "まま", "気持ち", "みんな", "とき", "世界", "見る", "クラッカー", "なれる", "感じる"), 
                        FREQ.x = c(43, 54, 52, 43, 39, 39, 27, 27, 28, 29, 26, 20, 22, 24, 22, 23, 21, 21, 17, 17, 20, 16, 12, 17, 17, 16, 16, 18, 10, 9, 13, 12, 13, 12, 12, 11, 12, 12, 11, 13, 10, 10, 12, 9, 10, 7, 11, 7, 7, 9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                        R_FREQ.x = c(286.8502388, 278.1175479, 272.1868036, 267.0807453, 248.8154064, 237.2719217, 175.7200455, 165.827326, 155.6844354, 155.1713871, 139.7969185, 138.0054368, 134.2447417, 131.6750541, 131.5877994, 130.7100548, 130.4403782, 126.0016885, 119.5592717, 113.0738425, 112.6196581, 109.9269693, 106.7849608, 105.5900621, 103.6360804, 102.0969862, 101.2658228, 101.2610762, 94.57971733, 93.75, 88.07031291, 86.76584521, 86.5773998, 79.35358327, 74.17574306, 73.72076747, 72.64248795, 71.98071162, 70.64033802, 69.2934589, 69.25306715, 67.61558676, 67.06272747, 65.47130657, 64.61090802, 64.22018349, 63.58381503, 62.56488087, 61.89781022, 61.08388845, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                        FREQ.y = c(71, 69, 81, NA, 42, 34, NA, 28, 28, 35, NA, NA, 34, 11, 13, 27, 26, NA, NA, NA, NA, 20, 20, NA, NA, 12, NA, 15, NA, NA, NA, 15, 18, NA, 13, NA, NA, 22, NA, 44, 9, NA, NA, 15, NA, NA, NA, NA, NA, NA, 24, 27, 22, 20, 17, 17, 16, 18, 12, 12, 13, 12, 10, 11, 11, 15, 11, 11, 12, 7, 11, 11, 11, 12, 10, 7), 
                        R_FREQ.y = c(450.4732116, 320.5686966, 365.1105077, NA, 278.2901224, 217.3801094, NA, 217.5368709, 221.5801028, 221.1469481, NA, NA, 239.1907659, 75.73522471, 84.05313922, 199.8368313, 190.8812497, NA, NA, NA, NA, 128.6632887, 123.5870013, NA, NA, 74.61121311, NA, 101.0765333, NA, NA, NA, 107.7763935, 109.0003731, NA, 94.56657072, NA, NA, 104.0912485, NA, 283.9653792, 64.47854435, NA, NA, 105.2166071, NA, NA, NA, NA, NA, NA, 191.9663602, 159.3283284, 136.6459627, 135.5537168, 104.4729239, 100.8104111, 96.32585743, 82.56880734, 82.44430999, 75.46159758, 73.48272526, 72.68314093, 70.91505846, 69.55022779, 69.3877551, 67.36005709, 67.08615108, 66.8157752, 66.67503566, 65.88643905, 65.61097561, 64.84939186, 63.65550288, 63.15789474, 62.7942173, 62.13268887))
# Feature-word frequency table: top 100 words from each group
term_100100 <- data.frame(TERM = c("する", ")", "(", "la", "てる", "いる", "何", "ない", "こと", "なる", "来る", "回", "私", "今", "よう", "ん", "の", "時", "一", "前", "やる", "れる", "あなた", "Sha", "さ", "心", "フィーバー", "ある", "敵", "サバイバー", "みる", "自分", "誰", "いつか", "人", "いま", "こぶし", "ちゃう", "拳", "君", "涙", "それ", "2", "いい", "人生", "ナセバナル", "Teenage", "わたし", "くる", "信じる", "やれる", "絶対", "わかる", "みんな", "Blues×", "続ける", "目", "でっかい", "これ", "不安", "出来る", "良い", "はず", "夢", "見える", "人間", "出す", "困難", "もの", "忘れる", "Bang", "Chance", "行く", "未来", "春", "ゆく", "孤独", "見る", "明日", "輝く", "負け", "いく", "バッチ", "3", "者", "夜", "く", "生き残る", "選ぶ", "青春", "度", "チャンス", "待つ", "道", "空", "進む", "風", "年", "歩く", "どこ", "笑う", "言う", "今夜", "そう", "花", "帰る", "恋", "Just", "気", "好き", "日", "会う", "咲かせる", "られる", "まま", "気持ち", "とき", "世界", "クラッカー", "なれる", "感じる", "浮かれる", "子", "強い", "Try", "彼", "泣く", "La", "気づく", "お願い", "御社", "致す", "すぎる", "みたい", "街", "知る", "きみ", "悪い", "咲く", "純情", "愛", "Ah", "鮮やか", "今日", "ダメ", "出る", "笑い", "疲れる", "勇気", "スキ", "次", "ほしい", "火傷", "低温", "ら", "想い"), 
                          FREQ.x = c(43, 54, 52, 43, 39, 39, 27, 27, 28, 29, 26, 20, 22, 24, 22, 23, 21, 21, 17, 17, 20, 16, 12, 17, 17, 16, 16, 18, 10, 9, 13, 12, 13, 12, 12, 11, 12, 12, 11, 13, 10, 10, 12, 9, 10, 7, 11, 7, 7, 9, 10, 9, 10, 11, 10, 11, 10, 10, 8, 7, 10, 7, 8, 9, 7, 11, 10, 9, 9, 8, 6, 6, 8, 8, 9, 9, 7, 7, 7, 7, 6, 8, 10, 9, 5, 7, 8, 4, 4, 9, 8, 6, 6, 6, 7, 5, 6, 8, 5, 5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(286.8502388, 278.1175479, 272.1868036, 267.0807453, 248.8154064, 237.2719217, 175.7200455, 165.827326, 155.6844354, 155.1713871, 139.7969185, 138.0054368, 134.2447417, 131.6750541, 131.5877994, 130.7100548, 130.4403782, 126.0016885, 119.5592717, 113.0738425, 112.6196581, 109.9269693, 106.7849608, 105.5900621, 103.6360804, 102.0969862, 101.2658228, 101.2610762, 94.57971733, 93.75, 88.07031291, 86.76584521, 86.5773998, 79.35358327, 74.17574306, 73.72076747, 72.64248795, 71.98071162, 70.64033802, 69.2934589, 69.25306715, 67.61558676, 67.06272747, 65.47130657, 64.61090802, 64.22018349, 63.58381503, 62.56488087, 61.89781022, 61.08388845, 61.03911151, 60.85099201, 59.98710821, 58.31076377, 57.80346821, 57.56801129, 57.53490515, 57.03703704, 56.61294846, 55.63527181, 54.62484403, 54.43114225, 53.65833874, 52.99758319, 51.68200671, 51.67118338, 51.6666561, 48.50696572, 48.45874521, 47.90521601, 47.61904762, 47.61904762, 47.61442995, 47.36627378, 47.33556551, 46.8660077, 46.81453842, 46.33154989, 45.30614426, 45.08659177, 44.72222222, 44.35900751, 44.24778761, 43.90243902, 43.03546313, 42.28682067, 42.11415522, 41.66666667, 41.66666667, 41.60941021, 40.60528344, 40.30995343, 40.12389503, 40.08509616, 39.77678951, 39.64586795, 39.56254168, 38.10569913, 37.81294548, 37.26610848, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(71, 69, 81, NA, 42, 34, 11, 28, 28, 35, NA, NA, 34, 11, 13, 27, 26, NA, 9, NA, NA, 20, 20, NA, 7, 12, NA, 15, NA, NA, 8, 15, 18, NA, 13, NA, NA, 22, NA, 44, 9, NA, NA, 15, NA, NA, NA, 6, NA, NA, NA, NA, NA, 7, NA, NA, 10, NA, NA, NA, NA, 6, 7, 10, NA, NA, 11, NA, NA, 6, NA, NA, 7, NA, NA, NA, NA, 11, 9, NA, NA, 7, NA, NA, NA, NA, 12, NA, NA, NA, NA, NA, NA, NA, 9, NA, 5, NA, NA, 12, 24, 27, 22, 20, 17, 17, 16, 18, 12, 13, 10, 11, 15, 11, 11, 12, 11, 11, 12, 10, 7, 10, 11, 9, 12, 4, 8, 9, 7, 12, 12, 12, 7, 8, 8, 7, 7, 4, 7, 7, 6, 7, 8, 8, 7, 6, 5, 5, 7, 4, 6, 6, 6, 6, 9, 5), 
                          R_FREQ.y = c(450.4732116, 320.5686966, 365.1105077, NA, 278.2901224, 217.3801094, 57.45835399, 217.5368709, 221.5801028, 221.1469481, NA, NA, 239.1907659, 75.73522471, 84.05313922, 199.8368313, 190.8812497, NA, 46.47465203, NA, NA, 128.6632887, 123.5870013, NA, 47.44840233, 74.61121311, NA, 101.0765333, NA, NA, 58.98246824, 107.7763935, 109.0003731, NA, 94.56657072, NA, NA, 104.0912485, NA, 283.9653792, 64.47854435, NA, NA, 105.2166071, NA, NA, NA, 36.39983205, NA, NA, NA, NA, NA, 65.88643905, NA, NA, 50.66094193, NA, NA, NA, NA, 35.55633906, 41.50695547, 60.57923835, NA, NA, 69.55022779, NA, NA, 48.08017724, NA, NA, 41.39832154, NA, NA, NA, NA, 63.65550288, 43.0709688, NA, NA, 42.44749848, NA, NA, NA, NA, 72.68314093, NA, NA, NA, NA, NA, NA, NA, 49.49068199, NA, 42.45207929, NA, NA, 82.44430999, 191.9663602, 159.3283284, 136.6459627, 135.5537168, 104.4729239, 100.8104111, 96.32585743, 82.56880734, 75.46159758, 73.48272526, 70.91505846, 69.3877551, 67.36005709, 67.08615108, 66.8157752, 66.67503566, 65.61097561, 64.84939186, 63.15789474, 62.7942173, 62.13268887, 62.11180124, 61.70539655, 55.05869158, 55.04587156, 54.79452055, 51.8540932, 51.42857143, 50.2951046, 49.58677686, 49.58677686, 49.58677686, 49.33802133, 47.30391608, 46.8974296, 46.63102587, 46.35761589, 45.22812182, 45.09927028, 44.97818829, 44.79486352, 42.35023041, 41.52199702, 40.81125858, 40.63094498, 40.52252733, 40.32258065, 40.32258065, 38.73651342, 38.46153846, 36.46258503, 36.37076413, 36.36363636, 36.36363636, 36.22488991, 34.49085234))
# Feature-word frequency table: top 200 words from each group
term_200200 <- data.frame(TERM = c("する", ")", "(", "la", "てる", "いる", "何", "ない", "こと", "なる", "来る", "回", "私", "今", "よう", "ん", "の", "時", "一", "前", "やる", "れる", "あなた", "Sha", "さ", "心", "フィーバー", "ある", "敵", "サバイバー", "みる", "自分", "誰", "いつか", "人", "いま", "こぶし", "ちゃう", "拳", "君", "涙", "それ", "2", "いい", "人生", "ナセバナル", "Teenage", "わたし", "くる", "信じる", "やれる", "絶対", "わかる", "みんな", "Blues×", "続ける", "目", "でっかい", "これ", "不安", "出来る", "良い", "はず", "夢", "見える", "人間", "出す", "困難", "もの", "忘れる", "Bang", "Chance", "行く", "未来", "春", "ゆく", "孤独", "見る", "明日", "輝く", "負け", "いく", "バッチ", "3", "者", "夜", "く", "生き残る", "選ぶ", "青春", "度", "チャンス", "待つ", "道", "空", "進む", "風", "年", "歩く", "どこ", "熱い", "サバイバル", "咲く", "降る", "回る", "失敗", "念", "もん", "冬", "'", "一緒", "懸命", "カーニバル", "愛", "叫ぶ", "無駄", "経つ", "花", "入れる", "くれる", "胸", "過去", "ところ", "感じる", "備える", "ここ", "精一杯", "動く", "猪突猛進", "押忍", "桜", "憂い", "向かう", "かける", "雨", "ゲーム", "逃げる", "朝", "流す", "バカ", "ため", "知れる", "得体", "ら", "変える", "力", "知る", "Ya", "切る", "ナゼ", "此処", "名", "歌う", "声", "しれる", "日", "立派", "ドラマ", "笑う", "為す", "成る", "星", "思う", "いける", "事態", "時代", "準備", "態勢", "万端", "頃", "許す", "立ち上がれる", ",", "努力", "Hey", "いかが", "ぼく", "ライトアップ", "頑張る", "世の中", "つく", "全部", "ダメ", "事", "タイムリミット", "1", "チャチャチャチャ", "出る", "行ける", "先輩", "4", "慌てる", "解ける", "急", "近道", "気", "ひとつひとつ", "ブルース", "勲章", "後悔", "言う", "今夜", "そう", "帰る", "恋", "Just", "好き", "会う", "咲かせる", "られる", "まま", "気持ち", "とき", "世界", "クラッカー", "なれる", "浮かれる", "子", "強い", "Try", "彼", "泣く", "La", "気づく", "お願い", "御社", "致す", "すぎる", "みたい", "街", "きみ", "悪い", "純情", "Ah", "鮮やか", "今日", "笑い", "疲れる", "勇気", "スキ", "次", "ほしい", "火傷", "低温", "想い", "見つめる", "なか", "恥ずかしい", "着替える", "舞う", "やかん", "I", "need", "you", "しまう", "いちど", "産声", "ちい", "熱る", "でる", "わがまま", "始まる", "たび", "赤い", "瞬間", "視線", "手", "思い出", "ラブ", "模様", "恋心", "プラネタリウム", "雪", "まんま", "cm", "ひとり", "ごと", "best", "try", "WOW", "Your", "仕方", "ハート", "根拠", "渡せ<e3><82><8b>", "恋愛", "疼く", "髪", "幸せ", "口", "言える", "言葉", "白い", "見せる", "音楽", "奏でる", "瞳", "Lie", "さら", "サンライズ", "飛び出る", "いや", "せる", "上手い", "抱きしめる", "遠い", "Oh", "寒い", "中", 
"ウチ", "尽くす", "杯", "こころ", "世界中", "希望", "捨てる", "プラットホーム", "合う", "すべて", "色", "スローモーション", "臆病", "駆け引き", "独り占め"), 
                          FREQ.x = c(43, 54, 52, 43, 39, 39, 27, 27, 28, 29, 26, 20, 22, 24, 22, 23, 21, 21, 17, 17, 20, 16, 12, 17, 17, 16, 16, 18, 10, 9, 13, 12, 13, 12, 12, 11, 12, 12, 11, 13, 10, 10, 12, 9, 10, 7, 11, 7, 7, 9, 10, 9, 10, 11, 10, 11, 10, 10, 8, 7, 10, 7, 8, 9, 7, 11, 10, 9, 9, 8, 6, 6, 8, 8, 9, 9, 7, 7, 7, 7, 6, 8, 10, 9, 5, 7, 8, 4, 4, 9, 8, 6, 6, 6, 7, 5, 6, 8, 5, 5, 7, 4, 7, 7, 5, 6, 4, 6, 7, 7, 7, 6, 7, 6, 7, 5, 6, 7, 5, 5, 6, 6, 6, 4, 4, 6, 6, 4, 4, 5, 5, 4, 6, 5, 6, 3, 4, 5, 6, 4, 3, 3, 3, 5, 6, 4, 5, 6, 4, 4, 4, 4, 5, 6, 5, 5, 6, 4, 5, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 4, 4, 4, 5, 3, 3, 3, 3, 4, 4, 4, 4, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          R_FREQ.x = c(286.8502388, 278.1175479, 272.1868036, 267.0807453, 248.8154064, 237.2719217, 175.7200455, 165.827326, 155.6844354, 155.1713871, 139.7969185, 138.0054368, 134.2447417, 131.6750541, 131.5877994, 130.7100548, 130.4403782, 126.0016885, 119.5592717, 113.0738425, 112.6196581, 109.9269693, 106.7849608, 105.5900621, 103.6360804, 102.0969862, 101.2658228, 101.2610762, 94.57971733, 93.75, 88.07031291, 86.76584521, 86.5773998, 79.35358327, 74.17574306, 73.72076747, 72.64248795, 71.98071162, 70.64033802, 69.2934589, 69.25306715, 67.61558676, 67.06272747, 65.47130657, 64.61090802, 64.22018349, 63.58381503, 62.56488087, 61.89781022, 61.08388845, 61.03911151, 60.85099201, 59.98710821, 58.31076377, 57.80346821, 57.56801129, 57.53490515, 57.03703704, 56.61294846, 55.63527181, 54.62484403, 54.43114225, 53.65833874, 52.99758319, 51.68200671, 51.67118338, 51.6666561, 48.50696572, 48.45874521, 47.90521601, 47.61904762, 47.61904762, 47.61442995, 47.36627378, 47.33556551, 46.8660077, 46.81453842, 46.33154989, 45.30614426, 45.08659177, 44.72222222, 44.35900751, 44.24778761, 43.90243902, 43.03546313, 42.28682067, 42.11415522, 41.66666667, 41.66666667, 41.60941021, 40.60528344, 40.30995343, 40.12389503, 40.08509616, 39.77678951, 39.64586795, 39.56254168, 38.10569913, 37.81294548, 37.26610848, 36.7781892, 36.65540541, 36.58167282, 36.57834101, 36.49635036, 35.77284456, 35.71428571, 35.35531498, 35.22396747, 34.73473473, 34.46073338, 34.30713951, 34.14634146, 34.03907477, 33.95895896, 33.92569965, 33.60933704, 33.49282297, 33.08556925, 32.90370197, 32.6935376, 32.59133957, 32.29927309, 32.13330787, 32.13330787, 31.98127603, 31.79698383, 31.78167475, 31.74603175, 31.64556962, 31.64556962, 31.39400922, 31.17674637, 31.14780828, 30.85553997, 30.83333333, 30.73500557, 30.68318091, 30.45685279, 30.02070393, 30.00764526, 30, 30, 29.74034076, 29.69755245, 29.68424139, 29.47888045, 29.26829268, 29.25925926, 29.19708029, 29.19708029, 29.15943466, 28.95260713, 
28.93132123, 28.75025793, 28.57005201, 27.77777778, 27.61324042, 27.57519942, 27.52293578, 27.52293578, 27.52293578, 27.45647355, 27.16244726, 26.78571429, 26.78571429, 26.78571429, 26.78571429, 26.78571429, 26.36595058, 26.32911392, 26.23873874, 26.04640038, 25.60151346, 25.47547548, 25.3164557, 25.3164557, 25.3164557, 25.3164557, 25.3164557, 25.01951455, 24.96716312, 24.88126165, 24.84798767, 24.8447205, 24.3902439, 24.3902439, 24.33493536, 23.98305389, 23.95209581, 23.8934056, 23.63748968, 23.52711157, 23.52711157, 23.17228595, 23.12794869, 23.12138728, 23.12138728, 23.12138728, 23.11754741, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(71, 69, 81, NA, 42, 34, 11, 28, 28, 35, NA, NA, 34, 11, 13, 27, 26, 5, 9, NA, NA, 20, 20, NA, 7, 12, NA, 15, NA, NA, 8, 15, 18, 4, 13, NA, NA, 22, NA, 44, 9, NA, NA, 15, 3, NA, NA, 6, 5, 4, NA, NA, NA, 7, NA, NA, 10, NA, NA, 6, NA, 6, 7, 10, 4, NA, 11, NA, 7, 6, NA, NA, 7, NA, 4, 5, NA, 11, 9, 4, NA, 7, NA, NA, NA, 5, 12, NA, NA, 4, NA, NA, NA, NA, 9, NA, 5, NA, 3, 12, 4, NA, 7, NA, 5, NA, NA, NA, NA, NA, NA, NA, NA, 6, NA, 5, NA, 17, NA, NA, 5, NA, NA, 7, NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9, NA, NA, 7, NA, NA, NA, NA, NA, NA, 3, NA, 10, NA, NA, 24, NA, NA, NA, 5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA, NA, 6, NA, NA, NA, NA, NA, NA, NA, 12, NA, NA, NA, NA, 27, 22, 20, 17, 16, 18, 13, 11, 15, 11, 11, 12, 11, 11, 12, 10, 10, 11, 9, 12, 4, 8, 9, 7, 12, 12, 12, 7, 8, 8, 7, 4, 7, 7, 8, 8, 5, 5, 7, 4, 6, 6, 6, 6, 5, 6, 4, 5, 5, 6, 6, 5, 5, 5, 5, 4, 4, 6, 6, 5, 5, 6, 5, 5, 6, 3, 4, 5, 3, 3, 3, 4, 4, 3, 5, 4, 4, 6, 6, 6, 6, 6, 2, 2, 2, 2, 2, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, 6, 6, 6, 6, 5, 5, 2, 4, 4, 4, 5, 4, 6, 6, 6, 4, 4, 4, 4, 3, 4, 4, 4, 3, 3, 2, 2), 
                          R_FREQ.y = c( 450.4732116, 320.5686966, 365.1105077, NA, 278.2901224, 217.3801094, 57.45835399, 217.5368709, 221.5801028, 221.1469481, NA, NA, 239.1907659, 75.73522471, 84.05313922, 199.8368313, 190.8812497, 27.83147176, 46.47465203, NA, NA, 128.6632887, 123.5870013, NA, 47.44840233, 74.61121311, NA, 101.0765333, NA, NA, 58.98246824, 107.7763935, 109.0003731, 31.76348421, 94.56657072, NA, NA, 104.0912485, NA, 283.9653792, 64.47854435, NA, NA, 105.2166071, 26.23685926, NA, NA, 36.39983205, 25.03781703, 31.85005305, NA, NA, NA, 65.88643905, NA, NA, 50.66094193, NA, NA, 33.06124521, NA, 35.55633906, 41.50695547, 60.57923835, 28.04096459, NA, 69.55022779, NA, 33.46683016, 48.08017724, NA, NA, 41.39832154, NA, 23.51718471, 28.27810297, NA, 63.65550288, 43.0709688, 26.39956945, NA, 42.44749848, NA, NA, NA, 32.35497097, 72.68314093, NA, NA, 33.30054107, NA, NA, NA, NA, 49.49068199, NA, 42.45207929, NA, 24.34778783, 82.44430999, 26.57416662, NA, 45.09927028, NA, 30.59685413, NA, NA, NA, NA, NA, NA, NA, NA, 44.79486352, NA, 30.56802353, NA, 104.4729239, NA, NA, 29.01961716, NA, NA, 62.13268887, NA, 29.16867277, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 36.22488991, NA, NA, 46.63102587, NA, NA, NA, NA, NA, NA, 24.94505495, NA, 70.91505846, NA, NA, 191.9663602, NA, NA, NA, 31.28494931, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 40.63094498, NA, NA, NA, NA, 40.52252733, NA, NA, NA, NA, NA, NA, NA, 75.46159758, NA, NA, NA, NA, 159.3283284, 136.6459627, 135.5537168, 100.8104111, 96.32585743, 82.56880734, 73.48272526, 69.3877551, 67.36005709, 67.08615108, 66.8157752, 66.67503566, 65.61097561, 64.84939186, 63.15789474, 62.7942173, 62.11180124, 61.70539655, 55.05869158, 55.04587156, 54.79452055, 51.8540932, 51.42857143, 50.2951046, 49.58677686, 49.58677686, 49.58677686, 49.33802133, 47.30391608, 46.8974296, 46.35761589, 45.22812182, 44.97818829, 42.35023041, 41.52199702, 40.81125858, 40.32258065, 
40.32258065, 38.73651342, 38.46153846, 36.46258503, 36.37076413, 36.36363636, 36.36363636, 34.49085234, 34.07794666, 33.95454294, 33.74683207, 33.47299185, 33.35804299, 32.96703297, 32.67973856, 32.67973856, 32.67973856, 32.63524832, 32.25806452, 32.25806452, 31.57894737, 31.57894737, 31.28469816, 30.90532656, 30.82989834, 30.31297719, 29.96790817, 29.46729231, 29.37462081, 29.25137506, 29.06832298, 28.84615385, 28.84615385, 28.84615385, 28.16901408, 28.16901408, 28.03738318, 27.96048455, 27.8287607, 27.81446022, 27.52293578, 27.52293578, 27.52293578, 27.52293578, 27.52293578, 27.39726027, 27.39726027, 27.39726027, 27.39726027, 27.39726027, 27.21088435, 27.12386778, 27.06804661, 26.46820686, 26.36033758, 26.34789269, 26.32959036, 26.14379085, 26.14379085, 25.80469597, 25.75107296, 25.75107296, 25.75107296, 25.75107296, 25.62765701, 25.36194023, 25.32653711, 25.27967639, 25.19310755, 25.10234863, 24.97114041, 24.93954119, 24.79338843, 24.79338843, 24.79338843, 24.59307036, 24.58296752, 24.39349507, 24.39349507, 24.19354839, 24.02000118, 24.01722163, 23.82322264, 23.52261472, 23.52261472, 23.31401475, 23.31401475))


Feature words by TF-IDF

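Taking the top 50, 100, and 200 words from each group yielded 84, 168, and 344 feature words, respectively. Taking 200 words per group by relative frequency gave 298 feature words, so 344 is a sizeable gap: the two groups' top-200 lists share far fewer words under TF-IDF. Combining the relative-frequency and TF-IDF feature sets gives 494 words, 148 of which appear in both. This confirms the property of TF-IDF that words contained in many documents receive low values. That said, the output of docMatrix() does not include particles, so that is a contributing factor as well (I have not checked the official specification).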
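The counts above are tied together by the usual set identity (union = size of A + size of B - overlap). A minimal sketch with hypothetical toy vectors, followed by the same arithmetic on the reported totals:

```r
# Hypothetical feature-word sets (toy tokens, not the real lists)
rel_terms   <- c("w1", "w2", "w3", "w4")
tfidf_terms <- c("w3", "w4", "w5")

n_union <- length(union(rel_terms, tfidf_terms))      # words in either set
n_both  <- length(intersect(rel_terms, tfidf_terms))  # words in both sets

# Identity: union = |A| + |B| - overlap
stopifnot(n_union == length(rel_terms) + length(tfidf_terms) - n_both)

# Same identity applied to the totals reported in the text
reported_union <- 298 + 344 - 148
print(reported_union)
```

With the reported numbers, 298 + 344 - 148 comes to 494, which matches the combined total.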
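The drawback of selecting by TF-IDF is that words repeated many times within a single song end up with high values, such as "サバイバー" (from 「サバイバー」), "フィーバー" (from 「桜ナイトフィーバー」), and "ナセバナル" (from 「ナセバナル」). All three appear in only one song, yet they are among the feature words even when only 50 words per group are selected. They are not included when taking the top 50 per group by relative frequency.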
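Both effects, the low score for widely shared words and the high score for a word chanted in one song, can be reproduced on a toy term-by-song matrix. This is a minimal sketch using the textbook weighting tf × log(N/df); the weighting that RMeCab's docMatrix() actually applies may be defined differently, and the counts, the term labels, and the choice to sum scores across songs are all assumptions made for illustration.

```r
# Toy term-by-song count matrix (hypothetical values)
tf <- matrix(c(3,  2, 4,    # "common": appears in every song
               12, 0, 0,    # "hook":   repeated many times in one song only
               1,  0, 1),   # "some":   appears in two of three songs
             nrow = 3, byrow = TRUE,
             dimnames = list(c("common", "hook", "some"),
                             c("song1", "song2", "song3")))

n_docs   <- ncol(tf)
doc_freq <- rowSums(tf > 0)        # number of songs containing each term
idf      <- log(n_docs / doc_freq) # 0 for a term found in every song

# Each row of tf is multiplied by that term's idf
tfidf <- tf * idf

# Aggregate per term by summing across songs (an assumption)
score <- rowSums(tfidf)
print(round(score, 3))
```

In this toy case "common" scores exactly 0 because it occurs in every song, while "hook" gets by far the largest score even though it occurs in a single song.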

Data frame creation code

# Feature-word TF-IDF table: top 50 words from each group
term_5050 <- data.frame(TERM = c(")", "(", "la", "回", "こと", "私", "フィーバー", "時", "サバイバー", "ナセバナル", "あなた", "何", "ない", "敵", "今", "よう", "ん", "心", "いま", "一", "さ", "人", "の", "君", "でっかい", "バッチ", "いつか", "いい", "Bang", "Chance", "Teenage", "拳", "人間", "花", "2", "Blues×", "3", "自分", "押忍", "誰", "不安", "前", "こぶし", "わたし", "困難", "先輩", "ナゼ", "此処", "Sha", "それ", "今夜", "そう", "クラッカー", "Just", "きみ", "恋", "La", "日", "世界", "彼", "お願い", "御社", "やかん", "気", "火傷", "低温", "好き", "みんな", "笑い", "涙", "どこ", "子", "強い", "純情", "Try", "プラネタリウム", "雪", "とき", "夢", "I", "need", "you", "Ah", "スキ"), 
                        FREQ.x = c(1.328845446, 1.303345926, 0.913610107, 0.868639039, 0.862958771, 0.806265903, 0.763992939, 0.734357078, 0.72497559, 0.716093151, 0.713712139, 0.69806742, 0.690560274, 0.690558404, 0.686723239, 0.630005429, 0.628626331, 0.620476787, 0.608669638, 0.593157659, 0.58426696, 0.562231006, 0.552171902, 0.509923143, 0.503964229, 0.500723164, 0.500248628, 0.494870843, 0.492242205, 0.492242205, 0.489869914, 0.477401311, 0.455003938, 0.448033068, 0.446255476, 0.445336286, 0.430769995, 0.430741923, 0.425509248, 0.416033419, 0.402200088, 0.402154469, 0.394972288, 0.376878773, 0.373203253, 0.370780141, 0.362938864, 0.362938864, 0.361194694, 0.360829208, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                        FREQ.y = c(1.232068777, 1.383575592, NA, NA, 0.892433428, 0.967302389, NA, NA, NA, NA, 0.776954594, NA, 0.982896112, NA, 0.402197972, 0.458945942, 0.91840227, NA, NA, NA, NA, 0.568332208, 0.634763878, 1.373196783, NA, NA, NA, 0.572998924, NA, NA, NA, NA, NA, 0.673198054, NA, NA, NA, 0.600909302, NA, 0.510499129, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.929801624, 0.650179948, 0.632720576, 0.629847924, 0.577503353, 0.570460674, 0.55747211, 0.551709888, 0.551221143, 0.514863277, 0.500609909, 0.500609909, 0.482649287, 0.472746246, 0.464212783, 0.464212783, 0.462982605, 0.458936426, 0.441163998, 0.439054877, 0.438992824, 0.428798405, 0.42689071, 0.424120159, 0.419898616, 0.396969584, 0.396969584, 0.396949341, 0.393051518, 0.377031426, 0.377031426, 0.377031426, 0.374596414, 0.373151595))
# Feature-word TF-IDF table: top 100 words from each group
term_100100 <- data.frame(TERM = c(")", "(", "la", "回", "こと", "私", "フィーバー", "時", "サバイバー", "ナセバナル", "あなた", "何", "ない", "敵", "今", "よう", "ん", "心", "いま", "一", "さ", "人", "の", "君", "でっかい", "バッチ", "いつか", "いい", "Bang", "Chance", "Teenage", "拳", "人間", "花", "2", "Blues×", "3", "自分", "押忍", "誰", "不安", "前", "こぶし", "わたし", "困難", "先輩", "ナゼ", "此処", "Sha", "それ", "人生", "未来", "春", "念", "みんな", "目", "明日", "綺羅星", "青春", "世の中", "度", "カーニバル", "夢", "絶対", "'", "猪突猛進", "失敗", "立派", "風", "孤独", "星", "涙", "得体", "これ", "もの", "良い", "負け", "空", "Ya", "過去", "ここ", "精一杯", "道", "もん", "冬", "的", "卵", "年", "名", "憂い", "力", "事態", "時代", "準備", "態勢", "万端", "now", "者", "無駄", "一緒", "今夜", "そう", "クラッカー", "Just", "きみ", "恋", "La", "日", "世界", "彼", "お願い", "御社", "やかん", "気", "火傷", "低温", "好き", "笑い", "どこ", "子", "強い", "純情", "Try", "プラネタリウム", "雪", "とき", "I", "need", "you", "Ah", "スキ", "なか", "まんま", "まま", "いちど", "産声", "悪い", "Lie", "さら", "サンライズ", "気持ち", "次", "髪", "恥ずかしい", "街", "cm", "鮮やか", "音楽", "みたい", "ちい", "愛", "ほしい", "深い", "端", "ら", "いや", "瞬間", "ざかり", "つまり", "気高い", "結局", "夜", "ラブ", "模様", "恋心", "ダメ", ") ", "プラットホーム"), 
                          FREQ.x = c(1.328845446, 1.303345926, 0.913610107, 0.868639039, 0.862958771, 0.806265903, 0.763992939, 0.734357078, 0.72497559, 0.716093151, 0.713712139, 0.69806742, 0.690560274, 0.690558404, 0.686723239, 0.630005429, 0.628626331, 0.620476787, 0.608669638, 0.593157659, 0.58426696, 0.562231006, 0.552171902, 0.509923143, 0.503964229, 0.500723164, 0.500248628, 0.494870843, 0.492242205, 0.492242205, 0.489869914, 0.477401311, 0.455003938, 0.448033068, 0.446255476, 0.445336286, 0.430769995, 0.430741923, 0.425509248, 0.416033419, 0.402200088, 0.402154469, 0.394972288, 0.376878773, 0.373203253, 0.370780141, 0.362938864, 0.362938864, 0.361194694, 0.360829208, 0.36060559, 0.360521314, 0.360461682, 0.35834973, 0.358347146, 0.35457317, 0.347870332, 0.343583643, 0.342563539, 0.340407399, 0.340170108, 0.33504333, 0.334796748, 0.333096435, 0.329096232, 0.32816147, 0.3174746, 0.312518953, 0.311189753, 0.309866141, 0.306897065, 0.306789214, 0.305817952, 0.305266408, 0.304424566, 0.294592698, 0.294428333, 0.290491263, 0.287179997, 0.286006481, 0.285759865, 0.285467137, 0.284478343, 0.282772547, 0.280518544, 0.278085106, 0.273313978, 0.273149368, 0.272752932, 0.270331099, 0.269068661, 0.268762297, 0.268762297, 0.268762297, 0.268762297, 0.268762297, 0.262527949, 0.261869176, 0.261458948, 0.260435211, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(1.232068777, 1.383575592, NA, NA, 0.892433428, 0.967302389, NA, NA, NA, NA, 0.776954594, 0.370689899, 0.982896112, NA, 0.402197972, 0.458945942, 0.91840227, 0.363026471, NA, 0.301810937, 0.32724004, 0.568332208, 0.634763878, 1.373196783, NA, NA, 0.330069497, 0.572998924, NA, NA, NA, NA, NA, 0.673198054, NA, NA, NA, 0.600909302, NA, 0.510499129, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.458936426, 0.350537284, 0.271267278, NA, 0.3411743, NA, NA, NA, 0.393051518, NA, NA, NA, NA, NA, 0.264502532, NA, NA, 0.439054877, NA, NA, NA, 0.264696738, NA, 0.312918521, NA, NA, 0.272987501, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.358247205, NA, 0.929801624, 0.650179948, 0.632720576, 0.629847924, 0.577503353, 0.570460674, 0.55747211, 0.551709888, 0.551221143, 0.514863277, 0.500609909, 0.500609909, 0.482649287, 0.472746246, 0.464212783, 0.464212783, 0.462982605, 0.441163998, 0.438992824, 0.428798405, 0.42689071, 0.424120159, 0.419898616, 0.396969584, 0.396969584, 0.396949341, 0.377031426, 0.377031426, 0.377031426, 0.374596414, 0.373151595, 0.368978261, 0.366122824, 0.360299625, 0.352931199, 0.352931199, 0.348517813, 0.346611573, 0.346611573, 0.346611573, 0.342042574, 0.337732336, 0.333717116, 0.333251866, 0.330541781, 0.330065696, 0.325972331, 0.322269238, 0.320585497, 0.316360288, 0.316165033, 0.309682207, 0.30655866, 0.30655866, 0.301000219, 0.29366149, 0.287635389, 0.281760935, 0.281760935, 0.281760935, 0.281760935, 0.280115, 0.279863696, 0.279863696, 0.279863696, 0.271285759, 0.268441528, 0.264698399))
# Feature-word TF-IDF table: top 200 words from each group
term_200200 <- data.frame(TERM = c(")", "(", "la", "回", "こと", "私", "フィーバー", "時", "サバイバー", "ナセバナル", "あなた", "何", "ない", "敵", "今", "よう", "ん", "心", "いま", "一", "さ", "人", "の", "君", "でっかい", "バッチ", "いつか", "いい", "Bang", "Chance", "Teenage", "拳", "人間", "花", "2", "Blues×", "3", "自分", "押忍", "誰", "不安", "前", "こぶし", "わたし", "困難", "先輩", "ナゼ", "此処", "Sha", "それ", "人生", "未来", "春", "念", "みんな", "目", "明日", "綺羅星", "青春", "世の中", "度", "カーニバル", "夢", "絶対", "'", "猪突猛進", "失敗", "立派", "風", "孤独", "星", "涙", "得体", "これ", "もの", "良い", "負け", "空", "Ya", "過去", "ここ", "精一杯", "道", "もん", "冬", "的", "卵", "年", "名", "憂い", "力", "事態", "時代", "準備", "態勢", "万端", "now", "者", "無駄", "一緒", "チャンス", "勿体ない", "どこ", "皆", "派手", "愛", "デッカ", ",", "熱い", "サバイバル", "声", "Hey", "雨", "朝", "1", "チャチャチャチャ", "桜", "大切", "懸命", "日", "頃", "夜", "簡単", "ドラマ", "努力", "急", "バカ", "ため", "ゲーム", "瞳", "生き方", "嫌い", "無い", "勿体", "友達", "ら", "近道", "ダメ", "ジャンプ", "握り", "悔し涙", "喝", "嬉し涙", "魂", "ところ", "ギブギブギブアップアップ", "つもり", "愚痴", "子", "滅多", "はず", "優しい", "T", "4", " (", "100", "DIVE", "FUN", "JOY", "TIME", "全て", "CAN", "GO", "SO", "STOP", "THE", "TO", "TOP", "WE", "ビート", "糧", "謙虚", "諸行無常", "槍", "大胆", "真っ直ぐ", "真っ白", "土", "次", "サンバ", "夏", "世界中", "事", "いかが", "ぼく", "ライトアップ", "胸", "気", "高い", "厳しい", "矢", "運命", "ば", "モノ", "引き換え", "疑問", "荒野", "自由", "答え", "独りぼっち", "今夜", "そう", "クラッカー", "Just", "きみ", "恋", "La", "世界", "彼", "お願い", "御社", "やかん", "火傷", "低温", "好き", "笑い", "強い", "純情", "Try", "プラネタリウム", "雪", "とき", "I", "need", "you", "Ah", "スキ", "なか", "まんま", "まま", "いちど", "産声", "悪い", "Lie", "さら", "サンライズ", "気持ち", "髪", "恥ずかしい", "街", "cm", "鮮やか", "音楽", "みたい", "ちい", "ほしい", "深い", "端", "いや", "瞬間", "ざかり", "つまり", "気高い", "結局", "ラブ", "模様", "恋心", ") ", "プラットホーム", "今日", "ハート", "根拠", "恋愛", "遠い", "勇気", "赤い", "ウチ", "杯", "デート", "Oh", "?)", "レッツゴー", "炎", "春一番", "素", "ごと", "カタチ", "つらい", "たび", "現在", "真ん中", "性", "Surface", "Tension", "白い", "手", "思い出", "コンビニ", "カメリア", "胸中", "上手い", "中", "LOVE", "ふたり", "肩", "さみしい", "想い", "希望", "視線", "Graduate", "デー", "ハッピー", "めでたい", "best", "try", "WOW", "Your", "仕方", "シンプル", 
"幸せ", "音", "鼓動", "全部", "たま", "匂い", "わがまま", "ひとり", "すべて", "スローモーション", "臆病", "ロマンティック", "銀河", "刻", "粉雪", "こころ", "二", "言葉", "越し", "口", "おしゃれ", "イマドキ", "ウソ", "キライ", "クライシス", "乙女", "花びら", "美学", "友だち", "色", "寒い", "駆け引き", "独り占め", "指先", "不器用"), 
                          FREQ.x = c(1.328845446, 1.303345926, 0.913610107, 0.868639039, 0.862958771, 0.806265903, 0.763992939, 0.734357078, 0.72497559, 0.716093151, 0.713712139, 0.69806742, 0.690560274, 0.690558404, 0.686723239, 0.630005429, 0.628626331, 0.620476787, 0.608669638, 0.593157659, 0.58426696, 0.562231006, 0.552171902, 0.509923143, 0.503964229, 0.500723164, 0.500248628, 0.494870843, 0.492242205, 0.492242205, 0.489869914, 0.477401311, 0.455003938, 0.448033068, 0.446255476, 0.445336286, 0.430769995, 0.430741923, 0.425509248, 0.416033419, 0.402200088, 0.402154469, 0.394972288, 0.376878773, 0.373203253, 0.370780141, 0.362938864, 0.362938864, 0.361194694, 0.360829208, 0.36060559, 0.360521314, 0.360461682, 0.35834973, 0.358347146, 0.35457317, 0.347870332, 0.343583643, 0.342563539, 0.340407399, 0.340170108, 0.33504333, 0.334796748, 0.333096435, 0.329096232, 0.32816147, 0.3174746, 0.312518953, 0.311189753, 0.309866141, 0.306897065, 0.306789214, 0.305817952, 0.305266408, 0.304424566, 0.294592698, 0.294428333, 0.290491263, 0.287179997, 0.286006481, 0.285759865, 0.285467137, 0.284478343, 0.282772547, 0.280518544, 0.278085106, 0.273313978, 0.273149368, 0.272752932, 0.270331099, 0.269068661, 0.268762297, 0.268762297, 0.268762297, 0.268762297, 0.268762297, 0.262527949, 0.261869176, 0.261458948, 0.260435211, 0.258494504, 0.257687732, 0.256519615, 0.256018896, 0.256018896, 0.255697116, 0.255305549, 0.253862105, 0.252871837, 0.249550068, 0.246545683, 0.244497542, 0.242787798, 0.239937953, 0.239316664, 0.239316664, 0.238747793, 0.237536456, 0.23412801, 0.231622628, 0.226652728, 0.226449912, 0.225320239, 0.223247031, 0.222104429, 0.220124785, 0.217128942, 0.213910483, 0.213618368, 0.213499583, 0.208527135, 0.208345969, 0.208345969, 0.208345969, 0.208345969, 0.206986997, 0.206935163, 0.20622403, 0.204985484, 0.204985484, 0.204985484, 0.204985484, 0.204985484, 0.204953892, 0.204649734, 0.203878634, 0.203878634, 0.203878634, 0.203878634, 0.203878634, 0.202864921, 
0.202533058, 0.202198196, 0.201207866, 0.200289266, 0.200289266, 0.200289266, 0.200289266, 0.200289266, 0.200289266, 0.199121014, 0.196895962, 0.196895962, 0.196895962, 0.196895962, 0.196895962, 0.196895962, 0.196895962, 0.196895962, 0.196895962, 0.192811442, 0.192352583, 0.192352583, 0.192352583, 0.192352583, 0.192014172, 0.192014172, 0.192014172, 0.191956949, 0.191453331, 0.191453331, 0.191453331, 0.191342673, 0.190998235, 0.190998235, 0.190998235, 0.19088509, 0.190789044, 0.190294673, 0.185390071, 0.185390071, 0.181876703, 0.181469432, 0.181469432, 0.181469432, 0.181469432, 0.181469432, 0.181469432, 0.181469432, 0.181469432, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                          FREQ.y = c(1.232068777, 1.383575592, NA, NA, 0.892433428, 0.967302389, NA, 0.211910734, NA, NA, 0.776954594, 0.370689899, 0.982896112, NA, 0.402197972, 0.458945942, 0.91840227, 0.363026471, 0.241701929, 0.301810937, 0.32724004, 0.568332208, 0.634763878, 1.373196783, NA, NA, 0.330069497, 0.572998924, NA, NA, NA, NA, NA, 0.673198054, NA, NA, NA, 0.600909302, NA, 0.510499129, 0.219590963, NA, NA, 0.230789784, NA, NA, NA, NA, NA, NA, 0.231624218, NA, 0.233169062, NA, 0.458936426, 0.350537284, 0.271267278, NA, 0.3411743, NA, NA, NA, 0.393051518, NA, NA, NA, NA, NA, 0.264502532, NA, NA, 0.439054877, NA, NA, 0.216577642, 0.264696738, NA, 0.312918521, NA, NA, 0.272987501, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.358247205, NA, NA, NA, 0.438992824, NA, NA, 0.316165033, NA, NA, 0.246011994, NA, 0.202429882, NA, NA, NA, NA, NA, NA, NA, NA, 0.551709888, 0.186276091, 0.280115, NA, NA, NA, NA, NA, 0.180246558, NA, 0.189612883, NA, NA, NA, NA, NA, 0.301000219, NA, 0.271285759, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.428798405, NA, 0.242200333, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.337732336, NA, NA, 0.241643951, NA, NA, NA, NA, 0.185122766, 0.472746246, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.929801624, 0.650179948, 0.632720576, 0.629847924, 0.577503353, 0.570460674, 0.55747211, 0.551221143, 0.514863277, 0.500609909, 0.500609909, 0.482649287, 0.464212783, 0.464212783, 0.462982605, 0.441163998, 0.42689071, 0.424120159, 0.419898616, 0.396969584, 0.396969584, 0.396949341, 0.377031426, 0.377031426, 0.377031426, 0.374596414, 0.373151595, 0.368978261, 0.366122824, 0.360299625, 0.352931199, 0.352931199, 0.348517813, 0.346611573, 0.346611573, 0.346611573, 0.342042574, 0.333717116, 0.333251866, 0.330541781, 0.330065696, 0.325972331, 0.322269238, 0.320585497, 0.316360288, 0.309682207, 0.30655866, 0.30655866, 0.29366149, 0.287635389, 0.281760935, 
0.281760935, 0.281760935, 0.281760935, 0.279863696, 0.279863696, 0.279863696, 0.268441528, 0.264698399, 0.260816176, 0.257431639, 0.257431639, 0.257431639, 0.25729402, 0.256659047, 0.252942153, 0.250304954, 0.250304954, 0.250287837, 0.249790531, 0.247765382, 0.247765382, 0.247501437, 0.247501437, 0.247501437, 0.24608643, 0.244081882, 0.244081882, 0.242526897, 0.241701929, 0.241701929, 0.241701929, 0.241324643, 0.241324643, 0.241202178, 0.240368163, 0.236243481, 0.235323369, 0.232519701, 0.232106392, 0.229699424, 0.228674654, 0.226218856, 0.22469681, 0.222343106, 0.219053142, 0.213946511, 0.211702275, 0.211378027, 0.210906859, 0.210906859, 0.210906859, 0.210906859, 0.209949308, 0.209949308, 0.209949308, 0.209949308, 0.209949308, 0.209366407, 0.208040477, 0.207131296, 0.206792578, 0.205691819, 0.203122475, 0.201540982, 0.200724615, 0.200479581, 0.199349995, 0.199030964, 0.199030964, 0.198484792, 0.198484792, 0.198484792, 0.198484792, 0.197371262, 0.197337065, 0.197112482, 0.195606385, 0.195116707, 0.193999059, 0.186575797, 0.186575797, 0.186575797, 0.186575797, 0.186575797, 0.186575797, 0.186575797, 0.185824037, 0.184936538, 0.181210455, 0.180833345, 0.180833345, 0.180246558, 0.180200419))


That concludes the data check. From the next post onward, I will use this data to carry out document classification.

〇Conclusion

When a data frame operation did not work, I checked the class of the object modified in the preceding step; if it was not a data.frame, appending %>% as.data.frame() usually got things working.
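As a rough illustration of that check, here is a minimal sketch in base R. The object name freq_mat and its values are made up for this example and do not come from the analysis data; the call as.data.frame(x) does the same thing as x %>% as.data.frame().

```r
# Hypothetical toy object: a matrix, standing in for an intermediate
# result that is no longer a data.frame (values are made up).
freq_mat <- matrix(c(3, 1, 0, 2), nrow = 2,
                   dimnames = list(NULL, c("kbs", "tbk")))

# Check what we are actually holding before the next step.
class(freq_mat)
is.data.frame(freq_mat)  # FALSE

# Convert back; equivalent to freq_mat %>% as.data.frame().
freq_df <- as.data.frame(freq_mat)
is.data.frame(freq_df)   # TRUE
```

Running class() first makes it easier to see which step changed the object type, rather than converting blindly everywhere.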
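Also, when the data is built from the "data frame creation code", the TERM column seems to become a factor, so it has to be converted with as.character(). I need to fix that later...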
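The factor issue can be reproduced and fixed with something like the following sketch. The data frame term_df and its contents are placeholders invented for illustration; only the column names TERM and FREQ follow the post. Passing stringsAsFactors = TRUE forces the factor behaviour that was the default before R 4.0.

```r
# Hypothetical toy data frame (terms and counts are made up).
# stringsAsFactors = TRUE mimics the pre-R-4.0 default.
term_df <- data.frame(TERM = c("ai", "yume", "kimi"),
                      FREQ = c(3, 2, 1),
                      stringsAsFactors = TRUE)
is.factor(term_df$TERM)     # TRUE

# Convert the factor column back to plain character strings.
term_df$TERM <- as.character(term_df$TERM)
is.character(term_df$TERM)  # TRUE
```

Alternatively, passing stringsAsFactors = FALSE when the data frame is created avoids the conversion step altogether.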

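The total character count of this post has ended up at over 100,000, which is absurd, and I doubt I will beat that record any time soon. Most of it is not really meant to be read, though.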

That is all for the data check. If you have read this far, please do read the next post on clustering as well!

www.anarchive-beta.com