Stance Classification of Health Posts on Social Media with FastText Embedding and Deep Learning

  • Ernest Lim, Sekolah Tinggi Teknik Surabaya
  • Esther Irawati Setiawan
  • Joan Santoso
Keywords: Indonesian Language, Deep Learning, fastText, Social Media, Stance Classification

Abstract

Misinformation is an increasingly common phenomenon on social media, and Facebook, one of the largest social media platforms in Indonesia, is no exception. Several studies have examined techniques for identifying and classifying stance on Indonesian social media. However, the Word2Vec word embedding used in those studies is limited in its ability to handle new (out-of-vocabulary) words. This limitation motivates the use of fastText embeddings in this study. Using a deep learning approach, the study focuses on model performance in classifying the stance of one health post title on Facebook toward another post title. The stance is one of for (agree), observing (neutral), or against (disagree). The dataset contains 3,500 post titles: 500 claim sentences, each paired with six stance sentences (500 + 500 × 6 = 3,500). The fastText-based model in this study achieves a macro F1 score of 64%.
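
The advantage of fastText over Word2Vec for new words comes from representing each word as a bag of character n-grams, so a word that never appeared in training still shares subword units with words that did. The following is a minimal sketch of that n-gram extraction only, not the authors' implementation: the function names `char_ngrams` and `shared_ngrams` are hypothetical, the words "vaksin" and "vaksinasi" are illustrative and not taken from the dataset, and the 3-to-6 character range with `<` and `>` boundary markers is the setting described in the fastText paper, not one confirmed by this abstract.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    Assumption: lengths 3..6 and the '<' / '>' markers mirror the fastText
    paper; the abstract does not state the settings actually used.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def shared_ngrams(a, b, n_min=3, n_max=6):
    """Return the sorted n-grams that two words have in common."""
    return sorted(set(char_ngrams(a, n_min, n_max)) & set(char_ngrams(b, n_min, n_max)))


if __name__ == "__main__":
    known = "vaksin"      # illustrative word assumed to be in the vocabulary
    unseen = "vaksinasi"  # illustrative word assumed to be out of vocabulary
    # The unseen word overlaps with the known one at the subword level,
    # which is what lets a subword model build a vector for it.
    print(shared_ngrams(known, unseen, n_min=3, n_max=3))
```

A word-level model such as Word2Vec has no vector at all for the unseen word, whereas a subword model can compose one from the overlapping n-grams listed by `shared_ngrams`.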

Published
2020-07-15