Accepted Papers

27 Binaural Speech Intelligibility Estimation Using Deep Neural Networks Kazuhiro Kondo, Kazuya Taira and Yosuke Kobayashi
34 Real-Time Scoring of an Oral Reading Assessment on Mobile Devices Jian Cheng
38 Conditional End-to-End Audio Transformations Albert Haque, Michelle Guo and Prateek Verma
40 Speech recognition for medical conversations Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
41 Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification Lanhua You, Wu Guo, Yan Song and Sheng Zhang
42 Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text Iroro Orife
43 Frequency domain variants of velvet noise and their application to speech processing and synthesis Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino
45 A novel normalization method for autocorrelation function for pitch detection and for speech activity detection Qiguang Lin and Yiwen Shao
46 Dithered Quantization for Frequency-Domain Speech and Audio Coding Tom Bäckström, Johannes Fischer and Sneha Das
47 Categorical vs Dimensional Perception of Italian Emotional Speech Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
48 Cross-language perception of Mandarin lexical tones by Mongolian-speaking bilinguals in the Inner Mongolia Autonomous Region, China Kimiko Tsukada and Yu Rong
50 Leveraging Mixture Structure from Multivariate von Mises-Fisher Distributions for Informed Speaker Clustering for Naturalistic Audio Streams Harishchandra Dubey, Abhijeet Sangwan and John H. L. Hansen
51 The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats Björn Schuller, Stefan Steidl, Anton Batliner, Peter Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian Pokorny, Eva-Maria Rathner, Karin Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
52 Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech Emre Yilmaz, Henk van den Heuvel and David van Leeuwen
57 Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features Jochen Weiner, Miguel Angrick, Srinivasan Umesh and Tanja Schultz
60 The Trajectory of Voice Onset Time with Vocal Aging Chen Xuanda, Xiong Ziyu and Hu Jian
61 Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons Moez Ajili, Jean-Francois Bonastre and Solange Rossato
62 Entity-Aware Language Model as an Unsupervised Reranker Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy
63 Effects of User Controlled Speech Rate on Intelligibility in Noisy Environments John Novak and Robert Kenyon
65 The ‘West Yorkshire Regional English Database’: Investigations into the generalizability of reference populations for forensic speaker comparison casework Erica Gold, Sula Ross and Kate Earnshaw
67 Articulatory Features for ASR of Pathological Speech Emre Yilmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
68 Vowel space as a tool to evaluate articulation problems Rob van Son, Catherine Middag and Kris Demuynck
69 Performance Analysis of the 2017 NIST Language Recognition Evaluation Seyed Omid Sadjadi, Timothee Kheyrkhah, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason and Jaime Hernandez-Cordero
70 Gated Convolutional Neural Network for Sentence Matching Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You
73 COSMO SylPhon: a model to assess phonological learning Jean-Luc Schwartz
78 Active Memory Networks for Language Modeling Oscar Chen, Anton Ragni, Mark Gales and Xie Chen
79 Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models Naoyuki Kanda, Yusuke Fujita and Kenji Nagamatsu
83 Deep Speech Denoising with Vector Space Projections Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
84 What to Expect from Expected Kneser-Ney Smoothing Michael Levit, Sarangarajan Parthasarathy and Shuangyu Chang
91 Emotional Prosody Perception in Mandarin-speaking Congenital Amusics Yixin Zhang, Tianzhu Geng and Jinsong Zhang
92 Analysis of Length Normalization in End-to-End Speaker Verification System Weicheng Cai, Jinkun Chen and Ming Li
97 Overview of the 2018 Spoken CALL Shared Task Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
990 Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks Yun Wang, Juncheng Li and Florian Metze
991 Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach Ragesh Rajan M, Ashwin Vijayakumar and Deepu Vijayasenan
993 Attentive Statistics Pooling for Deep Speaker Embedding Koji Okabe, Takafumi Koshinaka and Koichi Shinoda
995 UltraFit: A speaker-friendly headset for ultrasound recordings in speech sciences Lorenzo Spreafico, Michael Pucher and Anna Matosova
996 Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
999 Articulatory-to-speech conversion using bi-directional long short-term memory Fumiaki Taguchi and Tokihiko Kaburagi
1000 The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task Kay Berkling, Cem Philipp Freimoser, Mario Kunstek and Dominik Jülg
1007 Follow-up Question Generation using Pattern-based Seq2seq with a Small Corpus for Interview Coaching Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong and Huai-Hung Huang
1010 Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li
1013 Capsule Networks for Low Resource Spoken Language Understanding Vincent Renkens and Hugo Van hamme
1015 Learning Discriminative Features for Speaker Identification and Verification Sarthak Yadav and Atul Rai
1016 LSTM based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language Laxmi Pandey and Karan Nathwani
1018 Detection of glottal closure instants in degraded speech using single frequency filtering analysis Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana
1019 Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis Simone Hantke, Christoph Stemp and Björn Schuller
1020 Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao
1021 Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR Yerbolat Khassanov and Eng Siong Chng
1023 MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks Wenhao Ding and Liang He
1024 Effective acoustic cue learning is not just statistical, it is discriminative Jessie S. Nixon
1025 Compression of End-to-End Models Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu
1026 Postfiltering with Complex Spectral Correlations for Speech and Audio Coding Sneha Das and Tom Bäckström
1027 Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding Sneha Das and Tom Bäckström
1030 Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
1032 Discriminating between nasals and approximants in English language using zero time windowing RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana
1034 Scalable Factorized Hierarchical Variational Autoencoder Training Wei-Ning Hsu and James Glass
1035 Contextual Slot Carryover for Disparate Schemas Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert and Ruhi Sarikaya
1037 Stream Attention for Distributed Multi-Microphone Speech Recognition Xiaofei Wang, Ruizhi Li and Hynek Hermansky
1038 Articulatory consequences of vocal effort elicitation method Elisabet Eir Cortes, Marcin Wlodarczak and Juraj Šimko
1039 Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding Yujiang Li, Xuemin Zhao, Weiqun Xu and Yonghong Yan
1042 Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis Yuanjun Zhao, Roberto Togneri and Victor Sreeram
1043 Designing a Pneumatic Bionic Voice Prosthesis - Statistical Approach for Source Excitation Generation Farzaneh Ahmadi and Tomoki Toda
1044 Training Utterance-level Embedding Networks for Speaker Identification and Verification Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim and Jonghun Park
1046 Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement Ching-Hua Lee, Bhaskar D. Rao and Harinath Garudadri
1047 Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions Okko Räsänen, Seshadri Shreyas and Marisa Casillas
1049 Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning ShiLiang Zhang and Ming Lei
1054 Towards a better characterization of Parkinsonian speech: a multidimensional acoustic study Veronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie Van Malderen and Bernard Harmegnies
1055 Low-Latency Neural Speech Translation Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel
1057 Structured Word Embedding for Low Memory Neural Network Language Model Kaiyu Shi and Kai Yu
1058 An End-to-End Text-Independent Speaker Identification System on Short Utterances Ruifang Ji, Xinyuan Cai and Xu Bo
1059 Dysarthric speech classification using glottal features computed from non-words, words and sentences Narendra N P and Paavo Alku
1060 Length contrast and covarying features: Whistled speech as a case study Rachid Ridouane, Giuseppina Turco and Julien Meyer
1062 On the Usefulness of the Speech Phase Spectrum for Pitch Extraction Erfan Loweimi, Jon Barker and Thomas Hain
1063 Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription Rongfeng Su, Xunying Liu and Lan Wang
1065 Regional variation of /r/ in Swiss German dialects Adrian Leemann, Stephan Schmid, Dieter Studer-Joho and Marie-José Kolly
1070 i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models Karel Beneš, Santosh Kesiraju and Lukáš Burget
1074 Structural effects on properties of consonantal gestures in Tashlhiyt Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane
1076 General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats Gábor Gosztolya, Tamás Grósz and László Tóth
1078 Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
1079 Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi and Ildikó Hoffmann
1080 Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model Keisuke Tanihara, Shogo Yonekura and Yasuo Kuniyoshi
1081 Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling Siyuan Feng and Tan Lee
1085 Automatic Speech Recognition System Development in the "Wild" Anton Ragni and Mark Gales
1086 Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu
1087 A deep learning approach to assessing non-native pronunciation of English using phone distances Konstantinos Kyriakopoulos, Kate Knill and Mark Gales
1088 The Conversation Continues: The Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition Odette Scharenborg and Martha Larson
1089 Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin
1093 The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
1096 Punctuation Prediction Model for Conversational Speech Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak
1097 Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition Wei-Ning Hsu, Hao Tang and James Glass
1098 Detecting Packet-Loss Concealment Using Formant Features and Decision Tree Learning Gabriel Mittag and Sebastian Möller
1099 The Role of Cognate Words, POS Tags, and Entrainment in Code-Switching Victor Soto, Nishi Cestero and Julia Hirschberg
1100 Play Duration based User-Entity Affinity Modeling in Spoken Dialog System Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan and Abishek Ravi
1102 Analysis of Complementary Information Sources in the Speaker Embeddings Framework Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
1103 Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
1105 Estimation of the Vocal Tract Length of Vowel Sounds based on the Frequency of the Significant Spectral Valley TV Ananthapadmanabha and Ramakrishnan Angarai Ganesan
1107 Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese Shiyu Zhou, Dong Linhao, Shuang Xu and Bo Xu
1108 Tongue Segmentation with Geometrically Constrained Snake Model Zhihua Su, Jianguo Wei, Qiang Fang, Jianrong Wang and Kiyoshi Honda
1110 L2-ARCTIC: a non-native English speech corpus Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna
1111 Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition Yike Zhang, Pengyuan Zhang and Yonghong Yan
1113 Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder Kei Akuzawa, Yusuke Iwasawa and Yutaka Matsuo
1114 A Deep Neural Network Based Harmonic Noise Model for Speech Enhancement Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu and Benoit Champagne
1115 A comparison of input types to a deep neural network-based forced aligner Matthew C. Kelley and Benjamin V. Tucker
1120 Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das
1121 Voice Conversion with Conditional SampleRNN Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco and Dan Darcy
1122 Contextual Language Model Adaptation for Conversational Agents Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh and Ariya Rastrow
1124 Improved ASR for under-resourced languages through Multi-task Learning with Acoustic Landmarks Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and Deming Chen
1125 Self-similarity matrix based intelligibility assessment of cleft lip and palate speech Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat
1126 Formant measures of vowels adjacent to alveolar and retroflex consonants in Arrernte: stressed and unstressed position Marija Tabain, Richard Beare and Andrew Butcher
1128 Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection Madhusudan Singh and Debadatta Pati
1130 Dialect-geographical Acoustic-Tonetics: five disyllabic tone sandhi patterns in cognate words from the Wu dialects of Zhèjiāng province Phil Rose
1131 A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder Berrak Sisman, Mingyang Zhang and Haizhou Li
1134 Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations Aaron Nicolson and Kuldip K. Paliwal
1135 Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with An Acoustic Vector Sensor Disong Wang and Yuexian Zou
1138 Multi-modal attention mechanisms in LSTM and its application to acoustic scene classification Zhang Teng, Kailai Zhang and Ji Wu
1139 Rapid Collection of Spontaneous Speech Corpora using Telephonic Community Forums Agha Ali Raza, Awais Athar, Shan Randhawa, Zain Tariq, Muhammad Bilal Saleem, Haris Bin Zia, Umar Saif and Roni Rosenfeld
1140 Monoaural Audio Source Separation using Variational Autoencoders Laxmi Pandey, Anurendra Kumar and Vinay Namboodiri
1143 Deep learning techniques for koala activity detection Ivan Himawan, Michael Towsey, Bradley Law and Paul Roe
1147 Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination Jindrich Matousek and Daniel Tihelka
1149 User Information Augmented Semantic Frame Parsing using Progressive Neural Networks Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin
1150 A Shifted Delta Coefficient Objective for Monaural Speech Separation using Multi-task Learning Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li
1151 Joint Learning using Denoising Variational Autoencoders for Voice Activity Detection Youngmoon Jung, Younggwan Kim, Yeunju Choi and Hoirin Kim
1152 Temporal transformer networks for acoustic scene classification Zhang Teng, Kailai Zhang and Ji Wu
1153 State Gradients for RNN Memory Analysis Lyan Verwimp, Hugo Van hamme, Vincent Renkens and Patrick Wambacq
1154 Waveform-Based Speaker Representations for Speech Synthesis Moquan Wan, Gilles Degottex and Mark Gales
1156 Leveraging Second-Order Log-Linear model for improved deep learning based ASR performance Ankit Raj, Shakti Rath and Jithendra Vepa
1158 Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Dan Povey
1159 Word Emphasis Prediction for Expressive Text to Speech Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki
1160 Forward-Backward Attention Decoder Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
1162 Active Learning for LF-MMI Trained Neural Networks in ASR Yanhua Long, Hong Ye, Yijie Li and Jiaen Liang
1165 Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal Lukas Mateju, Petr Cerva, Jindrich Zdansky and Radek Safarik
1171 Homophone Identification and Merging for Code-switched Speech Recognition Brij Mohan Lal Srivastava and Sunayana Sitaram
1173 Improved Epoch Extraction from Telephonic Speech using Chebfun and Zero Frequency Filtering Ganga Gowri B, Soman K.P and Govind D
1174 Using pupillometry to measure the cognitive load of synthetic speech Avashna Govender and Simon King
1176 Resyllabification in Indian Languages and its Implications in Text-to-speech Systems Mahesh M, Jeena J. Prakash and Hema Murthy
1178 Code-switching in Indic Speech Synthesisers Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy
1182 Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer Siyuan Feng and Tan Lee
1185 GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan and Yuqing Zhan
1188 Transcription correction for Indian languages using acoustic signatures Jeena J. Prakash, Golda Brunet Rajan and Hema Murthy
1190 WaveNet Vocoder with Limited Training Data for Voice Conversion Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai
1198 Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai
1199 Measuring the cognitive load of synthetic speech using a dual task paradigm Avashna Govender and Simon King
1202 Phoneme-to-Articulatory mapping using bidirectional gated RNN Théo Biasutto-Lervat and Slim Ouni
1203 Information Bottleneck based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu and Hema Murthy
1204 Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao
1205 Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
1209 Triplet loss based cosine similarity metric learning for text-independent speaker recognition Sergey Novoselov, Vadim Shchemelinin, Andrey Shulipa, Alexandr Kozlov and Ivan Kremnev
1210 Collapsed speech segment detection and suppression for WaveNet vocoder Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda
1211 Data augmentation improves recognition of foreign accented speech Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
1212 Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney
1214 Exploration of Local Speaking Rate Variations in Mandarin Read Speech Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen
1222 An Active Feature Transformation Method For Attitude Recognition of Video Bloggers Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz
1223 A New Framework for Supervised Speech Enhancement in the Time Domain Ashutosh Pandey and Deliang Wang
1224 Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions Rong Gong and Xavier Serra
1225 Vowels and Diphthongs in Hangzhou Wu Chinese Dialect Yang Yue and Fang Hu
1226 Speaker Embedding Extraction with Phonetic Information Yi Liu, Liang He, Jia Liu and Michael T. Johnson
1227 Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa
1230 Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
1232 S4D: Speaker Diarization Toolkit in Python Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Simon Petitrenaud, Jean Carrive and Sylvain Meignier
1233 Age-related effects on sensorimotor control of speech production Anne Hermes, Jane Mertens and Doris Mücke
1234 Single-channel Speech Dereverberation via Generative Adversarial Training Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu
1237 Biophysically-inspired features improve the generalizability of neural network-based speech enhancement systems Deepak Baby and Sarah Verhulst
1238 Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant? Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
1239 Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato and Shogo Minagi
1240 On Learning to Identify Genders from Raw Speech Signal using CNNs Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai Doss
1241 Neural Language Codes for Multilingual Acoustic Models Markus Müller, Sebastian Stüker and Alex Waibel
1242 An Attention Pooling based Representation Learning Method for Speech Emotion Recognition Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
1243 Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition Liwen Zhang
1244 Learning to adapt: a meta-learning approach for speaker adaptation Ondrej Klejch, Joachim Fainberg, Peter Bell and Steve Renals
1245 Weighting Pitch Contour and Loudness Contour in Mandarin Tone Perception in Cochlear Implant Listeners Qinglin Meng, Nengheng Zheng, Ambika Prasad Mishra, Jacinta Dan Luo and Jan W. H. Schnupp
1246 Co-whitening of i-vectors for short and long duration speaker verification Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang
1247 Training Augmentation using Adversarial Examples for Robust Speech Recognition Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie
1248 Multiple Concurrent Sound Source Tracking Based on Observation-Guided Adaptive Particle Filter Hong Liu, Haipeng Lan, Bing Yang and Cheng Pang
1250 Data independent sequence augmentation method for acoustic scene classification Zhang Teng, Kailai Zhang and Ji Wu
1251 Pitch-Adaptive Front-end Feature for Hypernasality Detection Akhilesh Dubey, S R Mahadeva Prasanna and Samarendra Dandapat
1252 ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge Zbynek Zajic, Marie Kunesova, Jan Zelinka and Marek Hrúz
1254 A first investigation of the timing of turn-taking in Ruuli Tuarik Buanzur, Margaret Zellers, Saudah Namyalo and Alena Witzlack-Makarevich
1256 Exploring temporal reduction in dialectal Spanish: a large-scale study of lenition of voiced stops and coda-s Ioana Vasilescu, Nidia Hernandez, Bianca Vieru and Lori Lamel
1258 Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors Kanru Hua
1259 A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model Sreeram Ganji and Rohit Sinha
1262 Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
1264 Perceptual and automatic evaluations of the intelligibility of speech degraded by noise induced hearing loss simulation Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas and Julien Pinquier
1265 Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
1266 Automatic Evaluation of Speech Intelligibility based on i-vectors in the context of Head and Neck Cancers Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain and Virginie Woisard
1267 Automatic Pronunciation Evaluation of Singing Chitralekha Gupta, Haizhou Li and Ye Wang
1269 Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network Weipeng He, Petr Motlicek and Jean-Marc Odobez
1270 Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment Yujia Xiao, Frank Soong and Wenping Hu
1271 Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier
1272 Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
1280 A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions Luciana Ferrer and Mitchell McLaren
1281 Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng
1283 Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning Wenda Chen, Mark Hasegawa-Johnson and Nancy Chen
1284 Should code-switching models be asymmetric? Barbara Bullock, Wally Guzman, Jacqueline Serigos and Almeida Jacqueline Toribio
1285 Visual timing information in audiovisual speech perception: evidence from lexical tone contour Hui Xie, Biao Zeng and Rui Wang
1286 A Weighted Superposition of Functional Contours model for modelling contextual prominence of elementary prosodic contours Branislav Gerazov, Gerard Bailly and Yi Xu
1288 An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder Yun-Shao Lin, Susan Shur-Fen Gau and Chi-Chun Lee
1291 Multi-resolution gammachirp envelope distortion index for intelligibility prediction of noisy speech Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani
1293 A Case Study on the Importance of Belief State Representation for Dialogue Policy Management Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou
1294 Speech Enhancement Using the Minimum-Probability-of-Error Criterion Jishnu Sadasivan, Subhadip Mukherjee and Chandra Sekhar Seelamantula
1295 Learning Structured Dictionaries for Exemplar-based Voice Conversion Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna
1296 Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks Wolfgang Mack, Soumitro Chakrabarty, Fabian-Robert Stöter, Sebastian Braun, Bernd Edler and Emanuël Habets
1297 Exploration of Compressed ILPR Features for Replay Attack Detection Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha
1298 Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee
1299 A Compact and Discriminative Feature based on Auditory Summary Statistics for Acoustic Scene Classification Hongwei Song, Jiqing Han and Shiwen Deng
1301 Multi-channel Attention for End-to-End Speech Recognition Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu
1302 BUT system for low resource Indian language ASR Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiat, Lukas Burget and Jan Černocký
1305 Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
1306 Acoustic-dependent phonemic transcription for text-to-speech synthesis Kévin Vythelingum, Yannick Estève and Olivier Rosec
1308 Unsupervised Word Segmentation from Speech with Attention Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio and Laurent Besacier
1309 Liulishuo's System for the Spoken CALL Shared Task 2018 Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang and Yang Liu
1310 Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events Gurunath Reddy M, K Sreenivasa Rao and Partha Pratim Das
1312 Impact of ASR Performance on Free Speaking Language Assessment Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
1313 A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis Kai-Zhan Lee, Erica Cooper and Julia Hirschberg
1316 Data requirements, selection and augmentation for DNN-based speech synthesis from crowdsourced data Markus Toman, Geoffrey Meltzner and Rupal Patel
1318 Semi-supervised learning for information extraction from dialogue Anjuli Kannan, Kai Chen, Alvin Rajkomar and Diana Jaunzeikare
1319 Anomaly Detection Approach for Pronunciation Verification of Disordered Speech using Speech Attribute Features Mostafa Shahin, Beena Ahmed, Jim Ji and Kirrie Ballard
1320 Prosodic Focus Acquisition in French Early Cochlear Implanted Children Chadi Farah, Stephane Roman and Mariapaola D'Imperio
1326 Low-Resource Speech-to-Text Translation Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater
1327 Stochastic Shake-Shake Regularization for Affective Learning from Speech Che-Wei Huang and Shrikanth Narayanan
1328 An Optimization Based Approach for Solving Spoken CALL Shared Task Mohammad Ateeq, Abualsoud Hanani and Aziz Qaroush
1331 Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge Claude Montacié and Marie-José Caraty
1333 Statistical Model Compression for Small-Footprint Natural Language Understanding Grant Strimel, Kanthashree Mysore Sathyendra and Stanislav Peshterliev
1336 Automatically measuring L2 speech fluency without the need of ASR: a proof-of-concept study with Japanese learners of French Lionel Fontan, Maxime Le Coz and Sylvain Detey
1339 A GPU-based WFST Decoder with Exact Lattice Generation Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Dan Povey and Sanjeev Khudanpur
1342 Adding New Classes Without Access to the Original Training Data with Applications to Language Identification Hagai Taitelbaum, Ehud Ben-Reuven and Jacob Goldberger
1343 Dual Language Models for Code Switched Speech Recognition Saurabh Garg, Tanmay Parekh and Preethi Jyothi
1345 Efficient language model adaptation with Noise Contrastive Estimation and Kullback-Leibler regularization Jesús Andrés-Ferrer, Nathan Bodenstab and Paul Vozila
1346 Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator Pei-Hung Chung, Kuan Tung, Ching-Lun Tai and Hung-yi Lee
1348 Classification of Correction Turns in Multilingual Dialogue Corpus Ivan Kraljevski and Diane Hirschfeld
1349 An Open Source Emotional Speech Corpus for Human Robot Interaction Applications Jesin James, Li Tian and Catherine Watson
1350 Investigating the role of L1 in automatic pronunciation evaluation of L2 speech Ming Tu, Anna Grabek, Julie Liss and Visar Berisha
1351 A Deep Learning Method for Pathological Voice Detection using Convolutional Deep Belief Networks Huiyi Wu, John Soraghan, Anja Lowit and Gaetano Di-Caterina
1352 User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning Máté Ákos Tündik, György Szaszák, Gábor Gosztolya and András Beke
1353 Emotion Identification from raw speech signals using DNNs Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma and Najim Dehak
1358 Impact of different speech types on listening effort Olympia Simantiraki, Martin Cooke and Simon King
1363 An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings using Recurrent Neural Networks Arindam Jati and Panayiotis Georgiou
1364 Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks Tae Jin Park and Panayiotis Georgiou
1369 Training Recurrent Neural Network through Moment Matching for NLP Applications Yue Deng, Yilin Shen, KaWai Chen and Hongxia Jin
1370 Filter sampling and combination CNN (FSC-CNN): a compact CNN model for small-footprint ASR acoustic modeling using raw waveforms Jinxi Guo, Ning Xu, Xin Chen, Yang Shi, Kaiyuan Xu and Abeer Alwan
1371 Impact of Aliasing on Deep CNN-Based End-to-End Acoustic Models Yuan Gong and Christian Poellabauer
1372 The University of Birmingham 2018 Spoken CALL Shared Task Systems Mengjie Qian, Xizi Wei, Peter Jancovic and Martin Russell
1373 Cross-cultural (A)symmetries in Audio-visual Attitude Perception Hansjörg Mixdorff, Albert Rilliard, Tan Lee, Matthew K. H. Ma and Angelika Hönemann
1374 Prediction of Perceived Speech Quality Using Deep Machine Listening Jasper Ooster, Rainer Huber and Bernd T. Meyer
1375 Prediction of Subjective Listening Effort from Acoustic Data with Non-Intrusive Deep Models Paul Kranzusch, Rainer Huber, Melanie Krüger, Birger Kollmeier and Bernd T. Meyer
1376 Phone Recognition using a Non-Linear Manifold with Broad Phone Class Dependent DNNs Mengjie Qian, Linxue Bai, Peter Jancovic and Martin Russell
1377 Integrating Recurrence Dynamics for Speech Emotion Recognition Efthymios Tzinis, Georgios Paraskevopoulos, Christos Baziotis and Alexandros Potamianos
1378 Leveraging Native Language Information for Improved Accented Speech Recognition Shahram Ghorbani and John H.L. Hansen
1379 A New Deep Reinforcement Learning based Coaching Model (DCM) for Slot Filling in Spoken Language Understanding Yu Wang, Abhishek Patel, Yilin Shen and Hongxia Jin
1381 Sequence-to-sequence Neural Network model with 2D attention for learning Japanese pitch accents Antoine Bruguier, Heiga Zen and Arkady Arkhangorodsky
1384 Articulation Rate as a Speaker Discriminant in British English Erica Gold
1386 Automatic Assessment of L2 English Word Prosody Using Weighted Distances of F0 and Intensity Contours Quy-Thao Truong, Tsuneo Kato and Seiichi Yamamoto
1387 Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics Pavlos Papadopoulos, Colin Vaz and Shrikanth Narayanan
1391 Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes Srinivas Parthasarathy and Carlos Busso
1392 Cold Fusion: Training Seq2Seq Models Together with Language Models Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh and Adam Coates
1395 Towards an Unsupervised Entrainment Distance in Conversational Speech using deep Neural Networks Md Nasir, Brian Baucom, Shrikanth Narayanan and Panayiotis Georgiou
1399 Effectiveness of Voice Quality Features in Detecting Depression Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint and Abeer Alwan
1400 The Conversation: Deep Audio-Visual Speech Enhancement Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
1401 Using Voice Quality Supervectors for Affect Identification Soo Jin Park, Amber Afshan and Abeer Alwan
1403 Output-Gate Projected Gated Recurrent Unit for Speech Recognition Gaofeng Cheng, Dan Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur and Yonghong Yan
1404 Gestural lenition of rhotics captures variation in Brazilian Portuguese Phil Howson and Alexei Kochetov
1405 A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement Ke Tan and DeLiang Wang
1406 A Two-Stage Approach to Noisy Cochannel Speech Separation with Gated Residual Networks Ke Tan and DeLiang Wang
1407 Twin Regularization for online speech recognition Mirco Ravanelli, Dmitriy Serdyuk and Yoshua Bengio
1413 Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition Ke Li, Hainan Xu, Yiming Wang, Dan Povey and Sanjeev Khudanpur
1417 Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks Dan Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
1419 A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment Megan Willi, Stephanie Borrie, Tyson Barrett, Ming Tu and Visar Berisha
1422 Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
1423 End-to-end speech recognition using lattice-free MMI Hossein Hadian, Hossein Sameti, Daniel Povey and Sanjeev Khudanpur
1424 Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
1425 Analyzing Effect of Physical Expression on English Proficiency for Multimodal Computer-Assisted Language Learning Haoran Wu, Yuya Chiba, Takashi Nose and Akinori Ito
1428 Long Distance Voice Channel Diagnosis Using Deep Neural Networks Zhen Qin, Tom Ko and Guangjian Tian
1430 Neural Error Corrective Language Models for Automatic Speech Recognition Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki and Yushi Aono
1431 Robust Voice Activity Detection Using Frequency Domain Long-Term Differential Entropy Debayan Ghosh, Muralishankar R and Sanjeev Gurugopinath
1432 Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function Jian Huang, Ya Li, Jianhua Tao and Zhen Lian
1436 Spoken Keyword Detection using joint DTW-CNN Ravi Shankar, Vikram C M and S R Mahadeva Prasanna
1438 Auxiliary feature based adaptation of end-to-end ASR systems Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani
1439 Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement Li Chai, Jun Du and Chin-Hui Lee
1442 Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers Kohei Hara, Koji Inoue, Katsuya Takanashi and Tatsuya Kawahara
1446 Compensation for Domain Mismatch in Text-independent Speaker Recognition Fahimeh Bahmaninezhad and John H.L. Hansen
1447 Iterative Learning of Speech Recognition Models for Air Traffic Control Ajay Srinivasamurthy, Petr Motlicek, Mittul Singh, Youssef Oualil, Matthias Kleinert, Heiko Ehr and Hartmut Helmke
1450 Improving DNNs Trained With Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation Amit Das and Mark Hasegawa-Johnson
1452 A Multistage Training Framework For Acoustic-to-Word Model Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui and Dong Yu
1453 Acoustic modeling from frequency domain representations of speech Pegah Ghahremani, Hossein Hadian, Hang Lv, Dan Povey and Sanjeev Khudanpur
1454 The Voices Obscured in Complex Environmental Settings (VOICES) Corpus Colleen Richey, Maria Alejandra Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Garciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
1455 Encoding Individual Acoustic Features using Dyad-Augmented Deep Variational Representations for Dialog-level Emotion Recognition Jeng-Lin Li and Chi-Chun Lee
1456 ESPnet: End-to-End Speech Processing Toolkit Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
1457 The retroflex-dental contrast in Punjabi stops and nasals: A principal component analysis of ultrasound images Alexei Kochetov, Matthew Faytak and Kiranpreet Nara
1459 Fast Derivation of Cross-lingual Document Vectors from Self-attentive Neural Machine Translation Model Wei Li and Brian Mak
1460 DNN-based Speech Synthesis for Small Data Sets Considering Bidirectional Speech-Text Conversion Kentaro Sone and Toru Nakashika
1462 Improving Gender Identification in Movie Audio using Cross-Domain Data Rajat Hebbar, Krishna Somandepalli and Shrikanth Narayanan
1463 Assessing Speaker Engagement in 2-person Debates: Overlap Detection in United States Presidential Debates Midia Yousefi, Navid Shokouhi and John H.L. Hansen
1465 Fusing text-dependent word-level i-Vector models to screen ‘at risk’ child speech Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell and John H.L. Hansen
1471 Testing paradigms for assistive hearing devices in diverse acoustic environments Ram Charan Chandra Shekar, Hussnain Ali and John H.L. Hansen
1472 BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in A Text-to-Speech Front-End Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ya Li
1473 Detection of Replay-Spoofing Attacks using Frequency Modulation Features Tharshini Gunendradasan, Buddhi Wickramasinghe, Ngoc Phu Le, Eliathamby Ambikairajah and Julien Epps
1475 Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
1476 Homogeneity vs heterogeneity in Indian English prosody: Investigating influences of L1 on f0 range Olga Maxwell, Elinor Payne and Rosey Billington
1477 Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition Ziping Zhao, Yu Zheng, Zixing Zhang, Haishui Wang, Yiqin Zhao and Chao Li
1481 ASe: Acoustic scene embedding using Deep archetypal analysis and GMM Pulkit Sharma, Vinayak Abrol and Anshul Thakur
1484 Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios Hao Zhang and DeLiang Wang
1485 Layer Trajectory LSTM Jinyu Li, Changliang Liu and Yifan Gong
1486 Densely Connected Networks for Conversational Speech Recognition Kyu Han, Akshay Chandrashekaran, Jungsuk Kim and Ian Lane
1487 Whispered speech to neutral speech conversion using bidirectional LSTMs G. Nisha Meenakshi and Prasanta Ghosh
1494 Decision-level feature switching as a paradigm for replay attack detection Saranya M S and Hema Murthy
1499 Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion Berrak Sisman and Haizhou Li
1500 Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification Ziqiang Shi, Liu Liu, Huibin Lin and Rujie Liu
1504 Voice Conversion Across Arbitrary Speakers based on a Single Target-Speaker Utterance Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng
1506 Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions Yu Gu and Yongguo Kang
1509 Noise robust acoustic to articulatory speech inversion Nadee Seneviratne, Ganesh Sivaraman, Vikramjit Mitra and Carol Espy-Wilson
1511 EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System Hao Li, Yongguo Kang and Zhenyu Wang
1514 Detection of dementia from responses to atypical questions asked by embodied conversational agents Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
1515 An Improved Deep Embedding Learning Method for Short Duration Speaker Verification Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
1516 A Knowledge Driven Structural Segmentation Approach for Play-Talk Classification during Autism Assessment Manoj Kumar, Pooja Chebolu, So Hyun Kim, Kassandra Martinez, Catherine Lord and Shrikanth Narayanan
1518 Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions Nikolaos Flemotomos, Victor Martinez, James Gibson, David Atkins, Torrey Creed and Shrikanth Narayanan
1519 Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
1521 Acoustic Features associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders Wang Zhang, Xiangqian Gui, Tianqi Wang, Feng Yang, Lan Wang, Manwa Ng and Nan Yan
1522 All-Conv Net for Bird Activity Detection: Significance of Learned Pooling Arjun Pankajakshan, Anshul Thakur, Daksh Thapar, Padmanabhan Rajan and Aditya Nigam
1523 Automatic Assessment of Individual Culture Attribute of Power Distance using a Social Context-Enhanced Prosodic Network Representation Fu-Sheng Tsai, Hao-Chun Yang, Wei-Wen Chang and Chi-Chun Lee
1524 Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling Hangting Chen, Pengyuan Zhang, Haichuan Bai, Qingsheng Yuan, Xiuguo Bao and Yonghong Yan
1526 Keyword based speaker localization: Localizing a target speaker in a multi-speaker environment Sunit Sivasankaran, Emmanuel Vincent and Dominique Fohr
1528 High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder Kuan Chen, Bo Chen, Jiahao Lai and Kai Yu
1529 Information Structure, Affect, and Prenuclear Prominence in American English Eleanor Chodroff and Jennifer Cole
1531 Device-directed Utterance Detection Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas and Bjorn Hoffmeister
1535 A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong and Richard Socher
1536 Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification Hardik Sailor and Hemant Patil
1538 Effectiveness of Dynamic Features in INCA and Temporal Context-INCA Nirmesh Shah and Hemant Patil
1541 Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus Jianwei Yu, Xurong Xie, Shoukang Hu, Shansong Liu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
1542 The Zurich Corpus of Vowel and Voice Quality, Version 1.0 Dieter Maurer, Christian d'Heureuse, Heidy Suter, Volker Dellwo, Daniel Friedrichs and Thayabaran Kathiresan
1543 Compressing End-to-end ASR Networks by Tensor-Train Decomposition Takuma Mori, Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
1544 Gated Recurrent Unit Based Acoustic Modeling with Future Context Jie Li, Xiaorui Wang, Yuanyuan Zhao and Yan Li
1545 Angular Softmax for Short-Duration Text-independent Speaker Verification Zili Huang, Shuai Wang and Kai Yu
1547 Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks Xuankai Chang, Yanmin Qian and Dong Yu
1552 Temporal attentive pooling for acoustic event detection Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
1553 DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018 Hardik Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur Patil, Madhu Kamble and Hemant Patil
1555 Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Language Savitha Murthy, Dinkar Sitaram and Sunayana Sitaram
1556 Pitch Characteristics of L2 English Speech by Chinese Speakers: A Large-scale Study Jiahong Yuan, Qiusi Dong, Fei Wu, Huan Luan, Xiaofei Yang, Hui Lin and Yang Liu
1558 Machine Speech Chain with One-shot Speaker Adaptation Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
1559 Fast implementation of Elastic Recurrence Analysis for the analysis of speech signals Leonardo Lancia
1561 Incremental TTS for Japanese Language Tomoya Yanagita, Sakriani Sakti and Satoshi Nakamura
1562 Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions Sandeep Nallan Chakravarthula, Panayiotis Georgiou and Brian Baucom
1563 Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models Masayuki Suzuki, Tohru Nagano, Gakuto Kurata and Samuel Thomas
1565 Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion Neil Shah, Nirmesh Shah and Hemant Patil
1568 Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study Siddique Latif, Rajib Rana, Junaid Qadir and Julien Epps
1570 Automatic visual augmentation for concatenation based synthesized articulatory videos from real-time MRI data for spoken language training Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Ghosh
1574 Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah and Julien Epps
1575 Korean Singing Voice Synthesis based on LSTM Recurrent Neural Network Juntae Kim, Heejin Choi, Jinuk Park, Minsoo Hahn, Sangjin Kim and Jong-Jin Kim
1580 Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring Raghav Menon, Herman Kamper, John Quinn and Thomas Niesler
1581 Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates Joo-Kyung Kim and Young-Bum Kim
1583 Computational modeling of conversational humor in psychotherapy Anil Ramakrishna, Timothy Greer, David Atkins and Shrikanth Narayanan
1584 An Exploration Towards Joint Acoustic Modeling for Indian Languages: IIIT-H submission for Low Resource Speech Recognition Challenge for Indian languages, INTERSPEECH 2018 Hari Krishna, Krishna Gurugubelli, Vishnu Vidyadhara Raju V and Anil Kumar Vuppala
1589 Knowledge Distillation for Sequence Model Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian and Kai Yu
1590 A unified framework for the generation of glottal signals in deep learning-based parametric speech synthesis systems Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim and Hong-Goo Kang
1593 Cosine Metric Learning for Speaker Verification in the i-vector Space Zhongxin Bai, Xiao-Lei Zhang and Jingdong Chen
1597 Investigation on the combination of batch normalization and dropout in BLSTM-based acoustic modeling for ASR Li Wenjie, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang and Yonghong Yan
1598 Acoustic Modeling using Adversarially Trained Variational Recurrent Neural Network for Speech Synthesis Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim and Eunwoo Song
1600 A Study of Lexical and Prosodic Cues to Segmentation in a Hindi-English Code-switched Discourse Preeti Rao, Mugdha Pandya, Kamini Sabu, Kanhaiya Kumar and Nandini Bondale
1602 Stress Distribution of Given information in Chinese Reading Texts Yuan Jia and Xiaoxiao Ma
1603 Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su and Dong Yu
1606 Speaker Activity Detection and Minimum Variance Beamforming for Source Separation Enea Ceolini, Jithendar Anumula, Adrian Huber, Ilya Kiselev and Shih-Chii Liu
1608 Avoiding Speaker Overfitting in End-to-End DNNs using Raw Waveform for Text-Independent Speaker Verification Jee-weon Jung, Hee-soo Heo, IL-ho Yang, Hye-jin Shim and Ha-jin Yu
1610 Attention-based sequence classification for affect detection John Kane, Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joesph Moran and Ali Azarbayejani
1612 Correlational Networks for Speaker Normalization in Automatic Speech Recognition Rini A Sharon, Sandeep Reddy Kothinti and Umesh Srinivasan
1613 Epoch Extraction from Pathological Children Speech Using Single Pole Filtering Approach Vikram C M and S R Mahadeva Prasanna
1615 Sparsity-Constrained Weight Mapping for Individualization of Head-Related Transfer Functions from Anthropometric Features Xiaoke Qi and Jianhua Tao
1616 Improved training of end-to-end attention models for speech recognition Albert Zeyer, Kazuki Irie, Ralf Schlüter and Hermann Ney
1625 Transfer Learning for Improving Speech Emotion Classification Accuracy Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir and Julien Epps
1626 Multilingual Grapheme-to-Phoneme Conversion with Global Character Vectors Jinfu Ni, Yoshinori Shiga and Hisashi Kawai
1629 End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang and John Hershey
1630 Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning Ying Qin, Tan Lee, Siyuan Feng and Anthony Pak Hin Kong
1631 Estimation of Hypernasality Scores from Cleft Lip and Palate Speech Vikram C M, Ayush Tripathi, Sishir Kalita and S R Mahadeva Prasanna
1635 Speaker-independent raw waveform model for glottal excitation Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi and Paavo Alku
1637 Time Aggregation Operators for Multi-label Audio Event Detection Pankaj Joshi, Digvijaysingh Gautam, Ganesh Ramakrishnan and Preethi Jyothi
1638 Pitch or phonation: On the glottalization in tone productions in the Ruokeng Hui Chinese dialect Minghui Zhang and Fang Hu
1644 Automatic Miscue Detection using RNN Based Models with Data Augmentation Yoon Seok Hong, Kyung Seo Ki and Gahgene Gweon
1646 Processing Transition Regions of Glottal Stop substituted /s/ for Intelligibility Enhancement of Cleft Palate Speech Protima Nomo Sudro, Sishir Kalita and S R Mahadeva Prasanna
1649 The individual and the system: assessing the stability of the output of a semi-automatic forensic voice comparison system Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh and Eugenia San Segundo Fernández
1650 Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation Yun Liu, Hui Zhang and Xueliang Zhang
1651 Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection Hardik Sailor, Madhu Kamble and Hemant Patil
1652 Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks Zhong-Qiu Wang, Xueliang Zhang and DeLiang Wang
1654 Combined Speaker Clustering and Role Recognition in Conversational Speech Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson and Shrikanth Narayanan
1655 Multi-Head Decoder for End-to-End Speech Recognition Tomoki Hayashi, Shinji Watanabe, Tomoki Toda and Kazuya Takeda
1660 Single-channel late reverberation power spectral density estimation using denoising autoencoders Ina Kodrasi and Hervé Bourlard
1661 Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection Prasad Tapkir and Hemant Patil
1662 Exemplar-Based Spectral Detail Compensation for Voice Conversion Yu-Huai Peng, Hsin-Te Hwang, Yichiao Wu, Yu Tsao and Hsin-Min Wang
1664 All-Neural Multi-Channel Speech Enhancement Zhong-Qiu Wang and DeLiang Wang
1665 Detection of Glottal Activity Errors in Production of Stop Consonants in Children with Cleft Lip and Palate Vikram C M, S R Mahadeva Prasanna, Ajish K Abraham, Pushpavathi M and Girish K S
1668 Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su and Dong Yu
1671 Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks Akihiro Kato and Tomi Kinnunen
1675 Effectiveness of Speech Demodulation-Based Features for Replay Detection Madhu Kamble, Hemlata Tak and Hemant Patil
1676 Analyzing EEG signals in auditory speech comprehension using Temporal Response Functions and Generalized Additive Models Kimberley Mulder, Louis ten Bosch and Lou Boves
1677 Weighting of Coda Voicing Cues: Glottalisation and Vowel Duration Joshua Penney, Felicity Cox and Anita Szakay
1679 Comparison of an end-to-end trainable dialogue system with a modular statistical dialogue system Norbert Braunschweiler and Alexandros Papangelis
1680 I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification Jiacen Zhang, Nakamasa Inoue and Koichi Shinoda
1685 Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization Nam Le and Jean-Marc Odobez
1687 Novel Variable Length Energy Separation Algorithm using Instantaneous Amplitude Features For Replay Detection Madhu Kamble and Hemant Patil
1688 Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification Gautam Bhattacharya, Md Jahangir Alam, Vishwa Gupta and Patrick Kenny
1692 Acoustic-Prosodic Features of Tabla Bol Recitation and Correspondence with the Tabla Imitation Rohit M A and Preeti Rao
1693 Feature with complementarity of statistics and principal information for spoofing detection Jichen Yang, Changhuai You and Qianhua He
1694 A Hybrid Approach to Grapheme to Phoneme Conversion in Assamese Somnath Roy and Shakuntala Mahanta
1696 On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs Hannah Muckenhirn, Mathew Magimai Doss and Sebastien Marcel
1702 Novel Linear Frequency Residual Cepstral Features For Replay Attack Detection Hemlata Tak and Hemant Patil
1705 Deep Convex Representations: Feature Representations for Bioacoustics Classification Anshul Thakur, Vinayak Abrol, Pulkit Sharma and Padmanabhan Rajan
1706 Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang and Yonghe Wang
1707 Visualizing Phoneme Category Adaptation in Deep Neural Networks Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson and Najim Dehak
1711 Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech Astik Biswas, Febe De Wet, Ewald van der Westhuizen, Emre Yilmaz and Thomas Niesler
1712 Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion Nirmesh Shah, Maulik Madhavi and Hemant Patil
1713 Detecting Alzheimer’s Disease Using Gated Convolutional Neural Network from Audio Data Tifani Warnita, Nakamasa Inoue and Koichi Shinoda
1714 Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension Chia-Hsuan Li, Szu-Lin Wu, Chi-Liang Liu and Hung-yi Lee
1721 Weighting Time-Frequency Representation of Speech using Auditory Saliency for Automatic Speech Recognition Cong-Thanh Do and Yannis Stylianou
1722 CNN based Query by Example Spoken Term Detection Dhananjay Ram, Lesly Miculicich Werlen and Herve Bourlard
1725 The role of temporal variation in narrative organization Nassima Fezza
1727 Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks Iksoo Choi, Jinhwan Park and Wonyong Sung
1728 Analyzing Reaction Time Sequences from Human Participants in Auditory Experiments Louis ten Bosch, Mirjam Ernestus and Lou Boves
1730 Speech enhancement using deep mixture of experts based on hard expectation maximization Pavan Karjol and Prasanta Ghosh
1732 Speech Source Separation using ICA in Constant Q Transform Domain Dheeraj Sai D.V.L.N, Kishor K.S and Sri Rama Murty Kodukula
1736 UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Steve Renals, Korin Richmond, Zoe Roxburgh, James M Scobbie and Alan Wrench
1739 Multi-talker speech separation based on permutation invariant training and beamforming Lu Yin, Ziteng Wang, Risheng Xia, Junfeng Li and Yonghong Yan
1742 Speaker Diarization with Enhancing Speech for The First DIHARD Challenge Lei Sun, Jun Du, Chao Jiang, Xueyang Zhang, Shan He, Bing Yin and Chin-Hui Lee
1743 Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions Zhaocheng Huang, Julien Epps, Dale Joachim and Michael Chen
1744 Imbalance Learning-based Framework for Fear Recognition in the MediaEval Emotional Impact of Movies Task Xiaotong Zhang, Xingliang Cheng, Mingxing Xu and Thomas Fang Zheng
1746 Semi-Supervised End-to-End Speech Recognition Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa and Marc Delcroix
1748 Multimodal Name Recognition in Live TV Subtitling Marek Hrúz, Aleš Pražák and Michal Bušta
1749 BUT system for DIHARD Speech Diarization Challenge 2018 Mireia Diez, Federico Landini, Lukas Burget, Johan Rohdin, Anna Silnova, Katerina Zmolikova, Ondřej Novotný, Karel Vesely, Ondrej Glembek, Oldrich Plchot, Ladislav Mošner and Pavel Matejka
1750 Neural speech turn segmentation and affinity propagation for speaker diarization Ruiqing Yin, Hervé Bredin and Claude Barras
1751 Data Augmentation using Healthy Speech for Dysarthric Speech Recognition Bhavik Vachhani, Chitralekha Bhat and Sunil Kumar Kopparapu
1753 LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory Toru Nakashika
1754 Dysarthric Speech Recognition using Time-delay Neural Network based Denoising Autoencoder Chitralekha Bhat, Bhavik Vachhani, Biswajit Das and Sunil Kumar Kopparapu
1755 Automatic question detection from acoustic and phonetic features using feature-wise pre-training Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa and Yushi Aono
1756 Automated Classification of Vowel-Gesture Parameters using External Broadband Excitation Balamurali B T and Jer-Ming Chen
1757 A New Glottal Neural Vocoder for Speech Synthesis Yang Cui, Xi Wang, Lei He and Frank F. Soong
1759 On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification Rajath Kumar, Vaishnavi Yeruva and Sriram Ganapathy
1760 Picture naming or word reading: does the modality affect speech motor adaptation and its transfer? Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz and Amelie Rochet-Capellan
1764 Detecting signs of dementia using word vector representations Bahman Mirheidari, Daniel Blackburn, Traci Walker, Markus Reuber, Annalena Venneri and Heidi Christensen
1766 Investigation on Estimation of Sentence Probability By Combining Forward, Backward and Bi-directional LSTM-RNNs Kazuki Irie, Zhihong Lei, Liuhui Deng, Ralf Schlüter and Hermann Ney
1768 The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines Jon Barker, Shinji Watanabe, Emmanuel Vincent and Jan Trmal
1769 Deep Discriminative Embeddings for Duration Robust Speaker Verification Na Li, Deyi Tuo, Dan Su, Zhifeng Li and Dong Yu
1772 Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
1773 PhaseNet: Discretized phase modeling with deep neural networks for audio source separation Naoya Takahashi, Purvi Agrawal, Nabarun Goswami and Yuki Mitsufuji
1776 Empirical analysis of score fusion application to combined neural networks for open vocabulary spoken term detection Shi-wook Lee, Kazuyo Tanaka and Yoshiaki Itoh
1777 Attention-based End-to-End Models for Small-Footprint Keyword Spotting Changhao Shan, Junbo Zhang, Yujun Wang and Lei Xie
1778 Cross-lingual Speech Emotion Recognition through Factor Analysis Brecht Desplanques and Kris Demuynck
1780 Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang and Lei Xie
1788 Siamese Recurrent Auto-encoder Representation for Query-by-Example Spoken Term Detection Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng and Lianhong Cai
1791 Multimodal speech synthesis architecture for unsupervised speaker adaptation Hieu-Thi Luong and Junichi Yamagishi
1795 Cultural differences in pattern matching: multisensory recognition of socio-affective prosody Takaaki Shochi, Marine Guerry, Jean-Luc Rouas and Donna Erickson
1797 Hierarchical Recurrent Neural Networks for Acoustic Modeling Jinhwan Park, Iksoo Choi, Yoonho Boo and Wonyong Sung
1798 Characterizing rhythm differences between strong and weak accented L2 speech Chris Davis and Jeesun Kim
1800 Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings Da-Rong Liu, Kuan-yu Chen, Hung-yi Lee and Lin-shan Lee
1802 Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM Szu-wei Fu, Yu Tsao, Hsin-Te Hwang and Hsin-Min Wang
1804 Employing Phonetic Information in DNN Speaker Embeddings to Improve Speaker Recognition Performance Md Hafizur Rahman, Ivan Himawan, Mitchell McLaren, Clinton Fookes and Sridha Sridharan
1805 Sub-band Envelope Features using Frequency Domain Linear Prediction for Short Duration Language Identification Sarith Fernando, Vidhyasaharan Sethu and Eliathamby Ambikairajah
1806 Mining multimodal repositories for speech affecting diseases Joana Correia, Bhiksha Raj, Isabel Trancoso and Francisco Teixeira
1807 Who said that? A comparative study of non-negative matrix factorization techniques Teun Krikke, Frank Broz and David Lane
1808 Slot Filling with Delexicalized Sentence Generation Youhyun Shin, Kang Min Yoo and Sang-goo Lee
1811 Speech Emotion Recognition Using Spectrogram & Phoneme Embedding Promod Yenigalla, Abhay Kumar, Suraj Tripathi, Chirag Singh, Sibsambhu Kar and Jithendra Vepa
1812 Investigating the Effect of Face and Voice Familiarity in Recognising Speech in Noise Jeesun Kim, Sonya Karisma, Vincent Aubanel and Chris Davis
1819 Deep Siamese Architecture based Replay Detection for Secure Voice Biometrics Kaavya Sriskandaraja, Vidhyasaharan Sethu and Eliathamby Ambikairajah
1820 A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech Xingfeng Li and Masato Akagi
1821 Early detection of continuous and partial audio events using CNN Ian McLoughlin, Yan Song, Pham Dang Lam, Ramaswamy Palaniappan, Huy Phan and Yue Lang
1823 Gaussian Process Neural Networks for Speech Recognition Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
1825 Measuring the Band Importance Function for Mandarin Chinese with a Bayesian Adaptive Procedure Yufan Du, Yi Shen, Hongying Yang, Xihong Wu and Jing Chen
1827 Interaction mechanisms between glottal source and vocal tract in pitch glides Tiina Murtola and Jarmo Malinen
1828 Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin and Gayadhar Pradhan
1830 Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee and Lin-shan Lee
1832 Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition Danqing Luo, Yuexian Zou and Dongyan Huang
1834 A Non-convolutive NMF Model for Speech Dereverberation Nikhil M, Rajbabu Velmurugan and Preeti Rao
1836 Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Sanjeev Khudanpur and Najim Dehak
1840 Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization Paul Magron and Tuomas Virtanen
1841 Estimation of the number of speakers with Variational Bayesian PLDA in the DIHARD diarization Challenge Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel and Eduardo Lleida Solano
1843 Low resource acoustic-to-articulatory inversion using bi-directional long short term memory Aravind Illa and Prasanta Ghosh
1845 Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation Paul Magron, Konstantinos Drossos, Stylianos Mimilakis and Tuomas Virtanen
1846 Modulation Dynamic Features for the Detection of Replay Attacks Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake and Eliathamby Ambikairajah
1849 A Preliminary Study on Tonal Coarticulation in Continuous Speech Lixia Hao, Wei Zhang, Yanlu Xie and Jinsong Zhang
1851 What Do Classifiers Actually Learn? A Case Study on Emotion Recognition Datasets Patrick Meyer, Eric Buschermöhle and Tim Fingscheidt
1857 Exemplar-based speech waveform generation Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic and Simon King
1858 Towards Temporal Modelling of Categorical Speech Emotion Recognition Wenjing Han, Huabin Ruan, Xiaomin Chen, Zhixiang Wang, Haifeng Li and Björn Schuller
1860 A Study of Objective Measurement of Comprehensibility Through Native Speakers' Shadowing of Learners' Utterances Yusuke Inoue, Suguru Kabashima, Daisuke Saito, Nobuaki Minematsu, Kumi Kanamura and Yutaka Yamauchi
1862 Relating articulatory motions in different speaking rates Astha Singh, G. Nisha Meenakshi and Prasanta Ghosh
1864 Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning Abhinav Jain, Minali Upreti and Preethi Jyothi
1866 Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
1869 An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages Pallavi Baljekar, SaiKrishna Rallabandi and Alan W Black
1872 Learning Spontaneity to Improve Emotion Recognition in Speech Karttikeya Mangalam and Tanaya Guha
1873 Prominence-based evaluation of L2 prosody Heini Kallio, Antti Suni, Päivi Virkkunen and Juraj Šimko
1878 Automatic detection of multi-speaker fragments with high time resolution Evdokia Kazimirova and Andrey Belyaev
1883 On Enhancing Speech Emotion Recognition using Generative Adversarial Networks Saurabh Sahu, Rahul Gupta and Carol Espy-Wilson
1888 End-to-End Speech Command Recognition with Capsule Network Jaesung Bae and Dae-Shik Kim
1891 Multilingual Deep Neural Network Training using Cyclical Learning Rate Andreas Søeborg Kirkedal and Yeon-Jun Kim
1893 Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesus Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
1896 Information encoding by deep neural networks: what can we learn? Louis ten Bosch and Lou Boves
1897 Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model Ke Wang, Junbo Zhang, Yujun Wang and Lei Xie
1898 Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition Titouan Parcollet, Ying Zhang, Chiheb Trabelsi, Mohamed Morchid, Renato de Mori, Georges Linares and Yoshua Bengio
1899 Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna and Ratree Wayland
1904 A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation Bastian Schnell and Philip N. Garner
1905 Robust Acoustic Event Classification using Bag-of-Visual-Words Manjunath Mulimani and Shashidhar G Koolagudi
1907 Reconstructing neutral speech from prosthetic esophageal speech Abinay Reddy Naini, Achuth Rao MV, G. Nisha Meenakshi and Prasanta Ghosh
1908 Revealing Spatiotemporal Brain Dynamics of Speech Production Based on EEG and Eye Movement Bin Zhao, Jinfeng Huang, Jianwu Dang, Gaoyan Zhang, Minbo Chen, Yingjian Fu and Longbiao Wang
1909 A Deep Identity Representation for Noise Robust Spoofing Detection Alejandro Gómez Alanís, Antonio M. Peinado, Jose A. Gonzalez and Angel Gomez
1910 Self-Attentional Acoustic Models Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker and Alex Waibel
1914 Evolving Learning for Analysing Mood-Related Infant Vocalisation Zixing Zhang, Jing Han, Kun Qian and Björn Schuller
1920 On Training and Evaluation of Grapheme-to-Phoneme Mappings with Limited Data Dravyansh Sharma
1921 Analysis of sparse representation based feature on speech mode classification Kumud Tripathi and K. Sreenivasa Rao
1928 Enhancement of Noisy Speech Signal by Non-Local Means Estimation of Variational Mode Functions Nagapuri Srinivas, Gayadhar Pradhan and Syed Shahnawazuddin
1929 VoxCeleb2: Deep Speaker Recognition Joon Son Chung, Arsha Nagrani and Andrew Zisserman
1933 Demonstrating and modelling systematic time-varying annotator disagreement in continuous emotion annotation Mia Atcheson, Vidhyasaharan Sethu and Julien Epps
1937 Multicomponent 2-D AM-FM Modeling of Speech Spectrograms Jitendra Dhiman, Neeraj Sharma and Chandra Sekhar Seelamantula
1939 Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video using Semantic Segmentation with Fully Convolutional Networks Valliappan CA, Renuka Mannem and Prasanta Ghosh
1940 Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation Zhong-Qiu Wang and DeLiang Wang
1942 Fearless Steps: Apollo-11 Corpus Advancements for speech technologies from Earth to the Moon John H.L. Hansen, Abhijeet Sangwan, Aditya Joglekar, Ahmet E. Bulut, Chengzhu Yu and Lakshmish Kaushik
1943 Deep Lip Reading: a comparison of models and an online application Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
1944 Variation in the FACE vowel across West Yorkshire: Implications for forensic speaker comparisons Kate Earnshaw and Erica Gold
1947 Analysis of Variational Mode Functions for Robust Detection of Vowels Surbhi Sakshi, Avinash Kumar and Gayadhar Pradhan
1948 Respiratory and Respiratory Muscular Control in JL1’s and JL2’s Text Reading Utilizing 4-RSTs and a Soft Respiratory Mask with a Two-Way Bulb Toshiko Isei-Jaakkola, Keiko Ochi and Keikichi Hirose
1950 Phase-locked loop based phase estimation in single channel speech enhancement Priya Pallavi and Chevula Rama Rao
1955 Visual Speech Enhancement Aviv Gabbay, Asaph Shamir and Shmuel Peleg
1958 Identification and classification of fricatives in speech using zero time windowing method RaviShankar Prasad and Bayya Yegnanarayana
1959 Neural network architecture that combines temporal and summative features for infant cry classification in the Interspeech 2018 Computational Paralinguistics Challenge Mark Huckvale
1962 Language-Dependent Melody Embeddings Daniil Kocharov and Alla Menshikova
1966 Building a Unified Code-Switching ASR System for South African Languages Emre Yilmaz, Astik Biswas, Ewald van der Westhuizen, Febe De Wet and Thomas Niesler
1967 Classification of disorders in vocal folds using Electroglottographic Signal Tanumay Mandal, K Sreenivasa Rao and Sanjay Kumar Gupta
1970 On The Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ruibo Fu
1972 Comparison of unsupervised modulation filter learning methods for ASR Purvi Agrawal and Sriram Ganapathy
1973 Phonological Posterior Hashing for Query by Example Spoken Term Detection Afsaneh Asaei, Dhananjay Ram and Herve Bourlard
1974 Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition Pengcheng Guo, Haihua Xu, Lei Xie and Eng Siong Chng
1979 Efficient keyword spotting using time delay neural networks Samuel Myer and Vikrant Singh Tomar
1983 Analysis of L2 learners’ progress of distinguishing Mandarin Tone 2 and Tone 3 Yue Sun, Win Thuzar Kyaw, Jinsong Zhang and Yoshinori Sagisaka
1987 An Optimization Framework for Reconstruction of Speech From a Phase-Encoded Spectrogram Abhilash Sainathan, Sunil Rudresh and Chandra Sekhar Seelamantula
1988 A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease Juan Camilo Vásquez Correa, Tomas Arias, Juan Rafael Orozco-Arroyave and Elmar Noeth
1990 Fast Language Adaptation Using Phonological Information Sibo Tong, Philip N. Garner and Herve Bourlard
1991 Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
2001 Multiple Phase Information Combination for Replay Attacks Detection Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan and Xiangang Li
2003 Far-Field Speech Recognition Using Multivariate Autoregressive Models Sriram Ganapathy and Madhumita Harish
2005 A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions Yasuhito Ohsugi, Daisuke Saito and Nobuaki Minematsu
2011 Multimodal Polynomial Fusion for Detecting Driver Distraction Yulun Du, Alan W Black, Louis-Philippe Morency and Maxine Eskenazi
2012 Supervised i-vector Modeling - Theory and Applications Shreyas Ramoji and Sriram Ganapathy
2014 Detection of Glottal Excitation Epochs in Speech Signal Using Hilbert Envelope Hirak Dasgupta, Prem C. Pandey and K S Nataraj
2015 End-to-end deep neural network age estimation Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesus Villalba, Dan Povey, Sanjeev Khudanpur and Najim Dehak
2017 Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel and Sebastian Stüker
2019 Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech Zafi Sherhan Syed, Julien Schroeter, Kirill Sidorov and David Marshall
2022 Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation Markus Kitza, Ralf Schlüter and Hermann Ney
2025 Interactions Between Vowels and Nasal Codas in Mandarin Speakers’ Perception of Nasal Finals Chong Cao, Wei Wei, Wei Wang, Yanlu Xie and Jinsong Zhang
2027 Unsupervised discovery of non-native pronunciation patterns in L2 English speech for mispronunciation detection and diagnosis Xu Li, Shaoguang Mao, Xixin Wu, Kun Li, Xunying Liu and Helen Meng
2028 AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies Sourish Chaudhuri, Joseph Roth, Dan Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
2029 Classification of Huntington's Disease Using Acoustic and Lexical Features Matthew Perez, Wenyu Jin, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts and Emily Mower Provost
2030 A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition Hao Tang, Wei-Ning Hsu, Francois Grondin and James Glass
2031 Implementation of Digital Hearing Aid as a Smartphone Application Saketh Sharma, Nitya Tiwari and Prem C. Pandey
2032 VoiceGuard: Secure and Private Speech Processing Ferdinand Brasser, Tommaso Frassetto, Korbinian Riedhammer, Ahmad-Reza Sadeghi, Thomas Schneider and Christian Weinert
2040 How did you like 2017? Detection of language markers of depression and narcissism in personal narratives Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
2042 Effects of dimensional input on paralinguistic information perceived from synthesized dialogue speech with neural network Masaki Yokoyama, Tomohiro Nagata and Hiroki Mori
2043 State of mind: Classification through self-reported affect and word use in speech Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller and Harald Baumeister
2045 Music Genre Recognition using Deep Neural Networks and Transfer Learning Deepanway Ghosal and Maheshkumar H. Kolekar
2053 Who are you listening to? Towards a dynamic measure of auditory attention to speech-on-speech Moïra-Phoebé Huet, Christophe Micheyl, Etienne Gaudrain and Etienne Parizet
2057 Subword and Crossword Units for CTC Acoustic Models Thomas Zenkel, Ramon Sanabria, Florian Metze and Alex Waibel
2061 Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme prediction Antoine Bruguier, Anton Bakhtin and Dravyansh Sharma
2062 Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi and Yushi Aono
2065 Towards Automated Single Channel Source Separation using Neural Networks Arpita Gang, Pravesh Biyani and Akshay Soni
2066 The Effect of Real-Time Constraints on Automatic Speech Animation Danny Websdale, Sarah Taylor and Ben Milner
2067 Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models Koji Inoue, Divesh Lala, Katsuya Takanashi and Tatsuya Kawahara
2071 Analysis of Language Dependent Front-End for Speaker Recognition Srikanth Madikeri, Subhadeep Dey and Petr Motlicek
2072 Neural response development during distributional learning Natalie Boll-Avetisyan, Jessie S. Nixon, Tomas O. Lentz, Liquan Liu, Sandrien van Ommen, Çağri Çöltekin and Jacolien van Rij
2075 Learning interpretable control dimensions for speech synthesis by using external data Zack Hodari, Oliver Watts, Srikanth Ronanki and Simon King
2078 Talker diarization in the wild: The case of child-centered daylong audio-recordings Alejandrina Cristia, Shobhana Ganesh, Marisa Casillas and Sriram Ganapathy
2079 CRIM's System for the MGB-3 English Multi-Genre Broadcast Media Transcription Vishwa Gupta and Gilles Boulianne
2080 Investigating Objective Intelligibility in Real-Time EMG-to-Speech Conversion Lorenz Diener and Tanja Schultz
2081 Implementing DIANA to model isolated auditory word recognition in English Filip Nenadić, Louis ten Bosch and Benjamin V. Tucker
2082 Memory Time Span in LSTMs for Multi-Speaker Source Separation Jeroen Zegers and Hugo Van hamme
2083 Wavelet Transform based Mel-scaled Features for Acoustic Scene Classification Shefali Waldekar and Goutam Saha
2084 Analyzing vocal tract movements during speech accommodation Sankar Mukherjee, Thierry Legou, Leonardo Lancia, Pauline Hilt, Alice Tomassini, Luciano Fadiga, Alessandro D'Ausilio, Leonardo Badino and Noël Nguyen
2087 FACTS: A hierarchical task-based control model of speech incorporating sensory feedback Benjamin Parrell, Vikram Ramanarayanan, Srikantan Nagarajan and John Houde
2089 Loud and Shouted Speech Perception at Variable Distances in a Forest Julien Meyer, Fanny Meunier, Laure Dentel, Noelia Do Carmo Blanco and Frédéric Sèbe
2090 Analysis of the Effect of Speech-Laugh on Speaker Recognition System Sri Harsha Dumpala, Ashish Panda and Sunil Kumar Kopparapu
2096 Temporal Noise Shaping with Companding Arijit Biswas, Per Hedelin, Lars Villemoes and Vinay Melkote
2104 Experience-dependent influence of music and language on lexical pitch learning is not additive Akshay Maggu, Patrick Wong, Hanjun Liu and Francis Wong
2112 Building large-vocabulary speaker-independent lipreading systems Kwanchiva Thangthai and Richard Harvey
2114 Effects of homophone density on spoken word recognition in Mandarin Chinese Bhamini Sharma
2115 Analyzing Thai tone distribution through functional data analysis Hong Zhang
2117 TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages Noor Fathima, Tanvina Patel, Mahima C and Anuroop Iyengar
2118 Improved Acoustic Modelling For Automatic Literacy Assessment Of Children Mauro Nicolao, Michiel Sanders and Thomas Hain
2119 Speech intelligibility enhancement based on a non-causal Wavenet-like model Muhammed Shifas PV, Vassilis Tsiaras and Yannis Stylianou
2122 Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali, and Oriya Debadatta Dash, Myungjong Kim, Kristin Teplansky and Jun Wang
2124 Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs Matthew Roddy, Gabriel Skantze and Naomi Harte
2125 Robust Mizo Continuous Speech Recognition Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
2128 Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors Anna Silnova, Niko Brummer, Daniel Garcia-Romero, David Snyder and Lukas Burget
2129 Discourse Marker Detection for Hesitation Events on Mandarin Conversation Yu-Wun Wang, Hen-Hsen Huang, Kuan-Yu Chen and Hsin-Hsi Chen
2130 Learning two tone languages enhances the brainstem encoding of lexical tones Akshay Maggu, Wenqing Zong, Vina Law and Patrick Wong
2133 Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri Tanvina Patel, Krishna DN, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
2138 Factorized Deep Neural Network Adaptation for Automatic Scoring of L2 Speech in English Speaking Tests Dean Luo, Chunxiao Zhang, Linzhong Xia and Lixin Wang
2148 Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery Thomas Glarner, Patrick Hanebrink, Janek Ebbers and Reinhold Haeb-Umbach
2149 Investigating Utterance level Representations for detecting Intent from Acoustics SaiKrishna Rallabandi, Carla Viegas, Bhavya Karki, Eric Nyberg and Alan W Black
2155 A Lightly Supervised Approach to Detect Stuttering in Children's Speech Sadeen Alharbi, Madina Hasan, Anthony J H Simons, Shelagh Brumfitt and Phil Green
2156 Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan and Xiangang Li
2158 Semi-tied Units for Efficient Gating in LSTM and Highway Networks Chao Zhang and Phil Woodland
2162 Leveraging translations for speech transcription in low-resource settings Antonios Anastasopoulos and David Chiang
2165 Creak in the respiratory cycle Kätlin Aare, Pärtel Lippus, Marcin Wlodarczak and Mattias Heldner
2169 Multi-Lingual Depression-Level Assessment from Conversational Speech Using Acoustic and Text Features Yasin Özkanca, Cenk Demiroglu, Aslı Besirli and Selime Celik
2172 The EURECOM submission to the first DIHARD Challenge Jose Patino, Héctor Delgado and Nicholas Evans
2173 Subband weighting for binaural speech source localization Karthik Girija Ramesan, Parth Suresh and Prasanta Ghosh
2174 Neural MultiVoice Models for Expressing Novel Personalities in Dialog Shereen Oraby, Lena Reed, Sharath T.S., Shubhangi Tandon and Marilyn Walker
2185 Role Play Dialogue Aware Language Models based on Conditional Hierarchical Recurrent Encoder-Decoder Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki and Yushi Aono
2186 Patient Privacy in Paralinguistic Tasks Francisco Teixeira, Alberto Abad and Isabel Trancoso
2187 Recognition of the Infant's Emotional Cry in Domestic Environments using the Capsule Network Architecture Mehmet Ali Tugtekin Turan and Engin Erzin
2191 An investigation of mixup training strategies for acoustic models in ASR Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin and Alexander Zatvornitskiy
2194 Unspeech: Unsupervised Speech Context Embeddings Benjamin Milde and Chris Biemann
2195 Conditional Computation-Based Recurrent Neural Networks for Computationally Efficient Acoustic Modelling Raffaele Tavarone and Leonardo Badino
2196 Integrating neural network based beamforming and weighted prediction error dereverberation Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix and Tomohiro Nakatani
2204 Efficient Voice Trigger Detection for Low Resource Hardware Siddharth Sigtia, Rob Haynes, Hywel Richards, Erik Marchi and John Bridle
2209 Speaker adaptive training and mixup regularization for neural network acoustic models in automatic speech recognition Natalia Tomashenko, Yuri Khokhlov and Yannick Estève
2211 Task specific sentence embeddings to detect ASR errors Sahar Ghannay, Yannick Estève and Nathalie Camelin
2212 A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information Lucas Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan Carlos Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
2213 Artificial Bandwidth Extension with Memory Inclusion using Semi-supervised Stacked Auto-encoders Pramod Bachhav, Massimiliano Todisco and Nicholas Evans
2215 Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions Bekir Berker Türker, Engin Erzin, Yücel Yemez and Metin Sezgin
2221 Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
2222 Modeling Self-Reported and Observed Affect from Speech Jian Cheng, Jared Bernstein, Elizabeth Rosenfeld, Peter Foltz, Alex Cohen, Terje Holmlund and Brita Elvevaag
2224 Wuxi Speakers’ Production and Perception of Coda Nasals in Mandarin Lei Wang, Jie Cui and Ying Chen
2225 Acoustic and Perceptual Characteristics of Mandarin Speech in Homosexual and Heterosexual Male Speakers Puyang Geng, Wentao Gu and Hiroya Fujisaki
2226 Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition Vishwas Shetty, Rini A Sharon, Basil Abraham, Tejaswi Seeram, Anusha Prakash, Nithya Ravi and Umesh S
2228 Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng and Lianhong Cai
2238 Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Automatic Speaker Verification and Speech Enhancement Peter Guzewich, Stephen Zahorian, Xiao Chen and Hao Zhang
2246 Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara Sainath, Parisa Haghani, Bo Li and Michiel Bacchiani
2250 Dysarthric Speech Recognition Using Convolutional LSTM Neural Network Myungjong Kim, Beiming Cao, Kwanghoon An and Jun Wang
2256 Is ATIS too shallow to go deeper for benchmarking Spoken Language Understanding models? Frederic Bechet and Christian Raymond
2259 Online Incremental Learning for Speaker-Adaptive Language Models Chih Chi Hu, Bing Liu, John Shen and Ian Lane
2261 Self-Assessed Affect Recognition using Fusion of Attentional BLSTM and Static Acoustic Features Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
2263 Lexical And Acoustic Deep Learning Model For Personality Recognition Guozhen An and Rivka Levitan
2269 Deep Personality Recognition For Deception Detection Guozhen An, Sarah Ita Levitan, Julia Hirschberg and Rivka Levitan
2275 Articulatory Feature Classification using Convolutional Neural Networks Danny Merkx and Odette Scharenborg
2279 End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention Francis Tom, Mohit Jain and Prasenjit Dey
2284 Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao and Fil Alleva
2286 Audio-visual voice conversion using deep canonical correlation analysis for deep bottleneck features Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu and Tomoki Toda
2288 Speaker-specific structure in German voiceless stop voice onset times Marc Antony Hullebus, Stephen Tobin and Adamantios Gafos
2289 Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
2290 Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network Yi Luo and Nima Mesgarani
2293 Tone Recognition Using Lifters and CTC Loren Lugosch and Vikrant Singh Tomar
2295 Multimodal i-vectors to Detect and Evaluate Parkinson's Disease Nicanor Garcia, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave and Elmar Noeth
2297 On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children Gary Yeung and Abeer Alwan
2298 LSTM based Cross-corpus and Cross-task Acoustic Emotion Recognition Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
2299 Classification of nonverbal human produced audio events: a pilot study Rachel E. Bouserhal, Philippe Chabot, Milton Sarria-Paja, Patrick Cardinal and Jeremie Voix
2300 End-to-end text-dependent Speaker Verification using novel distance measures Subhadeep Dey, Srikanth Madikeri and Petr Motlicek
2304 Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Speaker Diarization Challenge Valter Akira Miasato Filho, Diego Augusto Silva and Luis Gustavo Depra Cuozzo
2305 Triplet Network with Attention for Speaker Diarization Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha and Andreas Spanias
2306 Dereverberation and Beamforming in Robust Far-Field Speaker Recognition Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný and Jan Černocký
2310 Improving Response Time of Active Speaker Detection using Visual Prosody Information Prior to Articulation Fasih Haider, Saturnino Luz, Carl Vogel and Nick Campbell
2318 Domain-Adversarial Training for Session Independent EMG-based Speech Recognition Michael Wand, Tanja Schultz and Juergen Schmidhuber
2321 Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai Doss, Hema Murthy and Shrikanth Narayanan
2323 R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection Chieh-Chi Kao, Weiran Wang, Ming Sun and Chao Wang
2324 The ACLEW DiViMe: An easy-to-use diarization tool Adrien Le Franc, Eric Riebling, Julien Karadayi, Camila Scaff, Yun Wang, Florian Metze and Alejandrina Cristia
2326 Music Source Activity Detection and Separation using Deep Attractor Network Rajath Kumar, Yi Luo and Nima Mesgarani
2330 Speech database and protocol validation using waveform entropy Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas Evans and Jean-Francois Bonastre
2331 Influences of fundamental oscillation on speaker identification in vocalic utterances by humans and computers Volker Dellwo, Thayabaran Kathiresan, Elisa Pellegrino, Lei He, Sandra Schwab and Dieter Maurer
2334 Multilingual bottleneck features for subword modeling in zero-resource languages Enno Hermann and Sharon Goldwater
2335 Combining Natural Gradient with Hessian Free Methods for Sequence Training Mustafa Haider and Philip Woodland
2338 A simple model for detection of rare sound events Weiran Wang, Chieh-Chi Kao and Chao Wang
2341 Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech Yu-An Chung and James Glass
2350 Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition Anderson R. Avila, Md Jahangir Alam, Douglas O'Shaughnessy and Tiago Falk
2352 Voice source contribution to prominence perception: Rd implementation Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide and Christer Gobl
2355 The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin McInnis and Emily Mower Provost
2358 Robust Spoken Language Understanding via Paraphrasing Avik Ray, Yilin Shen and Hongxia Jin
2359 Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR Leda Sari, Mark Hasegawa-Johnson, Kumaran S, Georg Stemmer and Krishnakumar N Nair
2360 Implementing Fusion Techniques for the Classification of Paralinguistic Information Bogdan Vlasenko, Jilt Sebastian, Pavan Kumar D S and Mathew Magimai Doss
2361 Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition Karel Vesely, Carlos Segura, Igor Szöke, Jan Černocký and Jordi Luque
2362 Improvements to an Automated Content Scoring System for Spoken CALL Responses: The ETS submission to the Second Spoken CALL Shared Task Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
2364 Learning word embeddings: unsupervised methods for fixed-size representations of variable-length speech segments Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad and Emmanuel Dupoux
2366 Acoustic-prosodic entrainment in structural metadata events Vera Cabarrão, Fernando Batista, Helena Moniz, Isabel Trancoso and Ana Isabel Mata
2371 Estimation of the asymmetry parameter of the glottal flow waveform using the Electroglottographic signal Joao Cabral
2372 The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick and Kristin Heaton
2373 The Diphthongs of Formal Nigerian English: A Preliminary Acoustic Analysis Natalia Dyrenko and Robert Fuchs
2377 Bubble Cooperative Networks for identifying important speech cues Viet Anh Trinh, Michael Mandel and Brian McFee
2381 Studying vowel variation in French-Algerian Arabic code-switched speech Jane Wottawa, Amazouz Djegdjiga, Martine Adda-Decker and Lori Lamel
2383 Large Vocabulary Concatenative Resynthesis Soumi Maiti, Joey Ching and Michael Mandel
2384 Sampling strategies in Siamese Networks for unsupervised speech representation learning Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz and Emmanuel Dupoux
2389 Detection of Amyotrophic Lateral Sclerosis (ALS) via Acoustic Analysis Raquel Norel, Mary Pietrowicz, Carla Agurto, Shay Rishoni and Guillermo Cecchi
2391 Whistle-blowing ASRs: evaluating the need for more inclusive speech recognition systems Meredith Moore, Hemanth Venkateswara and Sethuraman Panchanathan
2394 Investigation on bandwidth extension for speaker recognition Phani Sankar Nidadavolu, Cheng-I Lai, Jesus Villalba and Najim Dehak
2397 Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks Zixiaofan Yang and Julia Hirschberg
2398 The Use of Machine Learning and Phonetic Endophenotypes to Discover Genetic Variants Associated with Speech Sound Disorder Jason Lilley, Erin Crowgey and H Timothy Bunnell
2400 Experiments with training corpora for statistical text-to-speech systems Monika Podsiadło and Victor Ungureanu
2403 An Efficient Approach to Encoding Context for Spoken Language Understanding Raghav Gupta, Abhinav Rastogi and Dilek Hakkani-Tur
2409 Cycle-Consistent Speech Enhancement Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
2412 LOCUST - Longitudinal Corpus and Toolset for Speaker Verification Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
2413 An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng and Taufiq Hasan
2414 End-to-End Speech Recognition From the Raw Waveform Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert and Emmanuel Dupoux
2416 Contextual speech recognition in end-to-end neural network systems using beam search Ian Williams, Anjuli Kannan, Petar Aleksic, David Rybach and Tara Sainath
2417 Spanish Statistical Parametric Speech Synthesis using a Neural Vocoder Antonio Bonafonte, Santiago Pascual and Georgina Dorca
2418 Vocal biomarkers for cognitive performance estimation in a working memory task Jennifer Sloboda, Adam Lammert, James Williamson, Christopher Smalt, Daryush Mehta, Ian Curry, Kristin Heaton, Jeffrey Palmer and Thomas Quatieri
2420 Wide Learning for Auditory Comprehension Elnaz Shafaei-Bajestan and R. Harald Baayen
2422 Analysis of phone errors in children's ASR through bottleneck feature visualisations Eva Fringi and Martin Russell
2423 A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement Yangyang Xia and Richard Stern
2427 A probability weighted beamformer for noise robust ASR Suliang Bu, Yunxin Zhao, MeiYuh Hwang and Sining Sun
2429 Infant emotional outbursts detection in infant-parent spoken interactions Yijia Xu, Mark Hasegawa-Johnson and Nancy McElwain
2430 Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact Wei Xia and John H.L. Hansen
2432 Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems Yu Wang, Chao Zhang, Mark Gales and Phil Woodland
2434 Visual recognition of continuous Cued Speech using a tandem CNN-HMM approach Li Liu, Thomas Hueber, Gang Feng and Denis Beautemps
2439 Concatenative Resynthesis with Improved Training Signals for Speech Enhancement Ali Raza Syed, Viet Anh Trinh and Michael Mandel
2440 Student-Teacher Learning for BLSTM Mask-based Speech Enhancement Aswin Shanmugam Subramanian, Szu-Jui Chen and Shinji Watanabe
2441 Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation Hakan Erdogan and Takuya Yoshioka
2443 Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues Sarah Ita Levitan, Angel Maredia and Julia Hirschberg
2446 Coherence models for dialogue Alessandra Cervone, Evgeny Stepanov and Giuseppe Riccardi
2453 Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant Leonid Velikovich, Ian Williams, Justin Scheiner, Petar Aleksic, Pedro Moreno and Michael Riley
2454 Cross-language Phoneme Mapping for Low-resource Languages: An Exploration of benefits and Trade-offs Nick Chibuye, Todd Rosenstock and Brian DeRenzi
2456 Multi-Modal Data Augmentation for End-to-end ASR Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner and Shinji Watanabe
2457 BUT OpenSAT 2017 speech recognition system Martin Karafiat, Murali Karthick Baskar, Igor Szoke, Vladimir Malenovsky, Frantisek Grezl, Lukas Burget and Jan Černocký
2458 Effectiveness of Single-Channel BLSTM Enhancement for Language Identification Peter Frederiksen, Jesus Villalba, Shinji Watanabe, Zheng-Hua Tan and Najim Dehak
2461 Adversarial Feature-Mapping for Speech Enhancement Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
2462 Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features Linxue Bai, Philip Weber, Peter Jancovic and Martin Russell
2464 Predicting Categorical Emotions by Jointly Learning Primary and Secondary Emotions Through Multitask Learning Reza Lotfian and Carlos Busso
2466 Deep neural networks for emotion recognition combining audio and transcripts Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesus Villalba, Yishay Carmiel and Najim Dehak
2467 Expressive speech synthesis using sentiment embeddings Igor Jauk, Jaime Lorenzo-Trueba, Junichi Yamagishi and Antonio Bonafonte
2473 ISI ASR System for the Low Resource Speech Recognition Challenge for Indian Languages Jayadev Billa
2474 An Investigation of Non-linear i-vectors for speaker verification Nanxin Chen, Jesus Villalba and Najim Dehak
2475 Automatic detection of orofacial impairment in stroke Andrea Bandini, Jordan Green, Brian Richburg and Yana Yunusova
2476 Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition Zoltán Tüske, Ralf Schlüter and Hermann Ney
2478 Preference Learning with Qualitative Agreement for Sentence Level Emotional Annotations Srinivas Parthasarathy and Carlos Busso
2484 Articulation-to-Speech Synthesis Using Articulatory Flesh Point Sensors’ Orientation Information Beiming Cao, Myungjong Kim, Jun R. Wang, Jan van Santen, Ted Mau and Jun Wang
2486 Semi-supervised and active-learning scenarios: Efficient acoustic model refinement for a low resource Indian language Maharajan Chellapriyadharshini, Anoop Toffy, Srinivasa Raghavan K. M. and V Ramasubramanian
2490 Audiovisual Speech Activity Detection with Advanced Long Short-Term Memory Fei Tao and Carlos Busso
2495 Estimation of Fundamental Frequency From Singing Voice using Harmonics of Impulse-like Excitation Source Sudarsana Reddy Kadiri and Bayya Yegnanarayana
2496 Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
2498 Breathy to Tense Voice Discrimination using Zero-Time Windowing Cepstral Coefficients Sudarsana Reddy Kadiri and Bayya Yegnanarayana
2502 Analysis and Detection of Phonation Mode in Singing Voice using Excitation Source Features Sudarsana Reddy Kadiri and Bayya Yegnanarayana
2505 Perceptual sensitivity to spectral change in Australian English close front vowels: an electroencephalographic investigation Daniel Williams, Paola Escudero and Adamantios Gafos
2508 Role of Regularization in the Prediction of Valence From Speech Kusha Sridhar, Srinivas Parthasarathy and Carlos Busso
2512 An ultrasound study of gemination in coronal stops in Eastern Oromo Maida Percival
2513 Truncation and compression in Southern German and Australian English Jenny Yu and Katharina Zahner
2516 DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker and Amir Hussain
2517 Improved training for online end-to-end speech recognition systems Suyoun Kim, Michael Seltzer, Jinyu Li and Rui Zhao
2522 Detecting Depression with Audio/Text Sequence Modeling of Interviews Tuka Al Hanai, Mohammad Ghassemi and James Glass
2523 Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations Zixing Zhang, Alejandrina Cristia, Anne Warlaumont and Björn Schuller
2525 Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion Seyed Hamidreza Mohammadi and Taehwan Kim
2527 Conversational Analysis using Utterance-level Attention-based Bidirectional Recurrent Neural Networks Chandrakant Bothe, Sven Magg, Cornelius Weber and Stefan Wermter
2529 Indian languages ASR: A multilingual phone recognition framework with IPA based common phone-set, predicted articulatory features and feature fusion Manjunath K E, K. Sreenivasa Rao, Dinesh Babu Jayagopi and V Ramasubramanian
2530 An Empirical Analysis of the Correlation of Syntax and Prosody Oskar Dörfler, Arne Köhn and Timo Baumann
2532 On the relationship between glottal pulse shape and its spectrum: correlations of open quotient, pulse skew and peak flow with source harmonic amplitudes Christer Gobl, Andy Murphy, Irena Yanushevskaya and Ailbhe Ní Chasaide
2533 Analysing the Focus of a Hierarchical Attention Network: The Importance of Enjambments When Classifying Post-modern Poetry Timo Baumann, Hussein Hussein and Burkhard Meyer-Sickendiek
2537 Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI Pramit Saha, Praneeth Srungarapu and Sidney Fels
2544 Automatic Evaluation of Soft Articulatory Contact for Stuttering Treatment Keiko Ochi, Koichi Mori and Naomi Sakai
2551 Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins and Shrikanth Narayanan
2559 Detecting Media Sound Presence in Acoustic Scenes Constantinos Papayiannis, Justice Amoh, Viktor Rozgic, Shiva Sundaram and Chao Wang
2561 Improving Mandarin tone recognition using convolutional bidirectional Long Short-Term Memory with Attention Longfei Yang, Yanlu Xie and Jinsong Zhang
2566 Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models Chanwoo Kim, Ehsan Variani, Arun Narayanan and Michiel Bacchiani
2572 Automatic glottis localization and segmentation in stroboscopic videos using deep neural network Achuth Rao MV, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini and Prasanta Ghosh
2577 Multi-frame Quantization of LSF Parameters using a Deep Autoencoder and Pyramid Vector Quantizer Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yichang Li
2578 Multi-frame Coding of LSF Parameters using Block-Constrained Trellis Coded Vector Quantization Yaxing Li, Shan Xu, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yueming Ding
2581 An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals Dengke Tang, Junlin Zeng and Ming Li
2587 Joint Learning of Facial Expression and Head Pose from Speech David Greenwood, Iain Matthews and Stephen Laycock
2590 A New Frequency Coverage Metric and A New Subband Encoding Model, With An Application In Pitch Estimation Shoufeng Lin
2591 Links between use of transitional probabilities for word segmentation, parental speech input and early speech production Mélanie Hoareau, Henny Yeung and Thierry Nazzi
2592 Sensorimotor response to tongue displacement imagery by talkers with Parkinson’s disease William Katz, Patrick Reidy and Divya Prabhakaran
2597 Within- versus between-category perception of acoustic correlates of lexical stress: A cross-linguistic study Natalie Boll-Avetisyan, Saioa Larraza, Aislyn Rose, Sylvie Margules, Ranka Bijeljac-Babic, Barbara Höhle and Thierry Nazzi
2598 Acoustic analysis of whispery voice disguise in Mandarin Chinese Cuiling Zhang, Bin Li and Si Chen
3002 DialogOS: Simple and extensible dialogue modeling Alexander Koller, Timo Baumann and Arne Köhn
3003 A Framework for Speech Recognition Benchmarking Franck Dernoncourt, Trung Bui and Walter Chang
3004 Flexible tongue housed in a static model of the vocal tract with jaws, lips and teeth Takayuki Arai
3005 Voice Analysis Using Acoustic and Throat Microphones for Speech Therapy Lani Mathew and Gopakumar K.
3006 A Robust Context-Dependent Speech-to-Speech Phraselator Toolkit for Alexa Manny Rayner, Nikos Tsourakis and Jan Stanek
3008 Intonation tutor by SPIRE (In-SPIRE): An online tool for an automatic feedback to the second language learners in learning intonation Anand P A, Chiranjeevi Yarra, Kausthubha N K and Prasanta Ghosh
3009 SPIRE-SST: An automatic web-based self-learning tool for syllable stress tutoring (SST) to the second language learners Chiranjeevi Yarra, Anand P A, Kausthubha N K and Prasanta Ghosh
3011 The IBM Virtual Voice Creator Alexander Sorin, Slava Shechtman, Zvi Kons, Ron Hoory, Shay Ben-David, Joe Pavitt, Shai Rozenberg, Carmel Rabinovitz and Tal Drory
3012 Mobile Application for Learning Languages for the Unlettered Gayathri G, Mohana N, Radhika Pal and Hema Murthy
3014 Mandarin-English Code-switching Speech Recognition Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng and Haizhou Li
3015 Captaina: Integrated pronunciation practice and data collection portal Aku Rouhe, Reima Karhila, Aija Elg, Minnaleena Toivola, Peter Smit, Anna-Riikka Smolander and Mikko Kurimo
3016 auMina - Enterprise Speech Analytics Umesh Sachdev, Rajagopal Jayaraman and Zainab Millwala
3017 HoloCompanion: An MR Friend for EveryOne Annam Naresh, Rushabh Gandhi, Mallikarjuna Rao Bellamkonda and Mithun Das Gupta
3018 akeira - Virtual Assistant Umesh Sachdev, Rajagopal Jayaraman and Zainab Millwala
3019 Brain-Computer Interface using Electroencephalogram signatures of Eye Blinks Srihari Maruthachalam, Sidharth Aggarwal, Mari Ganesh Kumar, Mriganka Sur and Hema Murthy
3022 Early vocabulary development through picture-based software solutions Kasthuri G, Prabha Ramanathan, Hema Murthy, Namita Jacob and Anil Prabhakar
3023 Remote Analysis of Voice and Speech Characteristics Ladan Baghai-Ravary and Steve Beet
3026 Automatic detection of expressiveness in oral reading Kamini Sabu, Kanhaiya Kumar and Preeti Rao
3027 PannoMulloKathan: Voice enabled Mobile App for Agricultural Commodity Price Dissemination in Bengali Language Madhab Pal, Rajib Roy, Soma Khan, Milton S. Bepari and Joyanta Basu
3028 Visualizing Punctuation Restoration in Speech Transcripts with Prosograph Alp Öktem, Mireia Farrús and Antonio Bonafonte
3029 CACTAS - Collaborative Audio Categorization and Transcription for ASR Systems Mithul Mathivanan, Abhishek Pandey and Jithendra Vepa
3030 Hierarchical Accent Determination and Application in a Large Scale ASR System Ramya Viswanathan, Periyasamy Paramasivam and Jithendra Vepa
3032 Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL MOOC Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
3033 Machine Learning powered Data Platform for High-Quality Speech and NLP workflows João Freitas, Jorge Ribeiro, Daan Baldwijns, Sara Oliveira and Daniela Braga
3034 Fully automatic speaker separation system, with automatic enrolling of recurrent speakers Raphael Cohen, Orgad Keller, Jason Levy, Russell Levy and Micha Breakstone
3035 Online speech translation system for Tamil Madhavaraj Ayyavu, Shiva Kumar H R and Ramakrishnan A G
3036 Extracting speaker’s gender, accent, age and emotional state from speech Nagendra Goel, Mousmita Sarma, Tejendra Kushwah, Dharmesh Agarwal, Zikra Iqbal and Surbhi Chauhan
3042 Determining Speaker Location from Speech in a Practical Environment BHVS Narayanamurthy, JV Satyanarayana and B Yegnanarayana
3043 An Automatic Speech Transcription System for Manipuri Language Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
3045 Game-based spoken dialog language learning applications for young students Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange and David Suendermann-Oeft
3046 Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S R Mahadeva Prasanna and Priyankoo Sarmah
3047 An automated assistant for medical scribes Gregory Finley, Erik Edwards, Amanda Robinson, Najmeh Sadoughi, James Fone, Mark Miller, David Suendermann-Oeft, Michael Brenndoerfer and Nico Axtmann
3048 AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
3049 Voice-powered solutions with Cloud AI Dan Aharon
3050 Speech synthesis in the wild Ganesh Sivaraman, Parav Nagarsheth and Elie Khoury