The DeepVF server is back online! It has been upgraded in line with the university's security requirements. We apologize for any inconvenience caused by the unexpected shutdown.


A deep learning-based hybrid framework for recognizing virulence factors using the stacking strategy

Virulence factors (VFs) in Gram-negative bacteria enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods that predict VFs. Despite their advantages and performance improvements, existing methods have several limitations. First, because the characteristics and mechanisms of VFs continue to evolve with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs with existing tools that were developed on outdated datasets. Second, few systematic feature-engineering efforts have examined how different types of features contribute to model performance, as most tools extract only a few types of features. Addressing these issues is likely to improve the accuracy of VF predictors significantly, which would be particularly useful for genome-wide VF prediction.

In this work, we comprehensively explore a wide range of heterogeneous features based on an enlarged, up-to-date dataset assembled from several public databases and the recent literature. Specifically, seven popular machine learning algorithms are employed to train 62 single-method baseline models using these features: four classical machine learning algorithms, namely random forest (RF), support vector machines (SVM), extreme gradient boosting (XGBoost) and multilayer perceptron (MLP), and three deep learning algorithms, namely convolutional neural networks (CNN), long short-term memory networks (LSTM) and bi-directional long short-term memory networks (BiLSTM). We then combine these baseline models in a deep learning-based hybrid framework (termed DeepVF) using the stacking strategy, so that the final model integrates their individual prediction strengths. The resulting model accurately predicts VFs for Gram-negative bacteria. Extensive experimental results demonstrate the effectiveness of DeepVF: it performs much better than the single-method baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test set. Based on the proposed hybrid ensemble model, we have implemented DeepVF as a user-friendly online predictor for screening and identifying potential VFs in Gram-negative bacteria from sequence data.
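To illustrate the general idea of the stacking strategy, the sketch below shows how baseline models can be combined: each base model's out-of-fold predictions become the input features of a second-level meta-learner, so the meta-learner learns how much to trust each base model on data it did not train on. This is a minimal, stdlib-only sketch under stated assumptions, not DeepVF's actual implementation. The single-feature toy base learners (standing in for RF, SVM, CNN and the other baselines), the logistic-regression meta-learner, the 5-fold split, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], float]  # feature vector -> P(sample is a VF)
Trainer = Callable[[List[Vector], List[int]], Predictor]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def one_feature_trainer(index: int) -> Trainer:
    """Hypothetical toy base learner: scores a sample by one feature,
    centred on the midpoint between the two class means."""
    def train(X: List[Vector], y: List[int]) -> Predictor:
        pos = [x[index] for x, t in zip(X, y) if t == 1]
        neg = [x[index] for x, t in zip(X, y) if t == 0]
        mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        mid = (mu_pos + mu_neg) / 2
        sign = 1.0 if mu_pos >= mu_neg else -1.0
        return lambda x: sigmoid(sign * (x[index] - mid))
    return train


def out_of_fold(trainers: List[Trainer], X: List[Vector], y: List[int],
                k: int = 5) -> List[List[float]]:
    """Level-1 features: each base model's prediction for every sample,
    made by a copy of that model that never saw the sample in training."""
    meta = [[0.0] * len(trainers) for _ in X]
    for fold in range(k):
        tr = [i for i in range(len(X)) if i % k != fold]
        te = [i for i in range(len(X)) if i % k == fold]
        for j, trainer in enumerate(trainers):
            model = trainer([X[i] for i in tr], [y[i] for i in tr])
            for i in te:
                meta[i][j] = model(X[i])
    return meta


def train_logistic(Z: List[List[float]], y: List[int],
                   lr: float = 1.0, epochs: int = 1000) -> Predictor:
    """Assumed meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def fit_stacking(trainers: List[Trainer], X: List[Vector], y: List[int],
                 k: int = 5) -> Predictor:
    """Train the meta-learner on out-of-fold predictions, then refit the
    base models on all training data for use at prediction time."""
    meta_model = train_logistic(out_of_fold(trainers, X, y, k), y)
    base_models = [t(X, y) for t in trainers]
    return lambda x: meta_model([m(x) for m in base_models])


def toy_data(n: int, rng: random.Random):
    """Synthetic stand-in for encoded sequences: feature 0 is strongly
    informative, feature 1 weakly informative, feature 2 is pure noise."""
    X, y = [], []
    for i in range(n):
        t = i % 2  # alternate labels so every fold contains both classes
        s = 1.0 if t else -1.0
        X.append([rng.gauss(2 * s, 1), rng.gauss(0.5 * s, 1), rng.gauss(0, 1)])
        y.append(t)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = toy_data(200, rng)
    X_test, y_test = toy_data(200, rng)
    model = fit_stacking([one_feature_trainer(j) for j in range(3)], X_train, y_train)
    acc = sum((model(x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)) / len(y_test)
    print(f"held-out accuracy: {acc:.2f}")
```

The key design choice is that the meta-learner is fitted only on out-of-fold predictions. If it were trained on predictions the base models made on their own training data, it would learn to over-trust whichever base model overfits most. In a real pipeline the toy trainers would be replaced by the trained baseline models, with the same fold-wise protocol applied to each.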



Reminder:
If you find our work useful for your research, please cite:
Development Team
Bioinformatics group
School of Computer Science and Information Security
Guilin University of Electronic Technology
Guilin 541004, China