The method is based on the auto-cross covariance (ACC) transformation, which converts protein sequences into uniform, equal-length vectors. ACC is a protein sequence mining method developed by Wold et al. (Anal. Chim. Acta 1993;277:239-253). It has been applied to quantitative structure-activity relationship (QSAR) studies of peptides of different lengths.
The principal properties of the amino acids are represented by z descriptors, originally derived by Hellberg et al. (J. Med. Chem. 1987;30:1126-1135). They describe amino acid hydrophobicity, molecular size and polarity.
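To illustrate how an ACC transformation can turn sequences of different lengths into vectors of the same length, here is a minimal Python sketch. It assumes the commonly cited form of the transformation, in which each term is the average product of descriptor j at position i and descriptor k at position i + lag, taken over all valid positions. The descriptor table `TOY_Z`, the function name `acc_transform` and the choice `max_lag=2` are hypothetical and serve only as illustration; the numbers in `TOY_Z` are placeholders, not the published z values.

```python
from itertools import product

# Hypothetical placeholder descriptors for four residues.
# These are NOT the published z-scale values; substitute the real table.
TOY_Z = {
    "A": (0.1, -1.0, 0.5),
    "G": (2.0, -4.0, 1.0),
    "K": (2.5, 1.5, -3.0),
    "L": (-4.0, -1.0, -1.5),
}


def acc_transform(sequence, descriptors, max_lag):
    """Return a fixed-length ACC vector for one sequence.

    For every ordered descriptor pair (j, k) and every lag from 1 to
    max_lag, average z_j at position i times z_k at position i + lag.
    Pairs with j == k give auto-covariance terms, the rest give
    cross-covariance terms.
    """
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    z = [descriptors[aa] for aa in sequence]
    n_desc = len(z[0])
    vector = []
    for j, k in product(range(n_desc), repeat=2):
        for lag in range(1, max_lag + 1):
            total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
            vector.append(total / (n - lag))
    return vector


# Two sequences of different length map to vectors of the same length:
# (number of descriptors)^2 * max_lag = 3 * 3 * 2 = 18.
short = acc_transform("AGKLA", TOY_Z, max_lag=2)
long_ = acc_transform("AGKLAGKKLLAG", TOY_Z, max_lag=2)
print(len(short), len(long_))
```

The vector length depends only on the number of descriptors and the maximum lag, not on the sequence length, which is what makes the output usable by a classifier that expects fixed-size input.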
The proteins are classified by the k-nearest neighbor algorithm (kNN, k = 3), using a training set of 2210 known allergens from different species and 2210 non-allergens from the same species.
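The classification step can be sketched as a majority vote among the three nearest training vectors. The sketch below is an illustration under stated assumptions, not the tool's actual implementation: the Euclidean distance metric, the function name `knn_classify` and the two-dimensional toy vectors are all assumptions made here for brevity. In practice the inputs would be the ACC vectors of the training proteins.

```python
import math
from collections import Counter


def knn_classify(query, training_vectors, training_labels, k=3):
    """Label a query vector by majority vote of its k nearest neighbors.

    Distance is Euclidean (an assumption; the source does not name
    the metric).
    """
    distances = sorted(
        (math.dist(query, vec), label)
        for vec, label in zip(training_vectors, training_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training set: hypothetical 2-D points standing in for ACC vectors.
train_X = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # labeled allergen
    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9],   # labeled non-allergen
]
train_y = ["allergen"] * 3 + ["non-allergen"] * 3

print(knn_classify([0.15, 0.15], train_X, train_y))
print(knn_classify([0.95, 1.00], train_X, train_y))
```

With two classes and an odd k such as 3, the vote cannot end in a tie, which may be one reason to prefer an odd number of neighbors.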