In Step 4, the current learning model actively participates in selecting the most informative images for its own training and, to improve the query result, in returning more similar images. Therefore, the proposed selection criteria are based on uncertainty and on similarity to the query image q. Thus, the temporary training set Z'1 receives the most informative images, obtained by our proposed active learning strategy (described by Algorithm 2).
Algorithm 1: Proposed Approach - MARRow.
input: query image q
output: final learning model M and final list LR ordered by relevance (from the most similar to the least similar to q)
auxiliaries: image dataset I, sets of feature extractors Fi and distance functions Dj, learning set Z2, best feature extractor BestFD.FeatureExtractor, number of clusters k, set of centroids C, ordered list LS of the closest images to q, best distance function BestFD.Distance, number of desired samples ns, temporary training set Z'1, training set Z1, current learning model M, number of selected samples nu.
1 BestFD ← findBestPairFeatureDistance(I, Fi, Dj);
2 Z2 ← featureExtraction(BestFD.FeatureExtractor);
3 C ← clusteringCentroids(Z2, k);
4 LS ← similaritySearch(q, Z2, BestFD.Distance);
5 Z'1 ← C ∪ {si ∈ LS, i = 1, 2, ..., ns};
6 Z1 ← annotation(Z'1);
7 M ← training(Z1);
Fig. 2. Pipeline of the proposed approach.
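Lines 3-5 of Algorithm 1, which build the temporary training set from cluster representatives plus the ns images closest to q, can be sketched as follows. This is a minimal illustration under stated assumptions: features are NumPy vectors, clusteringCentroids is approximated by plain k-means (Lloyd's algorithm) with a naive first-k initialization, and each centroid is represented by its nearest real sample so that it can be annotated. The function names are hypothetical, not the authors' code.

```python
import numpy as np

def cluster_representatives(Z2, k, n_iter=20):
    """Hypothetical stand-in for clusteringCentroids: k-means (Lloyd) on the rows
    of Z2, returning for each centroid the index of the closest real sample."""
    centroids = Z2[:k].astype(float)  # naive init: first k samples (illustrative only)
    for _ in range(n_iter):
        d = np.linalg.norm(Z2[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)              # nearest centroid for each sample
        for j in range(k):
            members = Z2[assign == j]
            if len(members):                   # keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    d = np.linalg.norm(Z2[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=0)                    # nearest real sample for each centroid

def initial_training_candidates(Z2, q, k, ns, dist):
    """Indices of C union {s_i in LS, i = 1..ns} (Algorithm 1, Lines 3-5)."""
    C = cluster_representatives(Z2, k)
    LS = sorted(range(len(Z2)), key=lambda i: dist(q, Z2[i]))  # similarity search
    return sorted(set(C.tolist()) | set(LS[:ns]))

Z2 = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]], dtype=float)
euclid = lambda a, b: float(np.linalg.norm(a - b))  # assumed BestFD.Distance
print(initial_training_candidates(Z2, np.array([0.0, 0.2]), k=2, ns=2, dist=euclid))
# prints [0, 1, 2]
```

Using the union means a sample that is both a cluster representative and a near neighbor of q is annotated only once.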
Algorithm 2: Proposed Active Learning Strategy.
input: number of selected samples nu, current learning model M, learning set Z2\Z1, query image q and best distance function BestFD.Distance
output: list of selected samples LR
auxiliaries: learning set Z2, ordered list of candidates LC
Then, the selected image set Z'1 is displayed to the expert. From the first learning iteration onward, the selected images are already labeled by the current instance of the model, so the expert only needs to correct the labels of misclassified images (validating each one as relevant or irrelevant). The images confirmed or corrected by the expert are added to the previous training set Z1 (Algorithm 1, Line 10). It is important to emphasize that our strategy never shows samples that were already labeled: candidates are drawn only from Z2\Z1, i.e., the learning set Z2 excluding the samples already in Z1. Afterwards, the model is retrained and a new instance of the learning model M is generated (Algorithm 1, Line 11).
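The expert-validation step can be illustrated with a small sketch in which a callable oracle stands in for the expert. It assumes labels are 1 (relevant) and 0 (irrelevant) and that the training set Z1 is stored as a dictionary from sample index to label; expert_review is a hypothetical name, not part of the paper.

```python
def expert_review(selected, predicted, oracle, Z1):
    """Hypothetical sketch of the expert-validation step.

    selected:  indices of the images shown to the expert
    predicted: dict index -> label proposed by the current model (1 or 0)
    oracle:    callable index -> correct label (stands in for the expert)
    Z1:        dict index -> label, the current training set
    Returns the enlarged training set and the number of labels the expert fixed.
    """
    new_Z1 = dict(Z1)
    corrections = 0
    for idx in selected:
        # The strategy never shows an already-labeled sample (it draws from Z2 minus Z1).
        if idx in Z1:
            raise ValueError(f"sample {idx} is already in the training set")
        label = oracle(idx)
        if label != predicted[idx]:
            corrections += 1      # the expert only has to act on misclassified images
        new_Z1[idx] = label       # confirmed or corrected label joins the training set
    return new_Z1, corrections

Z1, fixed = expert_review([2, 3], {2: 1, 3: 0}, lambda i: 1, {0: 1})
print(Z1, fixed)  # prints {0: 1, 2: 1, 3: 1} 1
```

Counting corrections makes the annotation saving explicit: confirmed predictions cost the expert a validation, not a full labeling.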
Steps 3 and 4 (Algorithm 1, Lines 8-12) are repeated until the expert is satisfied with the results retrieved by the proposed learning process. At that point, we obtain a final learning model M (which can be applied to an unlabeled dataset) and a final list LR ordered by relevance to the query image q (from the most similar to the least similar).
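The repetition of Steps 3 and 4 can be written as a generic control-flow sketch. The training, selection, review, and stopping rules are passed in as callables (all hypothetical placeholders), since they are not fixed at this point in the text; max_iters is an added safeguard that the paper does not mention.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[int, int]  # sample index -> label (1 = relevant, 0 = irrelevant)

def refine(Z1: Labels, pool: List[int],
           train: Callable[[Labels], object],
           select: Callable[[object, List[int]], List[int]],
           review: Callable[[List[int]], Labels],
           satisfied: Callable[[object], bool],
           max_iters: int = 50) -> Tuple[object, Labels]:
    """Hypothetical loop: select, review and retrain until the expert is satisfied."""
    model = train(Z1)                                    # initial model M
    for _ in range(max_iters):
        if satisfied(model):                             # expert accepts the results
            break
        unlabeled = [i for i in pool if i not in Z1]     # only Z2 minus Z1 is eligible
        if not unlabeled:
            break
        Z1 = {**Z1, **review(select(model, unlabeled))}  # add validated labels
        model = train(Z1)                                # new instance of M
    return model, Z1

# Toy stubs: the "model" is just the training-set size.
model, Z1 = refine({0: 1}, list(range(10)),
                   train=lambda z: len(z),
                   select=lambda m, u: u[:2],
                   review=lambda idx: {i: i % 2 for i in idx},
                   satisfied=lambda m: m >= 5)
print(model, sorted(Z1))  # prints 5 [0, 1, 2, 3, 4]
```

Keeping the rules as parameters separates the loop structure described here from the specific selection criteria of Algorithms 2 and 3.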
2.1. Active learning strategy
We also propose a new active learning strategy (described by Algorithm 2) that selects a small set of the most informative images. The idea is to exploit the knowledge the classifier has acquired from the most informative samples in order to improve the image retrieval process.
Initially, Lines 1-3 of Algorithm 2 are only a control step that checks whether nu samples are still available for selection in the learning set Z2\Z1; this check is needed because the process is iterative and a set of images is selected at each iteration. Next, the learning set is classified by the current learning model, yielding the classified learning set Z2 (Algorithm 2, Line 4). After the classification, we obtain the list LC of the most informative (candidate) images (Algorithm 2, Line 5) according to the proposed selection criteria, which are described in detail in Algorithm 3. Then, we obtain the selected relevant set LR, composed of the nu most informative (most uncertain and similar) samples from LC (Algorithm 2, Lines 6-8).
Algorithm 3: Proposed Selection Strategy.
input: number of selected samples nu, learning set Z2, query image q and best distance function BestFD.Distance
5 if s.labelid = adjs.labelid then
For the proposed selection strategy (Algorithm 3), an ordered list of candidates LC is first created to receive the candidate samples to be displayed to the expert (Algorithm 3, Line 1). Learning lists Li, organized by label, are also created, where i denotes the ith class (i ∈ {1, 0}, i.e., the relevant and irrelevant classes, respectively). Then, each sample s from the learning set Z2, previously labeled by the current learning model, is analyzed (Algorithm 3, Lines 2-8) to determine which samples are the most informative candidates. To this end, the samples are first separated according to the class labels provided by the model: each sample s ∈ Z2 is stored in the list Li corresponding to its class label i (i = s.labelid) (Algorithm 3, Line 3).
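A sketch of Algorithms 2 and 3 follows. Only the steps described in the text are grounded: the availability check, the classification of the not-yet-labeled samples, the label lists L1 and L0, and taking the top nu candidates. The ordering of LC is an assumption, because the remaining lines of Algorithm 3 are not reproduced here: each list is ranked by uncertainty |p - 0.5|, ties broken by distance to q, and the two lists are interleaved so both classes appear among the candidates. Euclidean distance stands in for BestFD.Distance, and select_samples is a hypothetical name.

```python
import itertools
import numpy as np

def select_samples(nu, probs, X, q, ids):
    """Hypothetical sketch of Algorithms 2 and 3.

    nu:    number of samples to select
    probs: current model's P(relevant) for each row of X
    X:     feature vectors of the not-yet-labeled samples (Z2 minus Z1)
    q:     feature vector of the query image
    ids:   sample identifiers aligned with the rows of X
    """
    # Alg. 2, Lines 1-3: never ask for more samples than remain unlabeled.
    nu = min(nu, len(ids))
    # Alg. 2, Line 4: classify the learning set with the current model.
    labels = (probs >= 0.5).astype(int)
    # Assumption: Euclidean distance stands in for BestFD.Distance.
    dist = np.linalg.norm(X - q, axis=1)
    # Alg. 3, Lines 1-3: one learning list per class label (1 relevant, 0 irrelevant).
    L = {1: [], 0: []}
    for pos, lab in enumerate(labels):
        L[int(lab)].append(pos)
    # Assumption: within each list, most uncertain first, then most similar to q.
    for lab in (1, 0):
        L[lab].sort(key=lambda p: (abs(probs[p] - 0.5), dist[p]))
    # Assumption: interleave the two lists to build the ordered candidate list LC.
    LC = [p for pair in itertools.zip_longest(L[1], L[0]) for p in pair if p is not None]
    # Alg. 2, Lines 6-8: the nu best candidates form the selected list LR.
    return [ids[p] for p in LC[:nu]]

probs = np.array([0.9, 0.55, 0.45, 0.1])
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
print(select_samples(2, probs, X, np.array([0.0, 0.0]), [10, 11, 12, 13]))
# prints [11, 12]
```

In the example, samples 11 and 12 sit closest to the decision threshold on either side, so they are the ones shown to the expert first.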