MATLAB-IEEE & Final Year PROJECTS
1. Dynamic Facial Expression Recognition with Atlas Construction and Sparse Representation
ABSTRACT
In this paper, a new dynamic facial expression recognition method is proposed. Dynamic facial expression recognition is formulated as a longitudinal groupwise registration problem. The main contributions of this method lie in the following aspects: 1) subject-specific facial feature movements of different expressions are described by a diffeomorphic growth model; 2) salient longitudinal facial expression atlas is built for each expression by a sparse groupwise image registration method, which can describe the overall facial feature changes among the whole population and can suppress the bias due to large intersubject facial variations; and 3) both the image appearance information in spatial domain and topological evolution information in temporal domain are used to guide recognition by a sparse representation method. The proposed framework has been extensively evaluated on five databases for different applications: the extended Cohn-Kanade, MMI, FERA, and AFEW databases for dynamic facial expression recognition, and UNBC-McMaster database for spontaneous pain expression monitoring. This framework is also compared with several state-of-the-art dynamic facial expression recognition methods. The experimental results demonstrate that the recognition rates of the new method are consistently higher than other methods under comparison.
2. Lossless Compression of JPEG Coded Photo Collections
ABSTRACT
The explosion of digital photos has posed a significant challenge to photo storage and transmission for both personal devices and cloud platforms. In this paper, we propose a novel lossless compression method to further reduce the size of a set of JPEG coded correlated images without any loss of information. The proposed method jointly removes inter/intra image redundancy in the feature, spatial, and frequency domains. For each collection, we first organize the images into a pseudo video by minimizing the global prediction cost in the feature domain. We then present a hybrid disparity compensation method to better exploit both the global and local correlations among the images in the spatial domain. Furthermore, the redundancy between each compensated signal and the corresponding target image is adaptively reduced in the frequency domain. Experimental results demonstrate the effectiveness of the proposed lossless compression method. Compared with the JPEG coded image collections, our method achieves average bit savings of more than 31%.
3. Pixel modeling using histograms based on fuzzy partitions for dynamic background subtraction
ABSTRACT
We propose a novel pixel-modeling approach for background subtraction using histograms based on strong uniform fuzzy partitions. In the proposed method, the temporal distribution of pixel values is represented by a histogram based on a triangular partition. The threshold for background segmentation is set adaptively according to the shape of the histogram. Histogram accumulation is controlled adaptively by a fuzzy controller under a supervised learning framework. Benefiting from the adaptive scheme, with no parameter tuning, the proposed algorithm functions well across a wide spectrum of challenging environments. The performance of the proposed method is evaluated against more than 20 state-of-the-art methods in complex outdoor environments, particularly in those consisting of highly dynamic backgrounds and camouflaged foregrounds. Experimental results confirm that the proposed method performs effectively in terms of both the true positive rate and the noise suppression ability. Further, it outperforms other state-of-the-art methods by a significant margin.
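Below is a minimal, hypothetical MATLAB sketch of the core idea described above: accumulating a fuzzy histogram for a single pixel over time using a strong uniform triangular partition of the gray-level range. Bin count, learning rate, and the synthetic pixel samples are illustrative assumptions, not values from the paper, and the fuzzy controller that adapts the accumulation is not shown.

```matlab
% Sketch: fuzzy (triangular-partition) histogram accumulation for one pixel.
nBins   = 16;                          % number of fuzzy sets (assumption)
centers = linspace(0, 255, nBins);     % centers of the triangular partition
width   = centers(2) - centers(1);     % spacing = triangle half-support (strong uniform partition)
fuzzyHist = zeros(1, nBins);           % temporal fuzzy histogram of this pixel
alpha   = 0.05;                        % accumulation rate (adapted by the fuzzy controller in the paper)

pixelSamples = double(randi([0 255], 1, 500));   % stand-in for the pixel's values over time
for v = pixelSamples
    mu = max(0, 1 - abs(v - centers) / width);   % triangular membership of v in each bin
    fuzzyHist = (1 - alpha) * fuzzyHist + alpha * mu;   % recursive accumulation
end
% A pixel in a new frame could be labeled background when its membership-weighted
% histogram response exceeds a threshold derived from the histogram shape.
```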
4. Layer-Based Approach for Image Pair Fusion
ABSTRACT
Recently, image pairs, such as noisy and blurred images or infrared and noisy images, have been considered as a solution for providing high-quality photographs under low lighting conditions. In this paper, a new method for decomposing the image pairs into two layers, i.e., the base layer and the detail layer, is proposed for image pair fusion. In the case of infrared and noisy images, simple naive fusion leads to unsatisfactory results due to the discrepancies in brightness and image structures between the image pair. To address this problem, a local contrast-preserving conversion method is first proposed to create a new base layer of the infrared image, which can have a visual appearance similar to another base layer, such as that of the denoised noisy image. Then, a new way of designing three types of detail layers from the given noisy and infrared images is presented. To estimate the noise-free and unknown detail layer from the three designed detail layers, an optimization framework is modeled with residual-based sparsity and patch redundancy priors. To better suppress the noise, an iterative approach that updates the detail layer of the noisy image is adopted via a feedback loop. The proposed layer-based method can also be applied to fuse a noisy and blurred image pair. The experimental results show that the proposed method is effective for solving the image pair fusion problem.
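As a minimal sketch of the base/detail split that the abstract builds on, the MATLAB fragment below separates an image into a smooth base layer and a residual detail layer using a hand-built Gaussian kernel. The smoothing scale and the test image are assumptions; the paper's contrast-preserving conversion and sparsity priors are not reproduced here.

```matlab
% Sketch: base/detail decomposition with a Gaussian base layer (illustrative only).
img  = double(imread('cameraman.tif')) / 255;    % any grayscale test image
sig  = 2;                                        % smoothing scale (assumption)
r    = ceil(3 * sig);
[x, y] = meshgrid(-r:r);
g    = exp(-(x.^2 + y.^2) / (2 * sig^2));
g    = g / sum(g(:));                            % normalized Gaussian kernel
base   = conv2(img, g, 'same');                  % base layer: low-frequency content
detail = img - base;                             % detail layer: edges and texture
fused  = base + detail;                          % reconstruction is exact by construction
```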
5. Adaptive Pairing Reversible Watermarking
ABSTRACT
This letter revisits the pairwise reversible watermarking scheme of Ou et al., 2013. An adaptive pixel pairing that considers only pixels with similar prediction errors is introduced. This adaptive approach provides an increased number of pixel pairs where both pixels are embedded and decreases the number of shifted pixels. The adaptive pairwise reversible watermarking outperforms the state-of-the-art low embedding bit-rate schemes proposed so far.
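The following hypothetical MATLAB fragment illustrates only the adaptive pairing idea: pixels are paired by the similarity of their prediction errors (via sorting) rather than by fixed spatial adjacency. The synthetic errors are stand-ins, and the actual 2D pairwise embedding of Ou et al. is only referenced in comments.

```matlab
% Sketch: pair pixels with similar prediction errors (adaptive pairing idea).
e = round(randn(1, 1000) * 4);        % stand-in prediction errors of embeddable pixels
[eSorted, idx] = sort(e);             % similar errors become adjacent after sorting
nPairs = floor(numel(eSorted) / 2);
pairs  = reshape(idx(1:2*nPairs), 2, nPairs)';   % each row: indices of one pixel pair
% Each pair (e1, e2) would then be mapped by the 2D pairwise prediction-error
% expansion of Ou et al.; pairs with both errors near zero allow embedding in
% both pixels, which increases capacity and reduces the number of shifted pixels.
```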
6. Adaptive Part-Level Model Knowledge Transfer for Gender Classification
ABSTRACT
In this letter, we propose an adaptive part-level model knowledge transfer approach for gender classification of facial images based on Fisher vector (FV). Specifically, we first decompose the whole face image into several parts and compute the dense FVs on each face part. An adaptive transfer learning model is then proposed to reduce the discrepancies between the training data and the testing data for enhancing classification performance. Compared to the existing gender classification methods, the proposed approach is more adaptive to the testing data, which is quite beneficial to the performance improvement. Extensive experiments on several public domain face data sets clearly demonstrate the effectiveness of the proposed approach.
7. Patch-Based Video Denoising With Optical Flow Estimation
ABSTRACT
A novel image sequence denoising algorithm is presented. The proposed approach takes advantage of the self-similarity and redundancy of adjacent frames. The algorithm is inspired by fusion algorithms, and as the number of frames increases, it tends to a pure temporal average. The use of motion compensation by regularized optical flow methods permits robust patch comparison in a spatiotemporal volume. The use of principal component analysis ensures the correct preservation of fine texture and details. An extensive comparison with the state-of-the-art methods illustrates the superior performance of the proposed approach, with improved texture and detail reconstruction.
8. Fusion of Quantitative Image and Genomic Biomarkers to Improve Prognosis Assessment of Early Stage Lung Cancer Patients
ABSTRACT
This study aims to develop a new quantitative image feature analysis scheme and investigate its role, along with two genomic biomarkers, namely protein expression of the excision repair cross-complementing 1 (ERCC1) gene and a regulatory subunit of ribonucleotide reductase (RRM1), in predicting the cancer recurrence risk of Stage I non-small-cell lung cancer (NSCLC) patients after surgery. Methods: Using chest computed tomography images, we developed a computer-aided detection scheme to segment lung tumors and computed tumor-related image features. After feature selection, we trained a Naïve Bayesian network based classifier using 8 image features and a Multilayer Perceptron classifier using 2 genomic biomarkers to predict cancer recurrence risk, respectively. The two classifiers were trained and tested using a dataset with 79 Stage I NSCLC cases, a synthetic minority oversampling technique, and a leave-one-case-out validation method. A fusion method was also applied to combine the prediction scores of the two classifiers. Results: AUC (area under the ROC curve) values were 0.78±0.06 and 0.68±0.07 when using the image feature and genomic biomarker based classifiers, respectively. The AUC value significantly increased to 0.84±0.05 (p<0.05) when the two classifier-generated prediction scores were fused using an equal weighting factor. Conclusion: The quantitative image feature based classifier yielded significantly higher discriminatory power than the genomic biomarker based classifier in predicting cancer recurrence risk. Fusion of the prediction scores generated by the two classifiers further improved prediction performance. Significance: We demonstrated a new approach that has the potential to assist clinicians in more effectively managing Stage I NSCLC patients to reduce cancer recurrence risk.
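The score-level fusion step mentioned above is simple enough to illustrate directly. The MATLAB fragment below shows an equal-weight fusion of two classifiers' per-case scores; the score values and variable names are hypothetical.

```matlab
% Sketch: equal-weight fusion of two classifiers' recurrence-risk scores.
sImage   = [0.82 0.31 0.65 0.12];      % hypothetical image-feature classifier scores
sGenomic = [0.74 0.45 0.51 0.20];      % hypothetical genomic-biomarker classifier scores
w        = 0.5;                        % equal weighting factor, as in the abstract
sFused   = w * sImage + (1 - w) * sGenomic;   % fused prediction scores per case
```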
9. Multivideo Object Cosegmentation for Irrelevant Frames Involved Videos
ABSTRACT
Even though there has been a large amount of previous work on video segmentation techniques, it is still a challenging task to extract video objects accurately without interactions, especially for videos which contain irrelevant frames (frames containing no common targets). In this paper, a novel multivideo object cosegmentation method is proposed to cosegment common or similar objects of relevant frames in different videos, which includes three steps: 1) object proposal generation and clustering within each video; 2) weighted graph construction and common object selection; and 3) irrelevant frame detection and pixel-level segmentation refinement. We apply our method on challenging datasets, and exhaustive comparison experiments demonstrate the effectiveness of the proposed method.
10. Multi-Viewpoint Panorama Construction with Wide-Baseline Images
ABSTRACT
We present a novel image stitching approach which can produce visually plausible panoramic images from input taken from different viewpoints. Unlike previous methods, our approach allows wide baselines between images and non-planar scene structures. Instead of performing 3D reconstruction, we design a mesh based framework to optimize alignment and regularity in 2D. By solving a global objective function consisting of an alignment term and a set of prior constraints, we construct panoramic images which are locally as perspective as possible and yet nearly orthogonal in the global view. We improve composition and achieve good performance on misaligned areas. Experimental results on challenging data demonstrate the effectiveness of the proposed method.
11. A Security-Enhanced Alignment-Free Fuzzy Vault-Based Fingerprint Cryptosystem Using Pair-Polar Minutiae Structures
ABSTRACT
Alignment-free fingerprint cryptosystems, which perform matching using relative information between minutiae (e.g., local minutiae structures), are promising because they can avoid the recognition errors and information leakage caused by template alignment/registration. However, as most local minutiae structures only contain relative information of a few minutiae in a local region, they are less discriminative than the global minutiae pattern. Besides, the similarity measures for trivially/coarsely quantized features in the existing work cannot provide a robust way to deal with nonlinear distortions, a common form of intra-class variation. As a result, the recognition accuracy of current alignment-free fingerprint cryptosystems is unsatisfactory. In this paper, we propose an alignment-free fuzzy vault-based fingerprint cryptosystem using highly discriminative pair-polar (P-P) minutiae structures. The fine quantization used in our system can largely retain information about a fingerprint template and enables the direct use of a traditional, well-established minutiae matcher. In terms of template/key protection, the proposed system fuses cancelable biometrics and biocryptography. Transforming the P-P minutiae structures before encoding destroys the correlations between them and can provide privacy-enhancing features, such as revocability and protection against cross-matching, by setting distinct transformation seeds for different applications. The comparison with other minutiae-based fingerprint cryptosystems shows that the proposed system performs favorably on selected publicly available databases and has strong security.
12. Microwave Unmixing With Video Segmentation for Inferring Broadleaf and Needleleaf Brightness Temperatures and Abundances From Mixed Forest Observations
ABSTRACT
Passive microwave sensors have a strong capability of penetrating forest layers to obtain information from the forest canopy and ground surface. For forest management, it is useful to study passive microwave signals from forests. Passive microwave sensors can detect signals from needleleaf, broadleaf, and mixed forests. The observed brightness temperature of a mixed forest can be approximated by a linear combination of the needleleaf and broadleaf brightness temperatures weighted by their respective abundances. For a mixed forest observed by an N-band microwave radiometer with horizontal and vertical polarizations, there are 2N observed brightness temperatures. It is desirable to infer 4N + 2 unknowns: 2N broadleaf brightness temperatures, 2N needleleaf brightness temperatures, 1 broadleaf abundance, and 1 needleleaf abundance. This is a challenging underdetermined problem. In this paper, we devise a novel method that combines microwave unmixing with video segmentation for inferring broadleaf and needleleaf brightness temperatures and abundances from mixed forests. We propose an improved Otsu method for video segmentation to infer broadleaf and needleleaf abundances. The brightness temperatures of needleleaf and broadleaf trees can then be solved by the nonnegative least squares solution. For our mixed forest unmixing problem, it turns out that the ordinary least squares solution yields the desired positive brightness temperatures. The experimental results demonstrate that the proposed method is able to unmix broadleaf and needleleaf brightness temperatures and abundances well. The absolute differences between the reconstructed and observed brightness temperatures of the mixed forest are well within 1 K.
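The linear mixing model and the nonnegative least squares step mentioned in the abstract can be sketched directly in MATLAB. The abundances and brightness temperatures below are hypothetical numbers for illustration only; in the paper the abundances come from the video-segmentation step.

```matlab
% Sketch of the linear mixing model: Tmix = aB .* TB + aN .* TN over several
% observations, solved for TB and TN with nonnegative least squares.
aB   = [0.60; 0.55; 0.62; 0.58];        % broadleaf abundances (hypothetical)
aN   = 1 - aB;                          % needleleaf abundances
Tmix = [268.4; 267.9; 268.8; 268.1];    % observed mixed brightness temperatures [K]
A    = [aB, aN];                        % design matrix of the linear system A * t = Tmix
t    = lsqnonneg(A, Tmix);              % t(1) = broadleaf TB, t(2) = needleleaf TN (kept nonnegative)
```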
13. 2D Orthogonal Locality Preserving Projection for Image Denoising
ABSTRACT
Sparse representations using transform-domain techniques are widely used for better interpretation of the raw data. Orthogonal locality preserving projection (OLPP) is a linear technique that tries to preserve the local structure of data in the transform domain as well. The vectorized nature of OLPP requires high-dimensional data to be converted to vector format, and hence it may lose the spatial neighborhood information of the raw data. On the other hand, processing 2D data directly not only preserves spatial information but also improves computational efficiency considerably. A 2D OLPP is therefore expected to learn the transformation from the 2D data itself. This paper derives the mathematical foundation for 2D OLPP. The proposed technique is used for the image denoising task. Recent state-of-the-art approaches for image denoising rely on two major hypotheses, i.e., non-local self-similarity and sparse linear approximations of the data. The locality preserving nature of the proposed approach automatically takes care of the self-similarity present in the image while inferring a sparse basis. A global basis is adequate for the entire image. The proposed approach outperforms several state-of-the-art image denoising approaches for gray-scale, color, and texture images.
14. Exploring the Usefulness of Light Field Cameras for Biometrics: An Empirical Study on Face and Iris Recognition
ABSTRACT
A light field sensor can provide useful information in terms of multiple depth (or focus) images, holding additional information that is quite useful for biometric applications. In this paper, we examine the applicability of a light field camera for biometric applications by considering two prominently used biometric characteristics: 1) face and 2) iris. To this end, we employed a Lytro light field camera to construct two new and relatively large scale databases for both face and iris biometrics. We then explore the additional information available from the different depth images rendered by the light field camera in two different manners: 1) by selecting the best focus image from the set of depth images and 2) by combining all the depth images using super-resolution schemes to exploit the supplementary information available within the set elements. Extensive evaluations are carried out on our newly constructed databases, demonstrating the significance of using the additional information rendered by a light field camera to improve the overall performance of the biometric system.
15. Spectral–Spatial Adaptive Sparse Representation for Hyperspectral Image Denoising
ABSTRACT
In this paper, a novel spectral-spatial adaptive sparse representation (SSASR) method is proposed for hyperspectral image (HSI) denoising. The proposed SSASR method aims at improving noise-free estimation for noisy HSIs by making full use of highly correlated spectral information and highly similar spatial information via sparse representation, which consists of the following three steps. First, according to the spectral correlation across bands, the HSI is partitioned into several nonoverlapping band subsets. Each band subset contains multiple contiguous bands with highly similar spectral characteristics. Then, within each band subset, shape-adaptive local regions consisting of spatially similar pixels are searched in the spatial domain. In this way, spectrally and spatially similar pixels can be grouped. Finally, the highly correlated and similar spectral-spatial information in each group is effectively used via joint sparse coding in order to generate a better noise-free estimation. The proposed SSASR method is evaluated by different objective metrics in both real and simulated experiments. The numerical and visual comparison results demonstrate the effectiveness and superiority of the proposed method.
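A hypothetical MATLAB sketch of the first step (partitioning the bands into subsets of spectrally correlated, contiguous bands) is given below. The random cube, the adjacent-band correlation measure, and the 0.9 threshold are all assumptions used only to make the idea concrete; the paper does not necessarily use this exact rule.

```matlab
% Sketch: split an HSI into contiguous band subsets by cutting where
% adjacent-band correlation drops below a threshold (illustrative rule).
hsi = rand(50, 50, 30);                          % stand-in HSI cube (rows x cols x bands)
B   = size(hsi, 3);
X   = reshape(hsi, [], B);                       % pixels x bands
rho = zeros(1, B - 1);
for b = 1:B-1
    c = corrcoef(X(:, b), X(:, b+1));
    rho(b) = c(1, 2);                            % correlation of adjacent bands
end
cuts    = find(rho < 0.9);                       % cut positions (threshold is an assumption)
edges   = [0, cuts, B];
subsets = arrayfun(@(i) edges(i)+1:edges(i+1), 1:numel(edges)-1, 'UniformOutput', false);
% subsets{k} lists the band indices of the k-th nonoverlapping band subset.
```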
16. Robust Sclera Recognition System With Novel Sclera Segmentation and Validation Techniques
ABSTRACT
Sclera blood veins have been investigated recently as a biometric trait which can be used in a recognition system. The sclera is the white and opaque outer protective part of the eye. This part of the eye has visible blood veins which are randomly distributed. This feature makes these blood veins a promising factor for eye recognition. The sclera has an advantage in that it can be captured using a visible-wavelength camera. Therefore, applications which may involve the sclera are wide ranging. The contribution of this paper is the design of a robust sclera recognition system with high accuracy. The system comprises new sclera segmentation and occluded eye detection methods. We also propose an efficient method for vessel enhancement, extraction, and binarization. For the feature extraction and matching stages, we additionally develop an efficient method that is invariant to orientation, scale, illumination, and deformation. The results obtained using the UBIRIS.v1 and UTIRIS databases show an advantage in terms of segmentation accuracy and computational complexity compared with the state-of-the-art methods due to Thomas, Oh, Zhou, and Das.
17. Enhancing Sketch-Based Image Retrieval by Re-Ranking and Relevance Feedback
ABSTRACT
Sketch-based image retrieval often needs to optimize the tradeoff between efficiency and precision. Index structures are typically applied to large-scale databases to realize efficient retrieval. However, the performance can be affected by quantization errors. Moreover, the ambiguousness of user-provided examples may also degrade the performance when compared with traditional image retrieval methods. Designing sketch-based image retrieval systems that preserve the index structure is therefore challenging. In this paper, we propose an effective sketch-based image retrieval approach with re-ranking and relevance feedback schemes. Our approach makes full use of the semantics in query sketches and the top ranked images of the initial results. We also apply relevance feedback to find more relevant images for the input query sketch. The integration of the two schemes results in mutual benefits and improves the performance of sketch-based image retrieval.
18. Detection of Moving Objects Using Fuzzy Color Difference Histogram Based Background Subtraction
ABSTRACT
Detection of moving objects in the presence of complex scenes such as dynamic backgrounds (e.g., swaying vegetation, ripples in water, spouting fountains), illumination variation, and camouflage is a very challenging task. In this context, we propose a robust background subtraction technique with three contributions. First, we present the use of the color difference histogram (CDH) in the background subtraction algorithm. This is done by measuring the color difference between a pixel and its neighbors in a small local neighborhood. The use of the CDH reduces the number of false detections due to the non-stationary background, illumination variation, and camouflage. Second, the color difference is fuzzified with a Gaussian membership function. Finally, a novel fuzzy color difference histogram (FCDH) is proposed by using fuzzy c-means (FCM) clustering and exploiting the CDH. The use of the FCM clustering algorithm in the CDH reduces the large dimensionality of the histogram bins in the computation and also lessens the effect of intensity variations generated by fake motion or changes in the illumination of the background. The proposed algorithm is tested on various complex scenes from benchmark publicly available video sequences. It exhibits better performance than the state-of-the-art background subtraction techniques available in the literature in terms of classification accuracy metrics such as MCC and PCC.
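The fuzzification step (second contribution) can be illustrated with a short MATLAB fragment: a color difference between a pixel and one neighbor is converted into soft votes over histogram bins with a Gaussian membership function. The pixel values, bin centers, and spread are hypothetical; in the paper the bin centers come from FCM clustering.

```matlab
% Sketch: Gaussian-membership fuzzification of a color difference before
% accumulating it into the fuzzy color difference histogram (FCDH).
centerPix   = [120 64 200];                       % RGB of the center pixel (stand-in)
neighborPix = [112 70 190];                       % one neighbor in the local window
d     = norm(double(centerPix) - double(neighborPix));   % color difference
cBins = [0 10 25 50 100 200];                     % bin/cluster centers (e.g., from FCM; assumption)
sigma = 15;                                       % spread of the membership function (assumption)
mu    = exp(-(d - cBins).^2 / (2 * sigma^2));     % Gaussian membership in each bin
mu    = mu / sum(mu);                             % normalized fuzzy vote
% The vote mu would be added to the pixel's FCDH instead of a hard bin increment.
```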
19. A Decomposition Framework for Image Denoising Algorithms
ABSTRACT
In this paper, we consider an image decomposition model that provides a novel framework for image denoising. The model computes the components of the image to be processed in a moving frame that encodes its local geometry (directions of gradients and level lines). The strategy we develop is then to denoise the components of the image in the moving frame in order to preserve its local geometry, which would have been more affected if the image were processed directly. Experiments on a whole image database tested with several denoising methods show that this framework can provide better results than denoising the image directly, both in terms of the peak signal-to-noise ratio and the structural similarity index metrics.
20. Distance-Based Encryption: How to Embed Fuzziness in Biometric-Based Encryption
ABSTRACT
We introduce a new encryption notion called distance-based encryption (DBE) to apply biometrics in identity-based encryption. In this notion, a ciphertext encrypted with a vector and a threshold value can be decrypted with a private key of another vector if and only if the distance between these two vectors is less than or equal to the threshold value. The adopted distance measure is the Mahalanobis distance, which is a generalization of the Euclidean distance and is a useful recognition approach in the pattern recognition and image processing community. The primary application of this new encryption notion is to incorporate biometric identities, such as the face, as the public identity in an identity-based encryption. In such an application, the input biometric identity associated with a private key will usually not be exactly the same as the input biometric identity in the encryption phase, even though they are from the same user. The introduced DBE addresses this problem well, as the decryption condition does not require the identities to be identical but only to have a small distance. The closest encryption notion to DBE is fuzzy identity-based encryption, but it measures biometric identities using a different distance called the overlap distance (a variant of the Hamming distance) that is not widely accepted by the pattern recognition community due to its long binary representations. In this paper, we study this new encryption notion and its constructions. We show how to generically and efficiently construct such a DBE from an inner product encryption (IPE) with a reasonable size of private keys and ciphertexts. We also propose a new IPE scheme with the shortest private key for building DBE, addressing the need for a short private key. Finally, we study the encryption efficiency of DBE by splitting our IPE encryption algorithm into offline and online algorithms.
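The decryption condition described above reduces to a simple distance test. The MATLAB fragment below computes the Mahalanobis distance between two biometric feature vectors and checks it against a threshold; the vectors, the covariance matrix, and the threshold are all hypothetical stand-ins, and none of the cryptographic machinery is shown.

```matlab
% Sketch: the DBE decryption condition as a Mahalanobis-distance test.
x = [0.42; 0.10; 0.77];                 % biometric vector bound to the private key (stand-in)
y = [0.45; 0.08; 0.80];                 % biometric vector used at encryption time (stand-in)
S = diag([0.02 0.05 0.03]);             % feature covariance matrix (assumption)
t = 1.0;                                % threshold chosen at encryption (assumption)
dM = sqrt((x - y)' * (S \ (x - y)));    % Mahalanobis distance between the two identities
canDecrypt = dM <= t;                   % decryption succeeds when the identities are close enough
```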
21. Scalable Feature Matching by Dual Cascaded Scalar Quantization for Image Retrieval
ABSTRACT
In this paper, we investigate the problem of scalable visual feature matching in large-scale image search and propose a novel cascaded scalar quantization scheme in dual resolution. We formulate the visual feature matching as a range-based neighbor search problem and approach it by identifying hyper-cubes with a dual-resolution scalar quantization strategy. Specifically, for each dimension of the PCA-transformed feature, scalar quantization is performed at both coarse and fine resolutions. The scalar quantization results at the coarse resolution are cascaded over multiple dimensions to index an image database. The scalar quantization results over multiple dimensions at the fine resolution are concatenated into a binary super-vector and stored into the index list for efficient verification. The proposed cascaded scalar quantization (CSQ) method is free of the costly visual codebook training and thus is independent of any image descriptor training set. The index structure of the CSQ is flexible enough to accommodate new image features and scalable to index large-scale image database. We evaluate our approach on the public benchmark datasets for large-scale image retrieval. Experimental results demonstrate the competitive retrieval performance of the proposed method compared with several recent retrieval algorithms on feature quantization.
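Below is a hypothetical MATLAB sketch of the dual-resolution scalar quantization of one PCA-transformed descriptor: coarse codes are cascaded into an index key, and fine codes within each coarse cell form a binary vector used for verification. The step sizes, the number of cascaded dimensions, and the feature itself are illustrative assumptions, not the paper's settings.

```matlab
% Sketch: coarse + fine scalar quantization of one PCA-transformed feature.
f          = randn(1, 32);                       % PCA-transformed feature (stand-in)
coarseStep = 1.0;  fineStep = 0.25;              % two quantization resolutions (assumptions)
coarse     = floor(f / coarseStep);              % coarse codes, cascaded into the index key
fine       = floor(mod(f, coarseStep) / fineStep);   % fine codes within each coarse cell (0..3)
indexKey   = sprintf('%d_', coarse(1:8));        % e.g., cascade the first dimensions as an index key
superVec   = dec2bin(fine, 2);                   % 2-bit fine code per dimension, kept for verification
```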
22. ACE–An Effective Anti-forensic Contrast Enhancement Technique
ABSTRACT
Detecting contrast enhancement (CE) in images and anti-forensic approaches against such detectors have gained much attention in multimedia forensics lately. Several contrast enhancement detectors analyze first order statistics, such as the gray-level histogram of an image, to determine whether or not the image is contrast enhanced. In order to counter these detectors, various anti-forensic techniques have been proposed, which in turn led to a detection technique that utilizes second order statistics of images. In this letter, we propose an effective anti-forensic approach that performs CE without significant distortion in either the first or second order statistics of the enhanced image. We formulate an optimization problem using a variant of the well-known Total Variation (TV) norm image restoration formulation. Experiments show that the algorithm effectively overcomes the first and second order statistics based detectors without loss in quality of the enhanced image.
23. Visualization of Tumor Response to Neoadjuvant Therapy for Rectal Carcinoma by Nonlinear Optical Imaging
ABSTRACT
The continuing development of nonlinear optical imaging techniques has opened many new windows in biological exploration. In this study, a nonlinear optical microscopy technique, multiphoton microscopy (MPM), was extended to detect tumor response in rectal carcinoma after neoadjuvant therapy; in particular, normal tissue and pre- and post-therapeutic cancerous tissues were investigated in order to present more detailed information and enable comparison. It was found that MPM has the ability not only to directly visualize histopathologic changes in rectal carcinoma, including stromal fibrosis, colloid response, residual tumors, blood vessel hyperplasia, and inflammatory reaction, which have been proven to have an important influence on estimation of the prognosis and the effect of neoadjuvant treatment, but also to provide quantitative optical biomarkers, including the intensity ratio of SHG over TPEF and the collagen orientation index. These results show that MPM will become a useful tool for clinicians to determine whether neoadjuvant therapy is effective or the treatment strategy is appropriate, and this study may provide the groundwork for further exploration into the application of MPM in a clinical setting.
24. Robust Edge-Stop Functions for Edge-Based Active Contour Models in Medical Image Segmentation
ABSTRACT
Edge-based active contour models are effective in segmenting images with intensity inhomogeneity but often fail when applied to images containing poorly defined boundaries, such as medical images. Traditional edge-stop functions (ESFs) utilize only gradient information, which fails to stop contour evolution at such boundaries because of the small gradient magnitudes. To address this problem, we propose a framework to construct a group of ESFs for edge-based active contour models to segment objects with poorly defined boundaries. In our framework, which incorporates gradient information as well as probability scores from a standard classifier, the ESF can be constructed from any classification algorithm and applied to any edge-based model using a level set method. Experiments on medical images using the distance regularized level set for edge-based active contour models, as well as the k-nearest neighbors and support vector machine classifiers, confirm the effectiveness of the proposed approach.
25. A Combined KFDA Method and GUI Realized for Face Recognition
ABSTRACT
Traditional face recognition methods such as principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA) are linear discriminant methods, but in real situations many problems cannot be linearly discriminated; therefore, researchers have proposed face recognition methods based on kernel techniques, which can transform a nonlinear problem in the input space into a linear problem in a high-dimensional space. In this paper, we propose a recognition method based on kernel functions which combines kernel Fisher discriminant analysis (KFDA) with kernel principal component analysis (KPCA) and use the typical ORL (Olivetti Research Laboratory) face database as our experimental database. There are four key steps: constructing the feature subspace, image projection, feature extraction, and image recognition. We found that the recognition accuracy is greatly improved by using the nonlinear identification method and the combined feature extraction methods. We use MATLAB as the platform and use a GUI to demonstrate the process of face recognition in order to achieve human-computer interaction and make the process and results more intuitive.
26. A Cost-Effective Minutiae Disk Code For Fingerprint Recognition And Its Implementation
ABSTRACT
A fingerprint is one of the unique biometric features for identity security applications. The minutiae cylinder code (MCC) constructs a cylinder for each minutia to record the contribution of the neighboring minutiae and has great performance on fingerprint recognition. However, the computation time of the MCC is high. Therefore, we propose a new disk structure to encode the local structure of each minutia. The proposed minutiae disk code (MDC) clearly illustrates the distribution of the neighboring minutiae and encodes them more efficiently, being 280.08× faster than the MCC encoding stage on the MATLAB platform. The proposed MDC approach achieves a 96.81% recognition rate on the FVC2000 and FVC2002 datasets. The hardware implementation can achieve an operating frequency of 111 MHz, which can process 1234 fingerprint images per second with an image size of 255 × 255 and a maximum of 64 minutiae, under TSMC 90 nm CMOS technology. The hardware implementation is 141.27× faster than the MCC method.
27. A Hands-on Application-Based Tool for STEM Students to Understand Differentiation
ABSTRACT
The main goal of this project is to illustrate to college students in science, technology, engineering, and mathematics (STEM) fields some fundamental concepts in calculus. MATLAB, a high-level technical computing language, is the core platform used in the construction of this project. A graphical user interface (GUI) is designed to interactively explain the intuition behind a key mathematical concept: differentiation. The GUI demonstrates how a derivative operation (as a form of kernel) can be applied to one-dimensional (1D) and two-dimensional (2D) images (as a form of vector). The user can interactively select from a set of predetermined operations and images in order to show how the selected kernel operates on the corresponding image. Such interactive tools in calculus courses are of great importance and need, especially for STEM students who seek an intuitive and visual understanding of mathematical notions that are often presented to them as abstract concepts. In addition to students, instructors can greatly benefit from using such tools to elucidate the use of fundamental concepts in mathematics in a real world context.
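A minimal MATLAB illustration of the idea the GUI demonstrates is given below: a derivative (finite-difference) kernel applied by convolution to a 1D signal and to a 2D image. The signal, the checkerboard image, and the kernel are simple stand-ins, not the project's actual predetermined operations.

```matlab
% Sketch: applying a derivative kernel by convolution in 1D and 2D.
t  = linspace(0, 2*pi, 200);
s  = sin(t);                                    % 1D "image" (a signal)
k1 = [1 -1];                                    % finite-difference derivative kernel
ds = conv(s, k1, 'same');                       % proportional to the derivative cos(t)

img = repmat([zeros(8) ones(8); ones(8) zeros(8)], 4, 4);   % simple checkerboard test image
kx  = [1 -1];                                   % horizontal derivative kernel
dx  = conv2(img, kx, 'same');                   % highlights vertical edges of the image
```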
28. Rotation Invariant Texture Description Using Symmetric Dense Microblock Difference
ABSTRACT
This paper is devoted to the problem of rotation invariant texture classification. A novel rotation invariant feature, the symmetric dense microblock difference (SDMD), is proposed which captures information at different orientations and scales. N-fold symmetry is introduced in the feature design configuration while retaining the random structure that provides discriminative power. The symmetry is utilized to achieve rotation invariance. The SDMD is extracted using an image pyramid and encoded by the Fisher vector approach, resulting in a descriptor which captures variations at different resolutions without increasing the dimensionality. The proposed image representation is combined with a linear SVM classifier. Extensive experiments are conducted on four texture data sets [Brodatz, UMD, UIUC, and the Flickr material data set (FMD)] using standard protocols. The results demonstrate that our approach outperforms the state of the art in texture classification. The MATLAB code is made available.
29. A Novel Image Quality Assessment With Globally and Locally Consilient Visual Quality Perception
ABSTRACT
Computational models for image quality assessment (IQA) have been developed by exploring effective features that are consistent with the characteristics of the human visual system (HVS) for visual quality perception. In this paper, we first reveal that many existing features used in computational IQA methods can hardly characterize visual quality perception for local image characteristics and various distortion types. To solve this problem, we propose a new IQA method, called the structural contrast-quality index (SC-QI), by adopting a structural contrast index (SCI), which can well characterize local and global visual quality perceptions for various image characteristics with structural-distortion types. In addition to the SCI, we devise some other perceptually important features for our SC-QI that can effectively reflect the characteristics of the HVS for contrast sensitivity and chrominance component variation. Furthermore, we develop a modified SC-QI, called the structural contrast distortion metric (SC-DM), which inherits the desirable mathematical properties of valid distance metricability and quasi-convexity, so that it can effectively be used as a distance metric for image quality optimization problems. Extensive experimental results show that both SC-QI and SC-DM can very well characterize the HVS's properties of visual quality perception for local image characteristics and various distortion types, which is a distinctive merit of our methods compared with other IQA methods. As a result, both SC-QI and SC-DM achieve better performance, with a strong consilience of global and local visual quality perception as well as much lower computational complexity, compared with state-of-the-art IQA methods.
30. A DCT-based Total JND Profile for Spatio-Temporal and Foveated Masking Effects
ABSTRACT
In the image and video processing fields, DCT-based just noticeable difference (JND) profiles have been effectively utilized to remove perceptual redundancies in pictures for compression. In this paper, we solve two problems that are often intrinsic to conventional DCT-based JND profiles: (i) no foveated masking (FM) JND model has been incorporated in modeling the DCT-based JND profiles; and (ii) the conventional temporal masking (TM) JND models assume that all moving objects in frames can be well tracked by the eyes and that they are projected on the fovea regions of the eyes, which is not a realistic assumption and may result in poor estimation of JND values for untracked moving objects (or image regions). To solve these two problems, we first propose a generalized JND model for the joint effects between TM and FM. With this model, called the temporal-foveated masking (TFM) JND model, JND thresholds for any tracked/untracked and moving/still image regions can be elaborately estimated. Finally, the TFM-JND model is incorporated into a total DCT-based JND profile together with a spatial contrast sensitivity function and luminance masking and contrast masking JND models. In addition, we propose a JND adjustment method for our total JND profile to avoid overestimation of JND values for image blocks of fixed sizes with various image characteristics. To validate the effectiveness of the total JND profile, an experiment involving a subjective distortion-visibility assessment has been conducted. The experimental results show that the proposed total DCT-based JND profile yields significant performance improvement with a much higher capability of distortion concealment (on average 5.6 dB lower PSNR) compared to state-of-the-art JND profiles.
31. PiCode: A New Picture-Embedding 2D Barcode
ABSTRACT
Nowadays, 2D barcodes are widely used as an interface to connect potential customers with advertisement content. However, the appearance of a conventional 2D barcode pattern is often too obtrusive to integrate into an aesthetically designed advertisement. Besides, no human readable information is provided before the barcode is successfully decoded. This paper proposes a new picture-embedding 2D barcode, called PiCode, which mitigates these two limitations by equipping a scannable 2D barcode with a picturesque appearance. PiCode is designed with careful consideration of both the perceptual quality of the embedded image and the decoding robustness of the encoded message. Comparisons with existing beautified 2D barcodes show that PiCode achieves one of the best perceptual qualities for the embedded image and maintains a better tradeoff between image quality and decoding robustness in various application conditions. PiCode has been implemented in MATLAB on a PC, and some key building blocks have also been ported to the Android and iOS platforms. Its practicality for real-world applications has been successfully demonstrated.
32. OCR Based Feature Extraction and Template Matching Algorithms for Qatari Number Plate
ABSTRACT
There are several algorithms and methods that can be applied to perform the character recognition stage of an automatic number plate recognition system; however, the constraints of having a high recognition rate and real-time processing should be taken into consideration. In this paper, four algorithms applied to Qatari number plates are presented and compared. The proposed algorithms are based on feature extraction (vector crossing, zoning, and combined zoning and vector crossing) and template matching techniques. All four proposed algorithms have been implemented and tested using MATLAB. A total of 2790 Qatari binary character images were used to test the algorithms. The template matching based algorithm showed the highest recognition rate of 99.5% with an average time of 1.95 ms per character.
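A hypothetical MATLAB sketch of the template-matching classifier idea is shown below: each candidate template is scored against a binary character image and the best match decides the character. The random images, the label set, and the matching-pixel score are stand-ins; the paper's exact similarity measure is not specified here.

```matlab
% Sketch: template matching over binary character images (illustrative score).
charImg   = rand(40, 24) > 0.5;                  % stand-in binary character image
templates = {rand(40, 24) > 0.5, rand(40, 24) > 0.5, rand(40, 24) > 0.5};
labels    = {'1', '2', '3'};                     % class label of each template (hypothetical)

scores = zeros(1, numel(templates));
for k = 1:numel(templates)
    T = templates{k};
    scores(k) = sum(charImg(:) == T(:)) / numel(T);   % fraction of matching pixels
end
[~, best]  = max(scores);
recognized = labels{best};                       % predicted character
```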
33. HD Qatari ANPR System
ABSTRACT
Recently, Automatic Number Plate Recognition (ANPR) systems have become widely used in safety, security, and commercial aspects. The whole ANPR system is based on three main stages: Number Plate Localization (NPL), Character Segmentation (CS), and Optical Character Recognition (OCR). In recent years, to provide better recognition rate, High Definition (HD) cameras have started to be used. However, most known techniques for standard definition are not suitable for real-time HD image processing due to the computationally intensive cost of localizing the number plate. In this paper, algorithms to implement the three main stages of a high definition ANPR system for Qatari number plates are presented. The algorithms have been tested using MATLAB and two databases as a proof of concept. Implementation results have shown that the system is able to process one HD image in 61 ms, with an accuracy of 98.0% in NPL, 99.75% per character in CS, and 99.5% in OCR.
34. Template Matching of Aerial Images using GPU
ABSTRACT
During the last decade, processor architectures have emerged with hundreds and thousands of high speed processing cores on a single chip. These cores can work in parallel to share a workload for faster execution. This paper presents performance evaluations on such multicore and many-core devices by mapping a computationally expensive correlation kernel of a template matching process using various programming models. The work builds a base performance case by a sequential mapping of the algorithm on an Intel processor. In the second step, the performance of the algorithm is enhanced by a parallel mapping of the kernel on a shared memory multicore machine using the OpenMP programming model. Finally, the normalized cross-correlation (NCC) kernel is scaled to map onto a many-core K20 GPU using the CUDA programming model. In all steps, the correctness of the algorithm implementation is verified by comparing the computed data with reference results from a high level implementation in MATLAB. The performance results are presented with various optimization techniques for the MATLAB, sequential, OpenMP, and CUDA based implementations. The results show that the GPU based implementation achieves 32x and 5x speed-ups over the base case and multicore implementations, respectively. Moreover, using inter-block sub-sampling on an 8-bit 4000×4000 reference gray-scale image achieves an execution time of up to 2.8 s with an error growth of less than 20% for the selected templates of size 96×96.
35. Analysis of Adaptive Filter and ICA for Noise Cancellation from a Video Frame
ABSTRACT
Noise cancellation algorithms have been frequently applied in many fields, including image/video processing. Adaptive noise cancellation algorithms exploit the correlation property of the noise and remove the noise from the input signal more effectively than non-adaptive algorithms. In this paper, different noise cancellation techniques are applied to de-noise a video frame. Three different variants of gradient based adaptive filtering algorithms and an independent component analysis (ICA) procedure are implemented and compared on the basis of signal to noise ratio (SNR) and computational time. The common algorithms used in adaptive filters are least mean square (LMS), normalized least mean square (NLMS), and recursive least squares (RLS). The simulation results demonstrate that the NLMS algorithm is computationally efficient but cannot handle impulsive noise, whereas LMS and RLS can perform better for long duration noise signals. The comparative analysis of the adaptive filtering algorithms and ICA shows that ICA can perform better than all three iterative gradient based algorithms because of its non-iterative nature. For testing and simulations, three variants of white Gaussian noise (WGN) are used to corrupt the video frame.
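A minimal LMS noise-canceller sketch in MATLAB is given below, assuming a noise reference correlated with the noise corrupting the primary signal is available (the standard adaptive noise cancellation setup). The signal, noise path, filter length, and step size are illustrative assumptions; the NLMS variant would simply normalize the update by the reference vector energy.

```matlab
% Sketch: LMS adaptive noise cancellation on a synthetic 1D signal.
N     = 5000;
clean = sin(2*pi*0.01*(1:N));                    % stand-in for one row of the video frame
ref   = randn(1, N);                             % noise reference input
noise = filter([0.6 0.3 0.1], 1, ref);           % noise as it reaches the primary input
d     = clean + noise;                           % primary (noisy) signal

L  = 8;  muStep = 0.01;                          % filter length and LMS step size (assumptions)
w  = zeros(L, 1);  y = zeros(1, N);  e = zeros(1, N);
for n = L:N
    x    = ref(n:-1:n-L+1)';                     % most recent L reference samples
    y(n) = w' * x;                               % estimate of the noise in d(n)
    e(n) = d(n) - y(n);                          % error = denoised output sample
    w    = w + muStep * e(n) * x;                % LMS update (NLMS would divide by x'*x)
end
```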
36. Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval
ABSTRACT
Kernel-based machine learning regression algorithms (MLRAs) are potentially powerful methods for implementation in operational biophysical variable retrieval schemes. However, they face difficulties in coping with large training data sets. With the increasing amount of optical remote sensing data made available for analysis and the possibility of using a large amount of simulated data from radiative transfer models (RTMs) to train kernel MLRAs, efficient data reduction techniques will need to be implemented. Active learning (AL) methods enable the selection of the most informative samples in a data set. This letter introduces six AL methods for achieving optimized biophysical variable estimation with a manageable training data set, and their implementation in a MATLAB-based MLRA toolbox for semiautomatic use. The AL methods were analyzed for their efficiency in improving the estimation accuracy of leaf area index and chlorophyll content based on PROSAIL simulations. Each of the implemented methods outperformed random sampling, improving retrieval accuracy with lower sampling rates. Practically, AL methods open opportunities to feed advanced MLRAs with RTM-generated training data for the development of operational retrieval models.
37. Underwater Depth Estimation and Image Restoration Based on Single Images.
ABSTRACT
In underwater environments, the scattering and absorption phenomena affect the propagation of light, degrading the quality of captured images. In this work, the authors present a method based on a physical model of light propagation that takes into account the most significant effects contributing to image degradation: absorption, scattering, and backscattering. The proposed method uses statistical priors to restore the visual quality of images acquired in typical underwater scenarios.
38. Spectral–Spatial Sparse Subspace Clustering for Hyperspectral Remote Sensing Images
ABSTRACT
Clustering for hyperspectral images (HSIs) is a very challenging task due to their inherent complexity. In this paper, we propose a novel spectral-spatial sparse subspace clustering (S4C) algorithm for hyperspectral remote sensing images. First, by treating each kind of land-cover class as a subspace, we introduce the sparse subspace clustering (SSC) algorithm to HSIs. Then, considering the spectral and spatial properties of HSIs, the high spectral correlation and rich spatial information of the HSIs are incorporated into the SSC model to obtain a more accurate coefficient matrix, which is used to build the adjacency matrix. Finally, spectral clustering is applied to the adjacency matrix to obtain the final clustering result. Several experiments were conducted to illustrate the performance of the proposed S4C algorithm.
39. Beyond Colour Difference: Residual Interpolation for Colour Image Demosaicking
ABSTRACT
In this paper, we propose residual interpolation (RI) as an alternative to color difference interpolation, which is a widely accepted technique for color image demosaicking. Our proposed RI performs the interpolation in a residual domain, where the residuals are differences between observed and tentatively estimated pixel values. Our hypothesis for the RI is that if image interpolation is performed in a domain with a smaller Laplacian energy, its accuracy is improved. Based on the hypothesis, we estimate the tentative pixel values to minimize the Laplacian energy of the residuals. We incorporate the RI into the gradient-based threshold free algorithm, which is one of the state-of-the-art Bayer demosaicking algorithms. Experimental results demonstrate that our proposed demosaicking algorithm using the RI surpasses the state-of-the-art algorithms for the Kodak, the IMAX, and the beyond Kodak data sets.
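To make the residual-interpolation hypothesis concrete, the hypothetical 1D MATLAB fragment below interpolates the residual (observed minus tentative estimate) rather than the raw samples, then adds the tentative estimate back. The signal, the sampling pattern, and the way the tentative estimate is obtained are all illustrative assumptions; the paper's demosaicking pipeline and its Laplacian-energy minimization are not reproduced.

```matlab
% Sketch: 1D residual interpolation (interpolate the residual, not the raw samples).
x         = 1:2:64;                              % positions where the channel is observed
observed  = sin(0.2 * x) + 0.05 * randn(size(x));% observed sparse samples (stand-in)
tentative = sin(0.2 * (1:64));                   % tentative estimate from a guide (assumption)

res       = observed - tentative(x);             % residuals at the observed positions
resFull   = interp1(x, res, 1:64, 'linear', 'extrap');  % the smooth residual is easy to interpolate
recovered = tentative + resFull;                 % add the tentative estimate back
```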
40. Extracting Line Features in SAR Images through Image Edge Fields
ABSTRACT
Conventional line detection methods are mainly based on the binary edge map. This letter proposes a new line detection method that directly extracts line features from the image edge fields of the synthetic aperture radar (SAR) images. In the proposed method, the strength and direction of each field point are first obtained using a ratio-based edge filter. Then, the accumulation weight of the field point is jointly computed using its strength and direction. The direction of a field point on the line is essentially the orientation of the line. Furthermore, a field point on a strong line should be distinguished from a field point on a weak line. Thus, the accumulation weights of different field points are not equal. By summing up the accumulation weights, the straight lines in the SAR image space are directly converted into several local peaks in the parameter space. A sort-window peak detection method is proposed to suppress the spurious secondary peaks in the parameter space. The experimental results show that the proposed line detection method is robust to noise and has a good antiocclusion ability. The proposed method performs well in terms of true positive detection rate and detection accuracy for both synthetic and real-world images.
41. Sequence-to-Sequence Similarity-Based Filter for Image De-noising
ABSTRACT
Image denoising has been a well-studied problem for imaging systems, especially imaging sensors. Despite remarkable progress in the quality of denoising algorithms, persistent challenges remain for a wide class of general images. In this paper, we present a new concept of sequence-to-sequence similarity (SSS). This similarity measure is an efficient method to evaluate the content similarity for images, especially for edge information. The approach differs from the traditional image processing techniques, which rely on pixel and block similarity. Based on this new concept, we introduce a new SSS-based filter for image denoising. The new SSS-based filter utilizes the edge information in the corrupted image to address image denoising problems. We demonstrate the filter by incorporating it into a new SSS-based image denoising algorithm to remove Gaussian noise. Experiments are performed over synthetic and experimental data. The performance of our methodology is experimentally verified on a variety of images and Gaussian noise levels. The results demonstrate that the proposed method’s performance exceeds several current state-of-the-art works, which are evaluated both visually and quantitatively. The presented framework opens up new perspectives in the use of SSS methodologies for image processing applications to replace the traditional pixel-to-pixel similarity or block-to-block similarity.
42. Underwater Visual Computing: The Grand Challenge Just around the Corner
ABSTRACT
Visual computing technologies have traditionally been developed for conventional setups where air is the surrounding medium for the user, the display, and/or the camera. However, given mankind's increasing need to rely on the oceans to solve the problems of future generations (such as offshore oil and gas, renewable energies, and marine mineral resources), there is a growing need for mixed-reality applications for use in water. This article highlights the various research challenges that arise when changing the medium from air to water, introduces the concept of underwater mixed environments, and presents recent developments in underwater visual computing applications.
43. Multi-Modal Curriculum Learning for Semi-Supervised Image Classification
ABSTRACT
Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.
44. Artificial Neural Networks Applied to Image Steganography
ABSTRACT
This paper presents a technique for transmitting information efficiently and securely by hiding confidential messages in seemingly innocent messages using steganography. The least significant bit insertion technique is used to hide images or other secret watermarks within digital pictures. Artificial neural networks are used in the process of extracting the encrypted information, acting as keys that determine the existence of the hidden information.
45. Robust Blur Kernel Estimation for License Plate Images from Fast Moving Vehicles
ABSTRACT
As the unique identification of a vehicle, the license plate is a key clue to uncovering over-speed vehicles or those involved in hit-and-run accidents. However, the snapshot of an over-speed vehicle captured by a surveillance camera is frequently blurred due to fast motion and may even be unrecognizable by humans. Such observed plate images are usually of low resolution and suffer severe loss of edge information, which poses a great challenge to existing blind deblurring methods. For license plate image blurring caused by fast motion, the blur kernel can be viewed as a linear uniform convolution and parametrically modeled with an angle and a length. In this paper, we propose a novel scheme based on sparse representation to identify the blur kernel. By analyzing the sparse representation coefficients of the recovered image, we determine the angle of the kernel based on the observation that the recovered image has the most sparse representation when the kernel angle corresponds to the genuine motion angle. Then, we estimate the length of the motion kernel with the Radon transform in the Fourier domain. Our scheme can handle large motion blur well, even when the license plate is unrecognizable by humans. We evaluate our approach on real-world images and compare it with several popular state-of-the-art blind image deblurring algorithms. Experimental results demonstrate the superiority of our proposed approach in terms of effectiveness and robustness.
46. Feature Extraction for Patch-Based Classification of Multispectral Earth Observation Images
ABSTRACT
Recently, various patch-based approaches have emerged for high and very high resolution multispectral image classification and indexing. This comes as a consequence of the most important particularity of multispectral data: objects are represented using several spectral bands that equally influence the classification process. In this letter, using a patch-based approach, we aim at extracting descriptors that capture both spectral information and structural information. Using both the raw texture data and the high spectral resolution provided by the latest sensors, we propose enhanced image descriptors based on Gabor features, spectral histograms, spectral indices, and the bag-of-words framework. This approach leads to a scene classification that outperforms the results obtained when employing the initial image features. Experimental results on a WorldView-2 scene and also on a test collection of tiles created using Sentinel-2 data are presented. A detailed assessment of speed and precision is provided in comparison with state-of-the-art techniques. The broad applicability is guaranteed as the performances obtained for the two selected data sets are comparable, facilitating the exploration of previous and newly launched satellite missions.
47. A Diffusion and Clustering-Based Approach for Finding Coherent Motions and Understanding Crowd Scenes
ABSTRACT
This paper addresses the problem of detecting coherent motions in crowd scenes and presents its two applications in crowd scene understanding: semantic region detection and recurrent activity mining. It processes input motion fields (e.g., optical flow fields) and produces a coherent motion field named thermal energy field. The thermal energy field is able to capture both motion correlation among particles and the motion trends of individual particles, which are helpful to discover coherency among them. We further introduce a two-step clustering process to construct stable semantic regions from the extracted time-varying coherent motions. These semantic regions can be used to recognize pre-defined activities in crowd scenes. Finally, we introduce a cluster-and-merge process, which automatically discovers recurrent activities in crowd scenes by clustering and merging the extracted coherent motions. Experiments on various videos demonstrate the effectiveness of our approach.
48. Unsupervised Co-Segmentation for Indefinite Number of Common Foreground Objects
ABSTRACT
Co-segmentation addresses the problem of simultaneously extracting the common targets that appear in multiple images. Object co-segmentation involving multiple common targets, which is very common in practice, has recently become a research hotspot. In this paper, an unsupervised object co-segmentation method for an indefinite number of common targets is proposed. This method overcomes the inherent limitation of traditional proposal-selection-based methods on images containing multiple common targets while retaining their original advantages for object extraction. For each image, the proposed multi-search strategy extracts each target individually, and an adaptive decision criterion is introduced to give each candidate a reliable judgment automatically, i.e., target or non-target. Comparison experiments conducted on the public data sets iCoseg and MSRC, and on the more challenging data set Coseg-INCT, demonstrate the superior performance of the proposed method.
49. Hierarchical Discriminative Feature Learning for Hyperspectral Image Classification
ABSTRACT
Building effective image representations from hyperspectral data helps to improve classification performance. In this letter, we develop a hierarchical discriminative feature learning algorithm for hyperspectral image classification, which is a variant of the spatial-pyramid-matching model based on the sparse codes learned from a discriminative dictionary in each layer of a two-layer hierarchical scheme. The pooled features obtained by the proposed method are more robust and discriminative for classification. We evaluate the proposed method on two hyperspectral data sets, Indian Pines and the Salinas scene. The results show that our method achieves state-of-the-art classification accuracy.
50. Global and Local Saliency Analysis for the Extraction of Residential Areas in High-Spatial-Resolution Remote Sensing Image
ABSTRACT
Extraction of residential areas plays an important role in remote sensing image processing. Extracted results can be applied in various scenarios, including disaster assessment, urban expansion, and environmental change research. Quality residential areas extracted from a remote sensing image must meet three requirements: well-defined boundaries, uniformly highlighted residential areas, and no background redundancy within the residential areas. Driven by these requirements, this study proposes a global and local saliency analysis (GLSA) model for the extraction of residential areas in high-spatial-resolution remote sensing images. In the proposed model, a global saliency map based on the quaternion Fourier transform (QFT) and a global saliency map based on an adaptive directional enhancement lifting wavelet transform (ADE-LWT) are generated along with a local saliency map, all of which are fused into a main saliency map based on their complementarities. To analyze the correlation among spectral bands in the remote sensing image, the phase spectrum information of the QFT is used on the multispectral images to produce one global saliency map. To acquire texture and edge features at different scales and orientations, the coefficients acquired by the ADE-LWT are used to construct the other global saliency map. To discard redundant background, the amplitude spectrum of the Fourier transform and the spatial relations among patches are introduced into the panchromatic image to generate the local saliency map. Experimental results indicate that the GLSA model can better define the boundaries of residential areas and extract more complete residential areas than current methods. Furthermore, the GLSA model can prevent redundant background in residential areas and thus acquire more accurate residential areas.
51. A Feature Learning and Object Recognition Framework for Underwater Fish Images
ABSTRACT
Live fish recognition is one of the most crucial elements of fisheries survey applications, where vast amounts of data are rapidly acquired. Different from general scenarios, challenges to underwater image recognition are posed by poor image quality, uncontrolled objects and environments, and the difficulty of acquiring representative samples. In addition, most existing feature extraction techniques are hindered from automation because they involve human supervision. Toward this end, we propose an underwater fish recognition framework that consists of a fully unsupervised feature learning technique and an error-resilient classifier. Object parts are initialized based on saliency, and relaxation labeling is applied to match object parts correctly. A non-rigid part model is then learned based on fitness, separation, and discrimination criteria. For the classifier, an unsupervised clustering approach generates a binary class hierarchy, where each node is a classifier. To exploit information from ambiguous images, the notion of partial classification is introduced to assign coarse labels by optimizing the benefit of indecision made by the classifier. Experiments show that the proposed framework achieves high accuracy on both public and self-collected underwater fish images with high uncertainty and class imbalance.
52. Statistics of Natural Stochastic Textures and Their Application in Image De-noising
ABSTRACT
Natural stochastic textures (NSTs), characterized by their fine details, are prone to corruption by artifacts introduced during the image acquisition process by the combined effect of blur and noise. While many successful algorithms exist for image restoration and enhancement, the restoration of natural textures and textured images based on suitable statistical models has yet to be further improved. We examine the statistical properties of NSTs using three image databases. We show that the Gaussian distribution is suitable for many NSTs, while other natural textures can be properly represented by a model that separates the image into two layers: one layer contains the structural elements of smooth areas and edges, while the other contains the statistically Gaussian textural details. Based on these statistical properties, an algorithm for the denoising of natural images containing NSTs is proposed, using a patch-based fractional Brownian motion model and regularization by means of anisotropic diffusion. It is illustrated that this algorithm successfully recovers both missing textural details and structural attributes that characterize natural images. The algorithm is compared with classical as well as state-of-the-art denoising algorithms.
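The regularization step mentioned above is anisotropic diffusion; a minimal Perona-Malik sketch (a standard formulation, not necessarily the exact variant used in the paper) is given below, with illustrative parameter values.

import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=30.0, gamma=0.2):
    """Perona-Malik diffusion: smooth flat areas while preserving strong edges."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # finite differences towards the four neighbours
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u,  1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u,  1, axis=1) - u
        # edge-stopping conduction coefficients
        cn, cs = np.exp(-(dn / kappa) ** 2), np.exp(-(ds / kappa) ** 2)
        ce, cw = np.exp(-(de / kappa) ** 2), np.exp(-(dw / kappa) ** 2)
        u += gamma * (cn * dn + cs * ds + ce * de + cw * dw)
    return u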
53. Large Polarimetric SAR Data Semi-Supervised Classification with Spatial-Anchor Graph
ABSTRACT
Recently, graph-based semi-supervised classification (SSC) has attracted considerable attention, as it can enhance classification accuracy by utilizing only a few labeled samples and large numbers of unlabeled samples via graphs. However, the construction of graphs is time-consuming, especially for large-scale polarimetric synthetic aperture radar (PolSAR) data. Moreover, speckle noise in the images remarkably degrades the accuracy of the constructed graph. To address these two issues, this paper proposes a novel spatial-anchor graph for large-scale PolSAR terrain classification. First, the PolSAR image is segmented to obtain homogeneous regions. The features of each pixel are weighted by those of the surrounding pixels from the homogeneous regions to reduce the influence of speckle noise. Second, Wishart distance-based clustering is performed on the weighted features, and the cluster centers are computed and serve as initial anchors. Then, the label of each pixel is predicted from the labels of its nearest anchors on the spatial-anchor graph, which is constructed by solving an optimization problem. Experimental results on both synthesized and real PolSAR data show that the proposed method reduces the computational complexity to linear time, and that the graph combined with spatial information suppresses speckle noise and enhances classification accuracy in comparison with state-of-the-art graph-based SSC methods when only a small number of labeled samples are available.
54. Modeling, Measuring, and Compensating Color Weak Vision
ABSTRACT
We use methods from Riemann geometry to investigate transformations between the color spaces of color-normal and color-weak observers. The two main applications are the simulation of the perception of a color weak observer for a color-normal observer, and the compensation of color images in a way that a color-weak observer has approximately the same perception as a color-normal observer. The metrics in the color spaces of interest are characterized with the help of ellipsoids defined by the just-noticeable-differences between the colors which are measured with the help of color-matching experiments. The constructed mappings are the isometries of Riemann spaces that preserve the perceived color differences for both observers. Among the two approaches to build such an isometry, we introduce normal coordinates in Riemann spaces as a tool to construct a global color-weak compensation map. Compared with the previously used methods, this method is free from approximation errors due to local linearizations, and it avoids the problem of shifting locations of the origin of the local coordinate system. We analyze the variations of the Riemann metrics for different observers obtained from new color-matching experiments and describe three variations of the basic method. The performance of the methods is evaluated with the help of semantic differential tests.
55. SAR Image Registration Based on Multifeature Detection and Arborescence Network Matching
ABSTRACT
In this letter, a novel synthetic aperture radar (SAR) image registration method is proposed, comprising two operators for feature detection and arborescence network matching (ANM) for feature matching. The two operators, namely SAR scale-invariant feature transform (SIFT) and R-SIFT, detect corner points and texture points in SAR images, respectively. This process has the advantage of preserving two types of feature information in SAR images simultaneously. The ANM algorithm has a two-stage process for finding matching pairs: the backbone network and the branch network are built successively. The ANM algorithm combines feature constraints with spatial relations among feature points, and yields a larger number of matching pairs and higher subpixel matching precision than the original version. Experimental results on various SAR images show that the proposed method provides superior performance to the other approaches investigated.
56. Supervised Classification of Very High Resolution Optical Images Using Wavelet-Based Textural Features
ABSTRACT
In this paper, we explore the potentialities of using wavelet-based multivariate models for the classification of very high resolution optical images. A strategy is proposed to apply these models in a supervised classification framework. This strategy includes a content-based image retrieval analysis applied on a texture database prior to the classification in order to identify which multivariate model performs the best in the context of application. Once identified, the best models are further applied in a supervised classification procedure by extracting texture features from a learning database and from regions obtained by a presegmentation of the image to classify. The classification is then operated according to the decision rules of the chosen classifier. The use of the proposed strategy is illustrated in two real case applications using Pléiades panchromatic images: the detection of vineyards and the detection of cultivated oyster fields. In both cases, at least one of the tested multivariate models displays higher classification accuracies than gray-level cooccurrence matrix descriptors. Its high adaptability and the low number of parameters to be set are other advantages of the proposed approach.
57. Text-Attentional Convolutional Neural Network for Scene Text Detection
ABSTRACT
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results.
58. Novelty-based Spatiotemporal Saliency Detection for Prediction of Gaze in Egocentric Video
ABSTRACT
The automated analysis of video captured from a first-person perspective has gained increased interest since the advent of commercially available miniaturized wearable cameras. With such cameras, a person takes visual measurements of the world in a sequence of fixations, which contain relevant information about the most salient parts of the environment and the goals of the actor. We present a novel model for gaze prediction in egocentric video based on the spatiotemporal visual information captured by the wearer's camera, extended with a subjective function of surprise realized through a motion memory, reflecting the human aspect of visual attention. Spatiotemporal saliency is computed in a bio-inspired framework using a superposition of superpixel- and contrast-based conspicuity maps as well as an optical-flow-based motion saliency map. Motion is further processed into a motion novelty map, constructed by comparing the most recent motion information with an exponentially decaying memory of motion information. This motion novelty map is found to provide a significant increase in gaze-prediction performance. Experimental results are obtained from egocentric videos recorded with eye-tracking glasses during a natural shopping task and show a 6.48% increase in the mean saliency at a fixation, in terms of a measure of mimicking human attention.
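The motion-memory idea can be sketched with a running exponential average of optical-flow magnitude; the decay factor and inputs below are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def motion_novelty(flow_magnitudes, decay=0.9):
    """Novelty of each frame's motion w.r.t. an exponentially decaying motion memory."""
    memory = np.zeros_like(flow_magnitudes[0], dtype=float)
    novelty_maps = []
    for mag in flow_magnitudes:
        novelty_maps.append(np.abs(mag - memory))      # deviation from remembered motion
        memory = decay * memory + (1.0 - decay) * mag  # update the decaying memory
    return novelty_maps

frames = [np.random.rand(48, 64) for _ in range(10)]   # stand-in optical-flow magnitude maps
novelty = motion_novelty(frames)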
59. Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images
ABSTRACT
Detection and classification of cell nuclei in histopathology images of cancerous tissue stained with the standard hematoxylin and eosin stain is a challenging task due to cellular heterogeneity. Deep learning approaches have been shown to produce encouraging results on histopathology images in various studies. In this paper, we propose a Spatially Constrained Convolutional Neural Network (SC-CNN) to perform nucleus detection. SC-CNN regresses the likelihood of a pixel being the center of a nucleus, where high probability values are spatially constrained to locate in the vicinity of the centers of nuclei. For classification of nuclei, we propose a novel Neighboring Ensemble Predictor (NEP) coupled with CNN to more accurately predict the class label of detected cell nuclei. The proposed approaches for detection and classification do not require segmentation of nuclei. We have evaluated them on a large dataset of colorectal adenocarcinoma images, consisting of more than 20,000 annotated nuclei belonging to four different classes. Our results show that the joint detection and classification of the proposed SC-CNN and NEP produces the highest average F1 score as compared to other recently published approaches. Prospectively, the proposed methods could offer benefit to pathology practice in terms of quantitative analysis of tissue constituents in whole-slide images, and potentially lead to a better understanding of cancer.
60. Combining Generative and Discriminative Representation Learning for Lung CT Analysis with Convolutional Restricted Boltzmann Machines
ABSTRACT
The choice of features greatly influences the performance of a tissue classification system. Despite this, many systems are built with standard, predefined filter banks that are not optimized for that particular application. Representation learning methods such as restricted Boltzmann machines may outperform these standard filter banks because they learn a feature description directly from the training data. Like many other representation learning methods, restricted Boltzmann machines are unsupervised and are trained with a generative learning objective; this allows them to learn representations from unlabeled data, but does not necessarily produce features that are optimal for classification. In this paper we propose the convolutional classification restricted Boltzmann machine, which combines a generative and a discriminative learning objective. This allows it to learn filters that are good both for describing the training data and for classification. We present experiments with feature learning for lung texture classification and airway detection in CT images. In both applications, a combination of learning objectives outperformed purely discriminative or generative learning, increasing, for instance, the lung tissue classification accuracy by 1 to 8 percentage points. This shows that discriminative learning can help an otherwise unsupervised feature learner to learn filters that are optimized for classification.
61. A Novel Approach for Improved Vehicular Positioning Using Cooperative Map Matching and Dynamic Base Station DGPS Concept
ABSTRACT
In this paper, a novel approach for improving vehicular positioning is presented. This method is based on cooperation between vehicles, which communicate their measured position information, and consists of two steps. In the first step, we introduce our cooperative map matching method. This map matching method uses V2V communication in a vehicular ad hoc network (VANET) to exchange global positioning system (GPS) information between vehicles. Given a precise road map, vehicles can apply the road constraints of other vehicles in their own map matching process and achieve a significant improvement in their positioning. In the second step, we propose the concept of a dynamic base station DGPS (DDGPS), which vehicles use to generate and broadcast GPS pseudorange corrections that newly arrived vehicles can use to improve their positioning. The DDGPS is a decentralized cooperative method that aims to improve GPS positioning by estimating and compensating the common error in GPS pseudorange measurements. It can be seen as an extension of DGPS in which the base stations are not necessarily static with exactly known positions. In the DDGPS method, the pseudorange corrections are estimated based on the receiver's belief about its position and the associated uncertainty, and are then broadcast to other GPS receivers. The performance of the proposed algorithm has been verified through simulations in several realistic scenarios.
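The correction idea can be sketched as follows: each vehicle broadcasts the difference between its measured pseudorange and the range predicted from its own position estimate, and newcomers subtract an aggregate of these corrections. The uniform weighting below is only a placeholder for the paper's belief/uncertainty-based weighting, and all numbers are illustrative.

import numpy as np

def pseudorange_correction(sat_pos, est_receiver_pos, measured_pseudorange):
    """Correction = measured pseudorange minus the range predicted from the receiver's own position estimate."""
    predicted = np.linalg.norm(sat_pos - est_receiver_pos)
    return measured_pseudorange - predicted

def apply_corrections(measured_pseudorange, corrections, weights=None):
    """A newly arrived vehicle subtracts a (weighted) average of the broadcast corrections."""
    corrections = np.asarray(corrections, dtype=float)
    w = np.ones_like(corrections) if weights is None else np.asarray(weights, dtype=float)
    return measured_pseudorange - np.average(corrections, weights=w)

sat = np.array([15600e3, 7540e3, 20140e3])      # illustrative satellite ECEF position (m)
veh = np.array([1113e3, -4843e3, 3983e3])       # vehicle's own position estimate (m)
corr = pseudorange_correction(sat, veh, measured_pseudorange=2.13e7)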
62. Image De-noising Using Quadtree-Based Nonlocal Means With Locally Adaptive Principal Component Analysis
ABSTRACT
In this letter, we present an efficient image denoising method combining quadtree-based nonlocal means (NLM) and locally adaptive principal component analysis. It better exploits nonlocal multiscale self-similarity by creating sub-patches of different sizes through quadtree decomposition on each patch. To achieve spatially uniform denoising, we propose a local noise variance estimator combined with a denoiser based on locally adaptive principal component analysis. Experimental results demonstrate that the proposed method achieves very competitive denoising performance compared with state-of-the-art denoising methods, and even obtains better visual perception at high noise levels.
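For context, a minimal sketch of the basic non-local means estimate for one pixel is shown below; the quadtree sub-patch decomposition and the locally adaptive PCA denoiser described above are not reproduced, and the patch, search-window, and filtering parameters are illustrative.

import numpy as np

def nlm_pixel(img, y, x, patch=3, search=7, h=10.0):
    """Basic non-local means estimate of one pixel (no quadtree/PCA refinements)."""
    p, s = patch // 2, search // 2
    pad = np.pad(img.astype(float), p + s, mode='reflect')
    yc, xc = y + p + s, x + p + s
    ref = pad[yc - p:yc + p + 1, xc - p:xc + p + 1]
    num, den = 0.0, 0.0
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            cand = pad[yc + dy - p:yc + dy + p + 1, xc + dx - p:xc + dx + p + 1]
            w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))   # patch-similarity weight
            num += w * pad[yc + dy, xc + dx]
            den += w
    return num / den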
63. Fusion of Multispectral and Panchromatic Images Based on Morphological Operator
ABSTRACT
Nonlinear decomposition schemes constitute an alternative to classical approaches for addressing the problem of data fusion. In this paper, we discuss the application of this methodology to a popular remote sensing application called pansharpening, which consists of fusing a low-resolution multispectral image and a high-resolution panchromatic image. We design a complete pansharpening scheme based on the use of morphological half-gradient operators and demonstrate the suitability of this algorithm through comparison with state-of-the-art approaches. Four data sets acquired by the Pleiades, WorldView-2, Ikonos, and GeoEye-1 satellites are employed for the performance assessment, testifying to the effectiveness of the proposed approach in producing top-class images with a setting independent of the specific sensor.
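Morphological half gradients can be sketched with standard grayscale morphology: the external half gradient is the dilation minus the image, and the internal half gradient is the image minus its erosion. The structuring-element size below is an assumption, and the full detail-injection pansharpening scheme is not reproduced.

import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def half_gradients(pan, size=3):
    """External (dilation - image) and internal (image - erosion) morphological half gradients."""
    pan = np.asarray(pan, dtype=float)
    external = grey_dilation(pan, size=(size, size)) - pan
    internal = pan - grey_erosion(pan, size=(size, size))
    return external, internal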
64. Texture Classification Using Dense Micro-Block Difference
ABSTRACT
This paper is devoted to the problem of texture classification. Motivated by recent advances in compressive sensing and keypoint descriptors, a set of novel features called dense micro-block difference (DMD) is proposed. These features provide a highly descriptive representation of image patches by densely capturing the granularities at multiple scales and orientations. Unlike most earlier work on local features, the DMD does not involve any quantization, thus retaining the complete information. We demonstrate that the DMD has a much lower dimensionality than the Scale-Invariant Feature Transform (SIFT) and can be computed much faster than SIFT using integral images. The proposed features are encoded using the Fisher vector method to obtain an image descriptor that considers high-order statistics. The proposed image representation is combined with a linear support vector machine classifier. Extensive experiments are conducted on five texture data sets (KTH-TIPS, UMD, KTH-TIPS-2a, Brodatz, and CUReT) using standard protocols. The results demonstrate that our approach outperforms the state-of-the-art in texture classification.
65. Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images
ABSTRACT
Among brain tumors, gliomas are the most common and aggressive, leading to a very short life expectancy in their highest grade. Thus, treatment planning is a key stage to improve the quality of life of oncological patients. Magnetic resonance imaging (MRI) is a widely used imaging technique to assess these tumors, but the large amount of data produced by MRI prevents manual segmentation in a reasonable time, limiting the use of precise quantitative measurements in clinical practice. Automatic and reliable segmentation methods are therefore required; however, the large spatial and structural variability among brain tumors makes automatic segmentation a challenging problem. In this paper, we propose an automatic segmentation method based on Convolutional Neural Networks (CNN), exploring small 3×3 kernels. The use of small kernels allows a deeper architecture to be designed and has a positive effect against overfitting, given the smaller number of weights in the network. We also investigated the use of intensity normalization as a pre-processing step, which, though not common in CNN-based segmentation methods, proved together with data augmentation to be very effective for brain tumor segmentation in MRI images. Our proposal was validated on the Brain Tumor Segmentation Challenge 2013 database (BRATS 2013), obtaining simultaneously the first position for the complete, core, and enhancing regions in the Dice Similarity Coefficient metric (0.88, 0.83, 0.77) for the Challenge data set, as well as the overall first position on the online evaluation platform. We also participated in the on-site BRATS 2015 Challenge using the same model, obtaining second place, with Dice Similarity Coefficients of 0.78, 0.65, and 0.75 for the complete, core, and enhancing regions, respectively.
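A minimal PyTorch sketch of a fully convolutional network built from stacked 3×3 kernels is given below, to illustrate why small kernels permit depth with relatively few weights; the layer widths, input channels, and class count are illustrative assumptions, not the paper's exact architecture.

import torch.nn as nn

class SmallKernelCNN(nn.Module):
    """Stacked 3x3 convolutions: deep receptive field with a modest number of weights."""
    def __init__(self, in_channels=4, n_classes=5):     # e.g. 4 MRI modalities, 5 tissue labels (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, n_classes, kernel_size=1)   # per-pixel class scores

    def forward(self, x):
        return self.classifier(self.features(x))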
66. Online Multi-Modal Distance Metric Learning with Application to Image Retrieval
ABSTRACT
Distance metric learning (DML) is an important technique for improving similarity search in content-based image retrieval. Despite being studied extensively, most existing DML approaches typically adopt a single-modal learning framework that learns the distance metric on either a single feature type or a combined feature space where multiple types of features are simply concatenated. Such single-modal DML methods suffer from some critical limitations: (i) some types of features may significantly dominate the others in the DML task due to diverse feature representations; and (ii) learning a distance metric on the combined high-dimensional feature space can be extremely time-consuming using the naive feature concatenation approach. To address these limitations, in this paper we investigate a novel scheme of online multi-modal distance metric learning (OMDML), which explores a unified two-level online learning scheme: (i) it learns to optimize a distance metric on each individual feature space; and (ii) it then learns to find the optimal combination of diverse types of features. To further reduce the expensive cost of DML on high-dimensional feature spaces, we propose a low-rank OMDML algorithm which not only significantly reduces the computational cost but also retains highly competitive or even better learning accuracy. We conduct extensive experiments to evaluate the performance of the proposed algorithms for multi-modal image retrieval, in which encouraging results validate the effectiveness of the proposed technique.
67. Stereoscopic Visual Attention Guided Seam Carving for Stereoscopic Image Retargeting
ABSTRACT
Stereoscopic image retargeting plays an important role in adaptive 3D stereoscopic displays. It aims to fit displays with various resolutions while preserving visually salient content and geometric consistency. We propose a stereoscopic image retargeting method based on stereoscopic visual attention guided seam carving. First, a stereoscopic saliency map is generated by combining 2D saliency and depth saliency maps, and a significant energy map is generated by considering binocular disparity and the binocular just-noticeable difference (BJND). Then, seam selection is applied to the left image based on the stereoscopic saliency and energy maps, and seam replacement is performed for the occluded regions to prevent geometric inconsistency. Finally, according to the matched left and right seams, the retargeted stereoscopic image is generated. In the experiments, subjective and objective analyses on three stereoscopic image databases show that the proposed approach produces better seam carving results than related existing methods.
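The seam-selection step relies on the classic dynamic-programming seam search; a minimal sketch over an arbitrary energy map is shown below. The stereoscopic saliency/BJND energy construction and the left-right seam matching described above are not reproduced.

import numpy as np

def vertical_seam(energy):
    """Dynamic-programming search for the minimum-energy vertical seam (one column index per row)."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        left   = np.r_[np.inf, cost[y - 1, :-1]]
        middle = cost[y - 1]
        right  = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, middle), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):                       # backtrack from the bottom row
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam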
68. Exploiting Perceptual Anchoring for Color Image Enhancement
ABSTRACT
The preservation of image quality under various display conditions has become increasingly important in the multimedia era. A considerable amount of effort has been devoted to compensating for the quality degradation caused by dim LCD backlights in mobile devices and desktop monitors. However, most previous enhancement methods for backlight-scaled images consider only the luminance component and overlook the impact of color appearance on image quality. In this paper, we propose a fast and elegant method that exploits the anchoring property of the human visual system to preserve the color appearance of backlight-scaled images as much as possible. Our approach is distinguished from previous ones in several aspects. First, it has a sound theoretical basis. Second, it takes the luminance and chrominance components into account in an integral manner. Third, it has low complexity and can process 720p high-definition videos at 35 frames per second without flicker. The superior performance of the proposed method is verified through psychophysical tests.
69. Word-Hunt: An LSB Steganography Method with Low Expected Number of Modifications per Pixel
ABSTRACT
Least Significant Bit (LSB) steganography is a well-known technique which operates in the spatial domain of digital images. In this paper, the LSB Word-Hunt (LSB WH) is presented. It is a novel LSB approach inspired by the word-hunt puzzle. The main focus of LSB WH is to reduce the Expected Number of Modifications per Pixel (ENMPP) when compared to other methods in the literature. The results show that LSB WH has an ENMPP around 0.315, for natural images with high entropy on the second and the third least significant bits. Results also show that the new method is robust to the statistical chi-square attack.
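ENMPP can be estimated empirically as the fraction of pixels changed by embedding, which is how one would compare LSB WH against plain LSB replacement (the latter changes roughly half of the used pixels for a random message). The helper below is a minimal sketch with a hypothetical name.

import numpy as np

def enmpp(cover, stego):
    """Expected number of modifications per pixel, estimated as the fraction of changed pixels."""
    cover, stego = np.asarray(cover), np.asarray(stego)
    return np.count_nonzero(cover != stego) / cover.size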
70. The Relative Impact of Ghosting and Noise on the Perceived Quality of MR Images
ABSTRACT
Magnetic resonance (MR) imaging is vulnerable to a variety of artifacts, which potentially degrade the perceived quality of MR images and, consequently, may cause inefficient and/or inaccurate diagnosis. In general, these artifacts can be classified as structured or unstructured depending on the correlation of the artifact with the original content. In addition, the artifact can be white or colored depending on the flatness of the frequency spectrum of the artifact. In current MR imaging applications, design choices allow one type of artifact to be traded off with another type of artifact. Hence, to support these design choices, the relative impact of structured versus unstructured or colored versus white artifacts on perceived image quality needs to be known. To this end, we conducted two subjective experiments. Clinical application specialists rated the quality of MR images, distorted with different types of artifacts at various levels of degradation. The results demonstrate that unstructured artifacts deteriorate quality less than structured artifacts, while colored artifacts preserve quality better than white artifacts.
71. Fast Appearance Modeling for Automatic Primary Video Object Segmentation
ABSTRACT
Automatic segmentation of the primary object in a video clip is a challenging problem, as there is no prior knowledge of the primary object. Most existing techniques thus adopt an iterative approach to foreground and background appearance modeling, i.e., fixing the appearance model while optimizing the segmentation and fixing the segmentation while optimizing the appearance model. However, these approaches may rely on good initialization and can easily be trapped in local optima. In addition, they are usually time-consuming for analyzing videos. To address these limitations, we propose a novel and efficient appearance modeling technique for automatic primary video object segmentation in the Markov random field (MRF) framework. It embeds the appearance constraint as auxiliary nodes and edges in the MRF structure, and can optimize both the segmentation and the appearance model parameters simultaneously in one graph cut. Extensive experimental evaluations validate the superiority of the proposed approach over state-of-the-art methods in both efficiency and effectiveness.
72. Image Segmentation Using Parametric Contours with Free Endpoints
ABSTRACT
In this paper, we introduce a novel approach for active contours with free endpoints. A scheme for image segmentation is presented based on a discrete version of the Mumford-Shah functional, where the contours can be both closed and open curves. In addition to a flow of the curves in the normal direction, evolution laws for the tangential flow of the endpoints are derived. Using a parametric approach to describe the evolving contours together with edge-preserving denoising, we obtain a fast method for image segmentation and restoration. The analytical and numerical schemes are presented, followed by numerical experiments with artificial test images and with a real medical image.
73. Visual Face Recognition Using Bag of Dense Derivative Depth Patterns
ABSTRACT
A novel biometric face recognition algorithm using depth cameras is proposed. The key contribution is the design of a novel and highly discriminative face image descriptor called the bag of dense derivative depth patterns (Bag-D3P). This descriptor is composed of four different stages that fully exploit the characteristics of depth information: 1) dense spatial derivatives to encode the 3-D local structure; 2) face-adaptive quantization of the previous derivatives; 3) a multi-bag of words that creates a compact vector description from the quantized derivatives; and 4) spatial block division to add global spatial information. The proposed system can recognize people's faces from a wide range of poses, not only frontal ones, increasing its applicability to real situations. Finally, a new face database of high-resolution depth images has been created and made publicly available for evaluation purposes.
74. Learning of Multimodal Representations with Random Walks on the Click Graph
ABSTRACT
In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from the users’ searching behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate the clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationship between the vertices in the click graph. By minimizing both the truncated random walk loss as well as the distance between the learned representation of vertices and their corresponding deep neural network output, the proposed model which is named multimodal random walk neural network (MRW-NN) can be applied to not only learn robust representation of the existing multimodal data in the click graph, but also deal with the unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on a public large-scale click log data set Clickture and further show that MRW-NN achieves much better cross-modal retrieval performance on the unseen queries/images than the other state-of-the-art methods.
75. Ontology-Based Semantic Image Segmentation Using Mixture Models and Multiple CRFs
ABSTRACT
Semantic image segmentation is a fundamental yet challenging problem, which can be viewed as an extension of the conventional object detection with close relation to image segmentation and classification. It aims to partition images into non-overlapping regions that are assigned predefined semantic labels. Most of the existing approaches utilize and integrate low-level local features and high-level contextual cues, which are fed into an inference framework such as, the conditional random field (CRF). However, the lack of meaning in the primitives (i.e., pixels or superpixels) and the cues provides low discriminatory capabilities, since they are rarely object-consistent. Moreover, blind combinations of heterogeneous features and contextual cues exploitation through limited neighborhood relations in the CRFs tend to degrade the labeling performance. This paper proposes an ontology-based semantic image segmentation (OBSIS) approach that jointly models image segmentation and object detection. In particular, a Dirichlet process mixture model transforms the low-level visual space into an intermediate semantic space, which drastically reduces the feature dimensionality. These features are then individually weighed and independently learned within the context, using multiple CRFs. The segmentation of images into object parts is hence reduced to a classification task, where object inference is passed to an ontology model. This model resembles the way by which humans understand the images through the combination of different cues, context models, and rule-based learning of the ontologies. Experimental evaluations using the MSRC-21 and PASCAL VOC’2010 data sets show promising results.
76. Robust Texture Image Representation by Scale Selective Local Binary Patterns
ABSTRACT
Local binary patterns (LBP) have been used successfully in computer vision and pattern recognition applications, such as texture recognition. LBP effectively handles grayscale and rotation variation, but it fails to achieve desirable performance for texture classification under scale transformation. In this paper, a new method based on dominant LBPs in scale space is proposed to address scale variation in texture classification. First, a scale space of a texture image is derived with a Gaussian filter. Then, a histogram of pre-learned dominant LBPs is built for each image in the scale space. Finally, for each pattern, the maximal frequency among the different scales is taken as the scale-invariant feature. Extensive experiments on five public texture databases (University of Illinois at Urbana-Champaign, Columbia-Utrecht Database, Kungliga Tekniska Högskolan-Textures under varying Illumination Pose and Scale, University of Maryland, and Amsterdam Library of Textures) validate the efficiency of the proposed feature extraction scheme. Coupled with the nearest subspace classifier, the proposed method yields competitive results of 99.36%, 99.51%, 99.39%, 99.46%, and 99.71% for UIUC, CUReT, KTH-TIPS, UMD, and ALOT, respectively. Meanwhile, the proposed method inherits the simple and efficient merits of LBP; for example, it can extract the scale-robust feature for a 200 × 200 image within 0.24 s, which is applicable to many real-time applications.
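The pipeline above (Gaussian scale space, per-scale LBP histograms, per-pattern maximum over scales) can be sketched as follows; the basic 8-neighbour LBP and the chosen scales are simplifications, and the pre-learned dominant-pattern selection is not reproduced.

import numpy as np
from scipy.ndimage import gaussian_filter

def lbp8(img):
    """Basic 8-neighbour LBP code for every interior pixel."""
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((neigh >= c).astype(np.uint8) << bit)
    return code

def scale_space_lbp_hist(img, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """One 256-bin LBP histogram per Gaussian scale; per-pattern max over scales as the scale-robust feature."""
    hists = []
    for s in sigmas:
        blurred = gaussian_filter(img.astype(float), s) if s > 0 else img.astype(float)
        code = lbp8(blurred)
        hists.append(np.bincount(code.ravel(), minlength=256) / code.size)
    return np.max(np.stack(hists), axis=0)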
77. Predicting Vascular Plant Richness in a Heterogeneous Wetland Using Spectral and Textural Features and a Random Forest Algorithm
ABSTRACT
A method to predict vascular plant richness using spectral and textural variables in a heterogeneous wetland is presented. Plant richness was measured at 44 sampling plots in a 16-ha anthropogenic peatland. Several spectral indices, first-order statistics (median and standard deviation), and second-order statistics [metrics of a gray-level co-occurrence matrix (GLCM)] were extracted from a Landsat 8 Operational Land Imager image and a Pleiades 1B image. We selected the most important variables for predicting richness using recursive feature elimination and then built a model using random forest regression. The final model was based on only two textural variables obtained from the GLCM and derived from the Landsat 8 image. An accurate predictive capability was reported (R2 = 0.6; RMSE = 1.99 species), highlighting the possibility of obtaining parsimonious models using textural variables. In addition, the results showed that the mid-resolution Landsat 8 image provided better predictors of richness than the high-resolution Pleiades image. This is the first study to generate a model for plant richness in a wetland ecosystem.
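A minimal sketch of GLCM texture features feeding a random forest regressor is given below, assuming a recent scikit-image (graycomatrix/graycoprops) and scikit-learn; the plot data, selected properties, and hyperparameters are illustrative, and the recursive feature elimination step is omitted.

import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestRegressor

def glcm_features(patch, distances=(1,), angles=(0, np.pi / 2)):
    """Second-order texture statistics from a grey-level co-occurrence matrix."""
    glcm = graycomatrix(patch, distances, angles, levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ('contrast', 'homogeneity', 'energy', 'correlation')])

# illustrative training data: one feature vector per sampling plot, richness as target
plots = [np.random.randint(0, 256, (32, 32), dtype=np.uint8) for _ in range(44)]
richness = np.random.randint(5, 20, 44)
X = np.array([glcm_features(p) for p in plots])
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, richness)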
78. Analyzing the Effect of JPEG Compression on Local Variance of Image Intensity
ABSTRACT
The local variance of image intensity is a typical measure of image smoothness. It has been extensively used, for example, to measure the visual saliency or to adjust the filtering strength in image processing and analysis. However, to the best of our knowledge, no analytical work has been reported about the effect of JPEG compression on image local variance. In this paper, a theoretical analysis on the variation of local variance caused by JPEG compression is presented. First, the expectation of intensity variance of 8×8 non-overlapping blocks in a JPEG image is derived. The expectation is determined by the Laplacian parameters of the discrete cosine transform coefficient distributions of the original image and the quantization step sizes used in the JPEG compression. Second, some interesting properties that describe the behavior of the local variance under different degrees of JPEG compression are discussed. Finally, both the simulation and the experiments are performed to verify our derivation and discussion. The theoretical analysis presented in this paper provides some new insights into the behavior of local variance under JPEG compression. Moreover, it has the potential to be used in some areas of image processing and analysis, such as image enhancement, image quality assessment, and image filtering.
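The quantity analysed above, the mean intensity variance of 8×8 non-overlapping blocks, can be measured empirically as follows (e.g., before and after compression); the derived closed-form expectation in terms of Laplacian DCT parameters and quantization steps is not reproduced.

import numpy as np

def mean_block_variance(img, block=8):
    """Mean intensity variance over non-overlapping block x block tiles."""
    h, w = img.shape
    h, w = h - h % block, w - w % block                  # crop to a multiple of the block size
    tiles = img[:h, :w].astype(float).reshape(h // block, block, w // block, block)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, block * block).var(axis=1).mean()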
79. Rank-Based Image Watermarking Method with High Embedding Capacity and Robustness
ABSTRACT
This paper presents a novel rank-based method for image watermarking. In the watermark embedding process, the host image is divided into blocks, followed by the 2-D discrete cosine transform (DCT). For each image block, a secret key is employed to randomly select a set of DCT coefficients suitable for watermark embedding. Watermark bits are inserted into an image block by modifying the set of DCT coefficients using a rank-based embedding rule. In the watermark detection process, the corresponding detection matrices are formed from the received image using the secret key. Afterward, the watermark bits are extracted by checking the ranks of the detection matrices. Since the proposed watermarking method only uses two DCT coefficients to hide one watermark bit, it can achieve very high embedding capacity. Moreover, our method is free of host signal interference. This desired feature and the usage of an error buffer in watermark embedding result in high robustness against attacks. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed method.
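The first two embedding steps (block-wise 2-D DCT and key-driven selection of coefficients) can be sketched as follows; the seed handling and the skip-the-DC-term selection are assumptions, and the rank-based embedding rule itself is not reproduced.

import numpy as np
from scipy.fftpack import dct

def block_dct_coefficients(block, secret_key, n_coeff=2):
    """2-D DCT of an 8x8 block, then a key-seeded random choice of coefficients for embedding."""
    coeffs = dct(dct(block.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    rng = np.random.default_rng(secret_key)
    flat_idx = rng.choice(np.arange(1, 64), size=n_coeff, replace=False)   # skip the DC term
    return coeffs, np.unravel_index(flat_idx, (8, 8))

block = np.random.randint(0, 256, (8, 8))
coeffs, idx = block_dct_coefficients(block, secret_key=12345)
selected = coeffs[idx]           # the coefficients that would carry one watermark bit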
80. Strokelets: A Learned Multi-Scale Mid-Level Representation for Scene Text Recognition
ABSTRACT
In this paper, we are concerned with the problem of automatic scene text recognition, which involves localizing and reading characters in natural images. We investigate this problem from the perspective of representation and propose a novel multi-scale representation, which leads to accurate, robust character identification and recognition. This representation consists of a set of mid-level primitives, termed strokelets, which capture the underlying substructures of characters at different granularities. The strokelets possess four distinctive advantages: 1) usability: automatically learned from character-level annotations; 2) robustness: insensitive to interference factors; 3) generality: applicable to variant languages; and 4) expressivity: effective at describing characters. Extensive experiments on standard benchmarks verify the advantages of the strokelets and demonstrate the effectiveness of the text recognition algorithm built upon them. Moreover, we show how to incorporate the strokelets to improve the performance of scene text detection.