Deep learning and machine learning neural network approaches for multi class leather texture defect classification and segmentation

Modern leather industries focus on producing high-quality leather products to sustain market competitiveness. However, various leather defects are introduced during different stages of the manufacturing process, such as material handling, tanning and dyeing. Manual inspection of leather surfaces is subjective and inconsistent; hence, machine vision systems have been widely adopted for the automated inspection of leather defects. Suitable image processing algorithms are needed to localize leather defects such as folding marks, growth marks, grain off, loose grain, and pinholes, because these defects have an ambiguous texture pattern and occupy tiny, localized regions of the leather. This paper presents a deep learning neural network-based approach for automatic localization and classification of leather defects using a machine vision system. In this work, popular convolutional neural networks are trained using leather images of different leather defects, and a class activation mapping technique is followed to locate the region of interest for the class of leather defect. Convolutional neural networks such as GoogLeNet, SqueezeNet, and ResNet are found to provide better classification accuracy as compared with the state-of-the-art neural network architectures, and the results are presented.


Introduction
Modern leather manufacturers and designers focus strongly on the aesthetic perception, visual appearance and hand feel of leather garments, as these affect the purchasing decision. However, various defects are introduced in the leather surface during the pre-tanning and post-tanning processes in the leather industries. Hence, the identification and classification of leather defects is an essential process for maintaining the quality of finished products. As manual inspection methods are slow, error-prone, and labor-intensive, machine vision-based automated inspection techniques are widely adopted for improving the productivity of the leather inspection process [1]. Due to the ambiguous texture pattern and tiny size of the defects, it is difficult to distinguish a localized defect from the background in the leather images. Hence, there is a need for developing a suitable image processing approach for the improved classification and perception of leather defects.
Many researchers have proposed image processing techniques for leather grading, defect identification, and classification. Quality inspection for grading is an important step in assessing the usable area of leathers. Each piece of leather is graded based on its effective cutting value, which is decided by taking into consideration the number, size, and location of surface defects [2]. Grayscale image processing techniques using thresholding and morphological operations are applied for defect detection applications [3]. A histogram-based identification method is proposed for detecting defective leather images [4]. Edge detection along with morphological operations is applied to the leather images for segmenting the defect locations [5]. A texture analysis technique using the wavelet transform provides a collective spatial analysis of local pixel regions for leather defect detection [6]. A multi-level thresholding algorithm with a texture feature extraction method is proposed to segment defective and non-defective regions of leather for objective quantification of the leather surface defects [7]. Sobral et al. introduced a new wavelet-based method using optimized filter banks for leather defect detection [8]. An optimization approach with a filtering process is applied for isolating the defective regions from the complex and non-homogeneous background by analyzing their strongly oriented structure [9]. For defect detection and classification, several image processing algorithms are employed to provide quantitative descriptions of defective and non-defective leather images. Various descriptors, such as first-order statistics, contrast characteristics, Haralick descriptors, Fourier and cosine transforms, Hu moments with information about the intensity, local binary patterns, and Gabor features, are extracted to locate the positions of defects on the leather surface [10].
Haralick features are derived from the gray-level co-occurrence matrix (GLCM), which extracts the local patterns in the image and counts their distribution across the entire image. They provide a good discriminative encoding of the textural and gradient-based information, in the form of feature values [11]. A texture analysis method using wavelet statistical features and wavelet co-occurrence matrix features such as entropy, energy, contrast, correlation, cluster prominence, standard deviation, mean, and local homogeneity is proposed for leather defect classification [12]. Color-based models and co-occurrence matrix-based texture analysis are reported for defect detection in raw leather [13]. Though digital image processing approaches are applied for leather defect inspection applications, the classification accuracy is limited due to the presence of noise.
Moganam and Sathia Seelan, Journal of Leather Science and Engineering (2022) 4:7
Recently, machine learning and deep learning methods have gained attention for image classification, detection, and segmentation applications. Kwak et al. proposed a three-stage sequential decision tree for the classification of defects such as lines, holes, stains, wears, and knots [14]. Viana et al. presented an empirical evaluation of the support vector machine against AdaBoost and MLP for solving the leather defect classification problem [15]. Supervised classifiers such as the multi-layer perceptron (MLP), decision trees (DT), SVM, Naïve Bayes, KNN, and random forest (RF) were used to classify the defective and non-defective leather regions [16]. A neural network classifier based on multilayer perceptron neural networks is proposed for recognizing leather defects like open cut, closed cut, and fly bite [17]. Amorim et al. presented linear discriminant analysis techniques for attribute reduction with four different classifiers, namely C4.5, KNN, Naïve Bayes, and SVM, for leather defect classification [18]. With recent advancements in computing and graphical processing units, deep learning neural networks are developed for automated inspection applications [19]. ResNet and VGG architectures based on convolutional neural networks are capable of automated surface inspection and image classification using transfer learning [20]. Liong et al. proposed an integrated machine vision system using an artificial neural network and a deep learning neural network for leather defect classification [21]. A region-based convolutional neural network deep learning approach is used for defect detection and segmentation of defective regions in the leather image [22]. Based on the visual-tactile sense perception of the consumers, a back propagation neural network is developed for selecting suitable leather materials to manufacture user-specified leather products [23].
Many research works have contributed to leather defect detection and classification. However, the existing classification approaches offer limited human perception of defects in leather images, and their classification accuracy is also limited due to the vagueness, randomness, and size of the leather defects in the background texture pattern of the leather surface. In order to improve the accuracy of leather defect classification and human perception, this paper presents deep learning convolutional neural network and machine learning classifier approaches for multi class classification and segmentation of leather defects. The classification performance of state-of-the-art deep learning and machine learning classifiers is compared for leather data sets with texture defects, and the results are presented in this paper.

Machine vision-based leather inspection system
A typical leather surface consists of different types of defects such as scars, growth marks, grain off, loose grain, pinholes, and folding marks. A machine vision system consisting of a high-resolution camera (BASLER acA4600), a lighting system, and a computing system with image processing software (MATLAB version 2020a) is established in the present work for identifying and classifying the leather defects, and it is shown in Fig. 1. A high-resolution camera with a resolution of 14 MP is used for acquiring images of the leather surface at 4608 × 3288 pixels. Table 1 shows the specifications of the camera used in the machine vision system.
Lighting plays an important role in vision inspection applications for illuminating the object of interest. In this work, a fiber optic illumination system is used for providing uniform illumination on the leather surface. The magnitude of luminance is measured using a Lux meter and it is controlled using a light controller knob.

Leather Image acquisition
A comprehensive data set of 3600 leather images is developed with different defects such as folding marks, grain off, pinhole, growth marks, and loose grain, together with non-defective leather surfaces. It is hosted on the open data science platform Kaggle for exploring suitable machine learning and deep learning-based image processing techniques to classify the leather defects. Figure 2 shows sample leather images with different leather defects.
It can be noted from Fig. 2a, b that folding marks and growth marks have a better visual perception of change in color and texture as compared to other defects. Also, the grain off, pinhole, and loose grain are found to have a finer texture pattern as identified in Fig. 2c-e respectively.

Leather texture defects
The leather images of different colours and defects shown in Fig. 2 are processed to obtain grey scale intensity maps for analysing the texture variations due to different leather defects, and the results are shown in Fig. 3. Folding marks, grain off, pinhole and loose grain have a coarse texture and disturb the regular texture pattern, which leads to many abrupt variations and peaks in the intensity of the pixels as shown in Fig. 3a-d respectively.
Growth marks and non-defective leather show a finer texture and a uniform intensity variation, as seen in Fig. 3e, f respectively. The visual perception of leather defects is limited by the ambiguous texture pattern and tiny nature of the different leather defects. In order to distinguish the type of leather defect, there is a need for developing suitable image processing algorithms for the classification of different leather defects.

Deep learning neural network approach for classification and localization of leather defects
In order to reduce the error in detection and multi class classification of leather texture defects, this work presents a deep learning convolutional neural network approach using state-of-the-art convolutional neural network architectures such as AlexNet, VGG-16, GoogLeNet, SqueezeNet and ResNet-50. Figure 4 shows the framework of the proposed approach for classifying and labeling leather images as non-defective leather, loose grain, grain off, growth marks, pinhole and folding marks.
Using the developed deep learning neural network models, a class activation map is generated for identifying the region of interest for the class of the leather image. The details of the network architectures and layers are explained in the following subsections.

Leather image Data Set preparation and preprocessing
A data set of 3600 leather images consisting of non-defective leather, loose grain, grain off, growth marks, pinhole and folding marks is used for training the neural networks. All the leather images are preprocessed using histogram equalization to address the illumination variations during image acquisition and resized to 227 × 227. For evaluating the performance of segmentation of defective regions in the leather image, ground truth leather images with hand-labeled defective regions are kept in the data set. A fivefold cross validation approach is followed in the present work, in which the data set of 3600 image samples is split into 5 mutually exclusive and exhaustive folds of 720 leather images each. In turn, each fold is selected and designated as the testing set, and all the remaining leather images (80% of the data) are considered as the training set.
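The fivefold split described above can be sketched in a few lines. This is a minimal illustration rather than the authors' MATLAB procedure: the function name `five_fold_splits`, the shuffling and the fixed seed are assumptions, and image indices stand in for the actual image files.

```python
import random

def five_fold_splits(n_samples=3600, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) for mutually exclusive, exhaustive folds.

    Assumes n_samples is divisible by n_folds (3600 / 5 = 720 here).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # seed is an assumption, for repeatability
    fold_size = n_samples // n_folds
    folds = [indices[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]                   # one fold is held out for testing
        train_idx = [i for m, fold in enumerate(folds) if m != k for i in fold]
        yield train_idx, test_idx

splits = list(five_fold_splits())
```

Each of the five iterations uses 720 images for testing and the remaining 2880 (80%) for training, and every image is tested exactly once.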

Deep learning convolutional neural network architectures
In the present work, standard pre-trained convolutional neural network architectures such as AlexNet, VGG-16, ResNet, GoogLeNet, Inception Net and SqueezeNet are considered for the multi class leather defect classification application, as they are relatively well established and have proved their ability in object detection and multi class classification applications. Figure 5 shows the architectures of the deep learning convolutional neural networks considered in the present work. A typical convolutional neural network model contains convolution layers, pooling layers and fully connected layers. It can be seen that the complexity of the network architecture increases with concatenation, parallel channels and shortcut connections, as shown in Fig. 5c, d for Inception Net and ResNet respectively. The convolution layers are associated with different parameters such as weights, kernel size, stride and padding. More details of the deep learning architectures can be found in the literature [24].

Convolution layer
The convolution layer provides automated feature extraction at specific spatial locations of the given images using a number of filters of different sizes. A non-overlapping feature map is obtained as the output of the convolution operation between the weights of the filter and the output of the previous convolutional layer, as given by Eq. (1).
$$ x^{l}_{u,v} = f\left( \sum_{j=0}^{J-1} \sum_{i=0}^{I-1} w^{l}_{j,i} \, x^{l-1}_{u+j,\,v+i} + b^{l} \right) \quad (1) $$
where (J, I) denotes the size of the filters, J is the height of the filters, I is the width of the filters, and (u, v) is the position in the output feature map. $b^{l}$ denotes the bias of the convolutional layer, $x^{l-1}$ denotes the output of the previous convolutional layer, and $w^{l}$ denotes the weight of the convolutional layer. f(⋅) is the nonlinear activation function; the ReLU activation function is selected and is shown as Eq. (2). The size of the feature map depends on several parameters, including the input size, filter size, depth of the map stack, zero-padding and stride. For a convolution without zero-padding,
$$ M_x = \frac{I_x - K_x}{S_x} + 1, \qquad M_y = \frac{I_y - K_y}{S_y} + 1 $$
where $(M_x, M_y)$, $(I_x, I_y)$ and $(K_x, K_y)$ indicate the map size, input size and kernel size respectively, and $(S_x, S_y)$ indicate the stride along the rows and columns. The number of feature maps depends on the number of filters and their size. In a typical deep learning convolutional neural network, the number of features increases with the number of convolution layers and the associated filters.
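As a worked illustration of how the map size follows from the input size, kernel size and stride, the sketch below uses the commonly quoted relation M = (I − K)/S + 1 for a convolution without zero-padding. The 11 × 11 kernel and stride of 4 are assumed values chosen to match the first convolution layer of AlexNet as it is usually described; they are not taken from this paper.

```python
def feature_map_size(input_size, kernel_size, stride):
    """Map size per axis, M = (I - K) // S + 1, assuming no zero-padding."""
    (ix, iy), (kx, ky), (sx, sy) = input_size, kernel_size, stride
    return ((ix - kx) // sx + 1, (iy - ky) // sy + 1)

# 227 x 227 input (as used in this work) with an assumed 11 x 11 kernel, stride 4
map_size = feature_map_size((227, 227), (11, 11), (4, 4))
```

With these assumed values, (227 − 11)/4 + 1 gives a 55 × 55 feature map per filter.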

[Figure: framework block diagram with the stages input image, convolution layer, pooling layer, fully connected layer, multi class defect classification, and class activation map]

Activation function
To approximate the mapping between input values and output classes as a continuous function in a Euclidean space, a suitable activation function is essential. As the sigmoid and hyperbolic tangent functions suffer from the vanishing gradient problem, a rectified linear unit (ReLU) is used as the activation function. It returns the value x of the feature map unchanged if x is greater than zero, and zero otherwise:
$$ f(x) = \max(0, x) \quad (2) $$

Pooling layer
In order to prevent overfitting and reduce the dimensionality of the feature maps, the pooling layer performs down-sampling of the input feature map using a window function u(n, n). Max pooling and average pooling strategies are often followed in the pooling layer.
In max pooling, the window function u(n, n) is applied to calculate the maximum value of $x^{l}_{j}$ in the neighborhood.
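The effect of the ReLU activation followed by max pooling can be illustrated on a small feature map. This is a minimal sketch: the 4 × 4 values are made up, and a non-overlapping 2 × 2 window is assumed for u(n, n).

```python
def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, n=2):
    """Non-overlapping n x n max pooling; rows/columns that do not fill a window are dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + dr][c + dc] for dr in range(n) for dc in range(n))
             for c in range(0, cols - n + 1, n)]
            for r in range(0, rows - n + 1, n)]

# Hypothetical 4 x 4 feature map
fmap = [[-1.0,  2.0,  0.5, -3.0],
        [ 4.0, -2.0,  1.5,  0.0],
        [ 0.0,  1.0, -1.0, -2.0],
        [ 3.0, -4.0, -0.5,  2.5]]
pooled = max_pool(relu(fmap), n=2)   # 2 x 2 output
```

The 4 × 4 map is reduced to 2 × 2, keeping only the strongest response in each window.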

Fully connected layer
The fully connected layer in the deep learning neural network receives a feature vector from the previous max pooling layer and is trained for the multi class classification of the given leather image using the associated weights and an activation function by reducing a loss function. In this work, a fully connected layer with six output neurons is configured for providing the categorical output, namely non-defective leather, loose grain, grain off, growth marks, pinhole and folding marks, using an encoding technique. More details of the training of the neural network for the multi class classification are described in Sect. 4.2. The typical output of a neuron in a fully connected layer for the feature vector of the max pooling layer $x^{m-1}$ is given by Eq. (4).
$$ x^{m} = f\left( w^{m} x^{m-1} + b^{m} \right) \quad (4) $$
where $b^{m}$ denotes the bias of the fully connected layer, $w^{m}$ denotes the weights of the fully connected layer, $x^{m-1}$ denotes the output of the previous max-pooling layer, and f(⋅) is the activation function.

Visualization of region of interest for defect localization
In this work, gradient-weighted class activation mapping is followed, in which the weighted sum of the channels of the feature map identifies the specific discriminative regions of the given leather image for the classes non-defective leather, loose grain, grain off, growth marks, pinhole and folding marks. Here, the class activation map, with score values for the leather class c at the spatial location (x, y) of the image, is generated using the k-th channel of the feature map $f_k(x, y)$ and the corresponding weight $w^{c}_{k}$ as given by
$$ M_c(x, y) = \sum_{k} w^{c}_{k} \, f_k(x, y) $$
where $M_c$ is the class activation map of class c and $w^{c}_{k}$ represents the k-th weight of the fully connected layer for class c. As the part of the image with a larger score influences the corresponding class, a thresholding approach is followed for selecting the regions of interest in the original image.
Subsequently, the region of interest ROI(x, y) is obtained by applying the threshold T(x, y) to the class activation map, which indicates the discriminative region in the image I(x, y). A bounding box is generated from the region of interest of the given image, and it is compared with the ground truth bounding boxes in the ground truth leather images.
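The weighted-sum map and the thresholding step described above can be sketched as follows. This illustrates only the weighted sum of feature-map channels followed by a fixed threshold; it does not compute gradients, and the feature maps, class weights and threshold value are made-up toy numbers, not values from the trained networks.

```python
def class_activation_map(feature_maps, class_weights):
    """Score map: sum over channels k of (weight for class c) * (channel k at each cell)."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fm[r][c] for w, fm in zip(class_weights, feature_maps))
             for c in range(cols)] for r in range(rows)]

def roi_bounding_box(cam, threshold):
    """Return (row_min, col_min, row_max, col_max) of cells scoring >= threshold, or None."""
    hits = [(r, c) for r, row in enumerate(cam) for c, v in enumerate(row) if v >= threshold]
    if not hits:
        return None
    rs = [r for r, _ in hits]
    cs = [c for _, c in hits]
    return (min(rs), min(cs), max(rs), max(cs))

# Two hypothetical 3 x 3 feature-map channels and their weights for one class
feature_maps = [
    [[0.0, 0.1, 0.0], [0.2, 0.9, 0.8], [0.0, 0.7, 0.6]],
    [[0.1, 0.0, 0.0], [0.0, 0.5, 0.4], [0.0, 0.3, 0.2]],
]
weights_c = [1.0, 0.5]
cam = class_activation_map(feature_maps, weights_c)
box = roi_bounding_box(cam, threshold=0.5)   # threshold value is an assumption
```

In practice the score map would be upsampled to the input image size before thresholding, so that the bounding box can be compared with the hand-labeled ground truth box.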

Machine learning based approaches for multi class defect classification of leather defects
The proposed deep learning neural network based multi class leather defect classification and localization is compared with machine learning approaches such as the shallow feed-forward neural network (SFFNN), support vector machine (SVM) and K-nearest neighbour (KNN). These machine learning approaches require manual feature extraction from the leather images for the classification of leather defects. The typical steps followed in machine learning based multi class classification of leather defects are shown in Fig. 6. The details of feature extraction and the machine learning classifiers are given in the subsequent sections.

Hand crafted Feature extraction from leather images
In this work, color and texture features are extracted from the 3600 leather images for each class, namely non-defective leather, loose grain, grain off, growth marks, pinhole and folding marks, using the color histogram, autocorrelation and GLCM.

Color histogram
As the leather has different colours and shows colour variations at the defective locations of the leather image, it is analysed using a histogram in RGB colour space to understand the intensity variations in the red, green and blue channels, as given by Eq. (8), where i = 1, 2, …, M/m and j = 1, 2, …, N/n. Further, a colour histogram is applied to the given image block, and the dominant intensity value of the red, green and blue channels ($R_{max}$, $G_{max}$, $B_{max}$) is extracted as the colour feature of the leather image. Here, $C_{max}$ refers to the magnitude of counts of the red, green, and blue channels of the leather image. These colour features help in quantifying the colour changes of the leather image caused by leather defects.
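A minimal sketch of the colour feature described above is given below. It takes "dominant intensity" to mean the most frequent 8-bit value in each channel of an image block; that interpretation, the function name `dominant_intensities` and the tie-breaking rule (the lower intensity wins) are assumptions made for illustration.

```python
from collections import Counter

def dominant_intensities(pixels):
    """pixels: list of (R, G, B) tuples.

    Returns, per channel, the most frequent intensity and its count.
    """
    result = {}
    for idx, name in enumerate("RGB"):
        hist = Counter(p[idx] for p in pixels)            # per-channel histogram
        intensity, count = max(hist.items(), key=lambda kv: (kv[1], -kv[0]))
        result[name] = (intensity, count)
    return result

# Hypothetical 4-pixel image block
block = [(200, 150, 40), (200, 151, 40), (198, 150, 42), (200, 150, 40)]
features = dominant_intensities(block)
```

For a real image the function would be applied to each block of the block-wise partition, giving one (R, G, B) triple of dominant intensities per block.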

Autocorrelation
As the magnitude of the autocorrelation function is useful in describing the disturbance in the regular texture pattern due to the presence of leather defects in the leather surface, the autocorrelation function is calculated for the leather image to measure its coarseness due to leather defects. Here, the color leather image f(x, y) is converted into a grey scale image, and the two-dimensional autocorrelation function of the given leather image $f_g(x, y)$ of size M × N is calculated using the following equation.
$$ G(a, b) = \sum_{x=1}^{M} \sum_{y=1}^{N} f_g(x, y) \, f_g(x + a, y + b) $$
where G(a, b) is the autocorrelation function for the grey scale image, and a and b represent the lags from the corresponding x and y positions.
Table 2 (description column): measure of the intensity between a pixel and its neighbour; measure of orderliness of pixels; measure of smoothness of the gray level distribution; measure of distance between pairs of pixels; measure of the average intensity of all pixels; measure of dispersion of gray-level distribution of pixels.
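The link between texture coarseness and the decay of the autocorrelation function can be illustrated on two tiny binary textures. The sketch sums products of pixel pairs separated by the lag (a, b); normalizing by the zero-lag value and skipping pairs that fall outside the image are assumptions, as are the toy images.

```python
def autocorrelation(gray, a, b):
    """Sum of gray[x][y] * gray[x + a][y + b] over valid positions, divided by the zero-lag sum.

    Assumes non-negative lags and an image that is not all zeros.
    """
    rows, cols = len(gray), len(gray[0])

    def raw(da, db):
        return sum(gray[x][y] * gray[x + da][y + db]
                   for x in range(rows - da) for y in range(cols - db))

    return raw(a, b) / raw(0, 0)

# Coarse texture: 2 x 2 blocks; fine texture: pixel-level checkerboard
coarse = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
fine = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]

g_coarse = autocorrelation(coarse, 0, 1)
g_fine = autocorrelation(fine, 0, 1)
```

At a lag of one pixel the coarse texture still correlates with itself while the fine texture does not, which is the behaviour used here to measure coarseness.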

Grey level co-occurrence matrix
The grey level co-occurrence matrix provides important information for understanding the variation in texture pattern due to leather defects such as folding marks, grain off, pinhole, growth marks, and loose grain. The grey level co-occurrence matrix (GLCM) for the given leather image is constructed by counting all pairs of a reference pixel and a neighbouring pixel separated by an offset (d), having the gray levels i and j, at the specified relative orientation (θ) as given below:
$$ P_{d,\theta}[i, j] = n_{ij} $$
where $n_{ij}$ is the number of occurrences of reference and neighbouring pixels with gray levels (i, j) lying at offset (d) in the image. P[i, j] is the gray level co-occurrence matrix, and it is calculated for the given grayscale image at four different orientations (θ = 0°, 45°, 90° and 135°) and offsets (d = −3, −2, −1, 0, 1, 2, 3). The number of rows and columns of the co-occurrence matrix P[i, j] is equal to the number of distinct gray levels (n). To reduce the computational burden of calculating the GLCM for the given image, the number of gray levels was set to 64. Further, the elements of P[i, j] are normalized by dividing each entry by the total number of pixel pairs. Table 2 lists the formulae for calculating the different texture features from the GLCM and the corresponding descriptions. In this work, statistical texture features such as contrast, correlation, dissimilarity, energy, entropy, homogeneity, mean, and variance are calculated as the texture features of the given leather image.
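The construction of a normalized co-occurrence matrix and a few derived features can be sketched on a small image. This is an illustrative, non-symmetric GLCM for a single offset; the feature formulas are common textbook definitions and may differ from those in Table 2, and the 4-level test image is chosen only to keep the arithmetic easy to follow (the work itself uses 64 gray levels).

```python
def glcm(gray, dr, dc, levels):
    """Normalized co-occurrence matrix for the pixel offset (dr, dc)."""
    rows, cols = len(gray), len(gray[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[gray[r][c]][gray[r2][c2]] += 1   # reference level i, neighbour level j
                total += 1
    return [[n / total for n in row] for row in counts]

def texture_features(p):
    """Contrast, energy and homogeneity (textbook forms, assumed here)."""
    idx = range(len(p))
    contrast = sum((i - j) ** 2 * p[i][j] for i in idx for j in idx)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image, dr=0, dc=1, levels=4)   # orientation 0 degrees, offset d = 1
features = texture_features(p)
```

Repeating the call with other offsets, for example `dr=-1, dc=1` for 45°, gives one matrix per orientation, from which the features can be averaged or concatenated.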
Using the features extracted with the colour histogram, autocorrelation and GLCM, a labelled data set is developed for training state-of-the-art machine learning classifiers such as the shallow feed-forward neural network (SFFNN), support vector machine (SVM) and K-nearest neighbour (KNN) for multi class classification of leather defects.

Shallow feed-forward neural network-based machine learning classifier
In this work, a shallow feed-forward neural network (SFFNN) is trained to classify leather defects such as folding marks, grain off, pinhole, growth marks, and loose grain. Figure 7 shows the typical architecture of the proposed SFFNN with two hidden layers. Here, the color and texture features are used as the input vector ($x_i$). As the magnitudes of the extracted color and texture features differ, unity-based normalization is followed to ensure the proper fusion of the extracted features and to reduce bias and gross influences. It also brings the values of the input vector into the range [0, 1]. A shallow feed-forward neural network can be considered as a nonlinear model with nonlinear basis functions $\varphi_j(x)$ as given by Eq. (12).
Here, the weights $W_j$ can be adjusted during training, and $\varphi_j(x)$ is a nonlinear function of a linear combination of the inputs. x refers to the extracted color and texture feature vector of the given leather image. The output of the feed-forward neural network (y) can be expressed as a series of functional transformations as given by Eq. (13).
Here, the superscripts (1) and (2) indicate the parameters of the respective hidden layers, and $x_i$ indicates the input feature vector. j indexes the H hidden nodes, and K refers to the number of output neurons. g and h are the nonlinear activation functions; the hidden layer uses the sigmoid activation function.
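The unity-based normalization mentioned above is commonly implemented as min-max scaling of each feature column. The sketch below assumes that interpretation; the three-feature rows are made-up values, and a constant column is mapped to 0.0 to avoid division by zero.

```python
def unity_normalize(feature_rows):
    """Rescale each feature column to [0, 1] with x' = (x - min) / (max - min)."""
    columns = list(zip(*feature_rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in feature_rows]

# Hypothetical rows: (dominant red intensity, GLCM contrast, autocorrelation-based value)
raw = [[200.0, 0.58, 12.0],
       [120.0, 0.10, 30.0],
       [160.0, 0.34, 21.0]]
scaled = unity_normalize(raw)
```

After scaling, features with very different magnitudes contribute on a comparable scale to the input vector of the classifier.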
In order to achieve the multi class classification of the leather type as folding marks, growth marks, grain off, loose grain, pinhole or non-defective leather, the softmax function is applied as given below:
$$ p(y = j \mid x^{(i)}; w_j) = \frac{e^{z^{(i)}_{j}}}{\sum_{l=1}^{k} e^{z^{(i)}_{l}}} \quad (14) $$
This softmax function computes the probability that the given training sample $x^{(i)}$ belongs to class j, given the weight and the net input $z^{(i)}$. Hence, the probability $p(y = j \mid x^{(i)}; w_j)$ is computed for each class label j = 1, …, k. Here, the normalization term in the denominator causes these class probabilities to sum to one. Further, the shallow neural network is trained by adjusting the weights so as to minimize a cost function J, which is the average of all cross-entropies over the training data set, as given below:
$$ J(W) = \frac{1}{n} \sum_{i=1}^{n} H(T_i, O_i) \quad (15) $$
Here, n is the number of training samples and the function H refers to the cross-entropy function as defined below:
$$ H(T_i, O_i) = -\sum_{m} T_{i,m} \log\left(O_{i,m}\right) \quad (16) $$
Here, T corresponds to the "target" labels, O stands for the computed probability from the softmax function, and m indexes the classes. The cross-entropy based cost function is minimized for the given training data set using the stochastic gradient descent method by iteratively updating the weight matrix until the specified number of epochs or the desired cost threshold is reached.
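The softmax output and the cross-entropy cost described above can be sketched for the six leather classes. The net inputs are made-up numbers, and subtracting the maximum before exponentiation is a standard numerical-stability measure added here, not something stated in the text.

```python
import math

def softmax(z):
    """Class probabilities from net inputs; they sum to one."""
    shifted = [v - max(z) for v in z]        # subtract the max for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, output):
    """Negative sum of target * log(output), for a one-hot target."""
    return -sum(t * math.log(o) for t, o in zip(target, output) if t > 0)

def cost(targets, net_inputs):
    """Average cross-entropy over the training samples."""
    return sum(cross_entropy(t, softmax(z))
               for t, z in zip(targets, net_inputs)) / len(targets)

z = [2.0, 1.0, 0.1, 0.0, -1.0, 0.5]          # hypothetical net inputs, six classes
probs = softmax(z)
one_hot = [1, 0, 0, 0, 0, 0]                  # target: first class
loss = cost([one_hot], [z])
```

The cost shrinks toward zero as the probability assigned to the target class approaches one, which is what the weight updates aim for.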

Results and discussion
In this work, a deep learning computing system running 64-bit Windows is used. The classification accuracy of the deep learning convolutional neural networks is compared with existing machine learning approaches such as the shallow feed forward neural network, support vector machine and K-nearest neighbor. Classification performance metrics such as precision, sensitivity, F1-score, and accuracy are calculated using the confusion matrix, and the results are presented in this section.
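The metrics named above follow directly from the confusion matrix. The sketch below assumes that rows hold the true classes and columns the predicted classes (some tools, including MATLAB's confusion plots, use the transposed layout); the 3-class matrix is a made-up example, not a result from this work.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))   # column sum
        actual_k = sum(confusion[k])                           # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        sensitivity = tp / actual_k if actual_k else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        per_class.append({"precision": precision, "sensitivity": sensitivity, "f1": f1})
    return accuracy, per_class

# Hypothetical 3-class confusion matrix
confusion = [[8, 2, 0],
             [1, 9, 0],
             [0, 1, 9]]
accuracy, per_class = classification_metrics(confusion)
```

For the six leather classes the same function would be applied to a 6 × 6 matrix, giving one precision, sensitivity and F1 value per class.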

Feature maps of convolution neural networks
In order to understand and interpret the feature maps in the convolution layers of AlexNet, VGG-16, GoogLeNet, SqueezeNet and ResNet-50, they are extracted for a few convolution layers. Figure 8 shows the feature maps of a leather image in AlexNet for its 5 convolution layers. It can be seen in Fig. 8a that simple features like edges are filtered by the kernels in the first convolution layer, and higher-order features are extracted in the subsequent layers using the learned weights of the kernels, as shown in Fig. 8b-e.
As the layer depth increases, the feature maps do not show much detail due to the finer size of the filter in the same receptive field. Though it is difficult to interpret the extracted feature maps, they provide low-level and high-level features of the leather texture variations in the leather image at the same receptive field, which are used for the classification of leather images.

Feature extraction using GLCM, autocorrelation
Colour histogram, GLCM and autocorrelation functions are applied to extract the colour and texture features of the defective and non-defective leather images for evaluating the performance of the machine learning approaches for multi class leather defect classification. Figure 9 shows the feature extraction results of a typical defective leather with a fold mark and a non-defective leather image of yellow colour. As the presence of defects in the leather results in intensity variations, there is a variation in the number of counts of the red, green and blue channels as shown in Fig. 9b. From Fig. 9c, it can be noted that the autocorrelation function of the defective leather with folding marks decays slowly due to its coarser texture, as compared to the non-defective leather with a finer texture.
As the texture pattern differs between defective and non-defective leather, the intensities and grey levels of neighbouring pixels vary, which leads to a change in the magnitude of the GLCM as shown in Fig. 9d. From these results, it is noted that the extracted color and texture features vary due to the differing texture pattern of leather defects and the corresponding intensity variations.

Training and testing performance of deep learning neural networks
In this work, stochastic gradient descent with momentum (SGDM) is used to optimize the model hyper-parameters, particularly the initial learning rate, stride and filter size of the deep learning neural networks. Table 3 shows the training details of the neural network. Figure 10 shows the training and testing performance curves for the different epochs and iterations of AlexNet. The bottom plot shows the cross-entropy loss function for different epochs of the training (blue) and testing (black) data sets, and the top plot shows the trend and variation of the classification accuracy of AlexNet over the epochs.
From the plots, it is noted that the training process converged well to reach the classification accuracy of 99.4%. It can be seen that the magnitude of the loss reduces with each epoch as the classification accuracy increases. The accuracy and loss curves show bumps as the weights of the neural network are learnt from the given examples of training and testing leather images for multi class classification. The elapsed computational time of the training process, the accuracy improvement and the loss reduction are noted for different numbers of epochs and are shown in Table 4.

Table 3 Parameters of the stochastic gradient descent with momentum (SGDM)
Hyper-parameters | Value
Optimization algorithm | SGDM
Initial learning rate | 0.0001
Epochs | 220
Batch rate | 30

It can be noted that the mini-batch loss reduces as the weight values are learnt for the correct classification with the increase in the number of iterations, and the classification accuracy reaches 99.40% at 220 epochs.
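The weight update behind stochastic gradient descent with momentum can be sketched on a toy problem. The learning rate of 0.0001 and the 220 iterations echo the values reported for training; the momentum coefficient of 0.9 and the quadratic toy loss are assumptions made purely for illustration.

```python
def sgdm_step(weights, grads, velocity, lr=0.0001, momentum=0.9):
    """One SGDM update: v <- momentum * v - lr * grad, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy problem: minimize L(w) = 0.5 * w^2, whose gradient is w itself.
w, v = [1.0], [0.0]
losses = []
for _ in range(220):
    losses.append(0.5 * w[0] ** 2)
    w, v = sgdm_step(w, grads=w, velocity=v)
```

With such a small learning rate the loss falls slowly but steadily, and the momentum term accumulates past gradients so that the effective step is larger than the learning rate alone would give.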

Training performance of shallow feed forward neural network classifier
In this work, a shallow neural network architecture with 13 neurons in the input layer and 24 neurons in the hidden layer is developed and trained in the MATLAB environment. Fig. 11 shows the architecture of the proposed neural network; the output layer is configured with 6 neurons for classifying the leather image as folding marks, grain off, pinhole, growth marks, loose grain or non-defective leather surface. During training of the proposed shallow feed-forward neural network, a cross-entropy function and the gradient descent method are used for adjusting the weight values of the neural network. The training, testing and validation performance of the proposed SFFNN over the number of epochs is shown in Fig. 11b. It is found that a classification accuracy of 97.6% is obtained for the minimum cross-entropy value of 0.015625, and the best validation performance is achieved at 345 epochs.

Classification performance of deep learning neural networks
In order to quantify the classification performance of the deep learning convolutional neural networks for the multi class classification of leather texture, the confusion matrix is calculated based on the output classes given by the classifier and the given target classes of the leather images. Figure 12 shows the confusion matrices for the deep learning neural networks and machine learning classifiers which provided the top three classification accuracies during training and testing. In Fig. 12, high numbers in green cells represent correct responses and the low numbers in red cells correspond to incorrect responses. AlexNet provided classification accuracies of 99.40% and 87.60% for the training and testing data sets respectively, which are higher than those of the other deep learning convolutional neural network classifiers. Using the number of correct and incorrect classifications of the target class of the leather images, the precision and recall values are calculated and listed in Table 5 for the different classifiers.
With the better testing accuracy of 87.60%, AlexNet performed better for multi class classification of leather textures in unseen leather images. These results demonstrate the capability of deep learning neural network classifiers for multi class leather texture classification.

Classification performance of machine learning approaches
State-of-the-art machine learning algorithms such as shallow feed forward neural networks, support vector machine, K-nearest neighbour, decision tree and Naïve Bayes are applied using the hand-crafted colour and texture features for multi class leather texture classification. A confusion matrix is constructed for summarizing the performance of the different machine learning classifiers, and it is shown in Fig. 13. As indicated by the higher magnitude of the diagonal elements in the confusion matrix for the correct classification of each class, the shallow feed forward neural network showed better performance in the classification of different leather texture images. The overall classification accuracy of the shallow feed forward neural network for multi class leather image classification is found to be 97.5%, which is lower than that of the deep learning convolutional neural network models, as the hand-crafted feature extraction limits the important discriminative features in the leather images.

Class activation maps for selection of region of interest in leather images
As the leather defects are localized in specific regions of the leather image, class activation maps are generated using the trained deep learning neural network models for the given leather images, and the sample results for different leather texture classes are shown in Fig. 14a, b for AlexNet and VGG-16 respectively. It can be seen that the regions in red are identified as the discriminative regions, and they can be identified from the pixels with the highest magnitude using the score map. In this work, the maximum value of the class activation map is chosen and applied as the threshold for segmenting the regions of interest in the leather images. Figure 15a shows a sample leather image with the ground truth bounding box in the defective areas, and Fig. 15b shows the corresponding class activation map. It can be seen that the score map shows peaks and highest values in the respective regions of the image, as identified in red in Fig. 15b. A threshold value of 0.5983 is selected for segmenting the region of interest, and the results are shown in Fig. 15c. It is found that the area of the discriminative regions in the leather as identified by the class activation map is larger than the ground truth bounding box. Hence, suitable algorithms are required for the precise detection and localization of leather defects in the leather images.
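One common way to quantify how well a predicted region matches a ground truth bounding box is the intersection over union (IoU). The paper does not state which overlap measure it uses, so IoU is an assumption here, and both boxes below are hypothetical, chosen so that the predicted box is larger than the ground truth as observed above.

```python
def box_area(box):
    """Area in pixels of (row_min, col_min, row_max, col_max), inclusive indices."""
    r0, c0, r1, c1 = box
    return max(0, r1 - r0 + 1) * max(0, c1 - c0 + 1)

def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes; 0.0 when they do not overlap."""
    inter = (max(box_a[0], box_b[0]), max(box_a[1], box_b[1]),
             min(box_a[2], box_b[2]), min(box_a[3], box_b[3]))
    inter_area = box_area(inter)
    union_area = box_area(box_a) + box_area(box_b) - inter_area
    return inter_area / union_area

ground_truth = (40, 40, 79, 79)    # hypothetical hand-labelled box, 40 x 40 pixels
predicted = (30, 30, 89, 89)       # hypothetical map-derived box, 60 x 60 pixels
iou = intersection_over_union(predicted, ground_truth)
```

A predicted box that fully contains a smaller ground truth box scores well below 1, so the measure penalizes over-segmentation of the kind reported here.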

Conclusions
Leather texture plays an important role in deciding the quality of leather products. This work presented deep learning convolutional neural networks and machine learning classifiers for the multi class classification of leather images. A comprehensive data set of 3600 leather images with different defects such as folding marks, grain off, pinhole, growth marks and loose grain, together with non-defective leather surfaces, is classified using pretrained deep learning neural networks such as AlexNet, VGG-16, GoogLeNet, SqueezeNet and ResNet-50. The classification performance of the deep learning convolutional neural networks is compared with existing machine learning approaches such as the shallow feed forward neural network, support vector machine and K-nearest neighbour. From the results obtained from the confusion matrix, it is found that a deep learning convolutional neural network such as AlexNet, with a classification accuracy of 99.4%, performed better than the shallow feed forward neural network machine learning technique due to its superior feature extraction capability. Further, the use of class activation maps of the trained deep learning neural network for segmenting the regions of interest in the leather images is demonstrated for the localization of the defective regions. The proposed method can be suitably implemented for automated multi class classification of leather samples in an industrial environment.