Open access peer-reviewed article
This Article is part of Artificial Intelligence Section
Article Type: Research Paper
Date of acceptance: July 2024
Date of publication: August 2024
DOI: 10.5772/acrt.20240003
Copyright: ©2024 The Author(s). Licensee: IntechOpen. License: CC BY 4.0
Generative artificial intelligence (GenAI) has been advancing with many notable achievements such as ChatGPT and Bard. The deep generative model (DGM) is a branch of GenAI that is preeminent in generating raster data such as images and sound, owing to the strong role of deep neural networks (DNNs) in inference and recognition. The built-in inference mechanism of the DNN, which simulates and aims at the synaptic plasticity of the human neural network, fosters the generation ability of the DGM, which produces surprising results with the support of statistical flexibility. Two popular approaches in DGM are the variational autoencoder (VAE) and the generative adversarial network (GAN). Both VAE and GAN have their own strong points, although they share the same underlying statistical theory as well as significant complexity in the hidden layers of the DNN, where the DNN serves as an effective encoding/decoding function without a concrete specification. This research unifies VAE and GAN into a consistent and consolidated model called the adversarial variational autoencoder (AVA), in which the VAE and GAN complement each other: the VAE is a good data generator that encodes data via the excellent ideology of Kullback–Leibler divergence, and the GAN is a significantly important method to assess the reliability of data as to whether it is real or fake. In other words, the AVA aims to improve the accuracy of generative models; moreover, the AVA extends the functions of simple generative models. In methodology, this research focuses on combining applied mathematical concepts with skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
deep generative model (DGM)
variational autoencoder (VAE)
generative adversarial network (GAN)
The variational autoencoder (VAE) and the generative adversarial network (GAN) are two popular approaches for developing a deep generative model (DGM) [1] with the support of a deep neural network (DNN). The high capacity of the DNN contributes significantly to the success of GAN and VAE. Some works, such as those by Larsen, Mescheder, and Ahmad, have combined the VAE and the GAN.
In general, both VAE and GAN have their own strong points: they take advantage of solid statistical theory as well as DNNs. However, each also suffers from drawbacks: VAE has no mechanism to distinguish fake data from real data, and GAN does not explicitly handle the probabilistic distribution of encoded data. It is better to utilize their strong points and alleviate their weak points. Therefore, this research focuses on incorporating GAN into VAE by skillful techniques related to both stochastic gradient descent (SGD) and software engineering architecture, which are based neither on purely mathematical fusion nor on experimental tasks. In practice, many complex mathematical problems can be solved effectively by skillful techniques of computer programming. Moreover, the proposed model, called the adversarial variational autoencoder (AVA), aims to extend the functions of VAE and GAN into a general architecture for the generative model. For instance, AVA provides an encoding function that GAN does not possess and a discrimination function that VAE needs in order to distinguish fake data from real data. The combination of VAE and GAN into AVA is strengthened by a regulation and balance mechanism, which is natural and similar to a fusion mechanism; in some cases it is better than a fusion mechanism, because both the built-in VAE and the built-in GAN inside AVA retain their own strong features. Therefore, the experiment in this work is not focused on large data: only AVA, VAE, and GAN are compared within a small dataset, which aims to prove the proposed method mentioned in the next section.
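As a hedged illustration of how the two objectives can sit side by side, the sketch below pairs a VAE-style loss (reconstruction error plus the Kullback–Leibler divergence of a diagonal Gaussian code against the standard normal prior) with a GAN-style discriminator loss. The function names and toy values are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def vae_loss(x, x_rec, mu, sigma):
    """VAE objective: reconstruction error plus KL(N(mu, diag(sigma^2)) || N(0, I))."""
    recon = np.mean((x - x_rec) ** 2)
    kl = 0.5 * np.sum(mu**2 + sigma**2 - np.log(sigma**2) - 1.0)
    return recon + kl

def discriminator_loss(d_real, d_fake):
    """GAN objective for the discriminator: score real data high, generated data low."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# Toy check: a perfect reconstruction with a standard-normal code gives zero VAE loss.
x = np.array([0.2, 0.5, 0.8])
loss = vae_loss(x, x, mu=np.zeros(2), sigma=np.ones(2))  # → 0.0
```

In an AVA-style training loop, the generator side would minimize the VAE term while the discriminator term is used to audit whether decoded data look real or fake.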
This research proposes a method as well as a generative model that incorporates GAN into VAE to extend and improve the DGM, because GAN does not deal with the coding of original data and VAE lacks a mechanism to assess the quality of generated data. Note that data coding is necessary for essential applications such as image compression and recognition, whereas auditing quality can improve the accuracy of generated data. By convention, let vector variables
The GAN developed by Goodfellow
Causality–effect relationship between decoder DNN and discriminator DNN.
When weights are assumed to be 1, the error of the causal decoder neuron is the error of the discriminator neuron multiplied by the derivative at the decoder neuron. Moreover, the error of the discriminator neuron, in turn, is the product of its minus bias −
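The chain rule described above can be checked numerically. In this minimal sketch, the sigmoid activation and all numeric values are assumptions for illustration; with the connecting weight set to 1, the error at the decoder output neuron is the discriminator neuron's error multiplied by the activation derivative at the decoder neuron.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values: pre-activation of the decoder output neuron
# and the error already computed at the discriminator neuron.
z_dec = 0.5
delta_disc = 0.2

# With the connecting weight assumed to be 1, backpropagation gives:
# decoder error = discriminator error * sigmoid'(z_dec)
out = sigmoid(z_dec)
delta_dec = delta_disc * out * (1.0 - out)  # ≈ 0.047
```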
AVA architecture.
The AVA architecture follows an important aspect of VAE where the encoder
The balance function
Reverse causality–effect relationship between discriminator DNN and decoder DNN.
Suppose the bias of each decoder output neuron is bias[
The encoder parameter 𝛩 consists of two separate parts 𝛩𝜇 and 𝛩𝛴 because the output of encoder
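Because the encoder outputs the mean μ and the covariance Σ of a Gaussian code, a common way to draw the encoded sample is the standard reparameterization trick, z = μ + σ ⊙ ε with ε drawn from the standard normal. The sketch below is an assumed illustration of that trick, not the paper's exact sampling routine.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_code(mu, sigma):
    """Draw z ~ N(mu, diag(sigma^2)) via the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

mu = np.array([0.0, 1.0])      # hypothetical encoder mean
sigma = np.array([1.0, 0.5])   # hypothetical encoder standard deviations
z = sample_code(mu, sigma)     # one random code with the same shape as mu
```

Keeping the randomness in ε lets the gradient flow through μ and σ, which is why the encoder parameter naturally splits into the two parts 𝛩𝜇 and 𝛩𝛴.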
AVA architecture with support of assessing encoder.
Similarly, the balance function
In this experiment, AVA is compared with VAE and GAN; five versions of AVA are tested: AVA1, AVA2, AVA3, AVA4, and AVA5. Recall that AVA1 is the normal version of AVA, whose parameters are listed as follows:
Images for DGM training and testing.
It is necessary to define how efficient DGMs such as VAE, GAN, and AVA are. Let imageGen be the best image generated by a DGM, which is compared with the
The five AVA variants (AVAs) as well as VAE and GAN are evaluated by BM with 19 learning rates (𝛾 = 1, 0.9, … , 0.1, 0.09, … , 0.01), because the SGD algorithm is affected by the learning rate, and the accuracy of AVA varies slightly within a given learning rate because of randomizing encoded data
𝛾 | AVA1 | AVA2 | AVA3 | AVA4 | AVA5 | VAE | GAN
---|---|---|---|---|---|---|---
𝛾 = 1.0 | 0.2298 | 0.2301 | 0.0642 | 0.0766 | 0.2301 | 0.0583 | 0.2298 |
𝛾 = 0.9 | 0.2307 | 0.2294 | 0.0546 | 0.0594 | 0.2293 | 0.0681 | 0.2283 |
𝛾 = 0.8 | 0.2309 | 0.2316 | 0.0596 | 0.0546 | 0.2301 | 0.0587 | 0.2311 |
𝛾 = 0.7 | 0.2316 | 0.2305 | 0.0629 | 0.0631 | 0.2305 | 0.0665 | 0.2311 |
𝛾 = 0.6 | 0.2309 | 0.2317 | 0.0555 | 0.0657 | 0.2318 | 0.0623 | 0.2315 |
𝛾 = 0.5 | 0.2318 | 0.2319 | 0.0591 | 0.0598 | 0.2313 | 0.0610 | 0.2311 |
𝛾 = 0.4 | 0.2322 | 0.2329 | 0.0629 | 0.0732 | 0.2322 | 0.0568 | 0.2312 |
𝛾 = 0.3 | 0.2318 | 0.2321 | 0.0741 | 0.0655 | 0.2326 | 0.0651 | 0.2325 |
𝛾 = 0.2 | 0.2300 | 0.2312 | 0.0740 | 0.0929 | 0.2302 | 0.0735 | 0.2315 |
𝛾 = 0.1 | 0.2103 | 0.2105 | 0.1230 | 0.1217 | 0.2114 | 0.1238 | 0.2107 |
BM regarding learning rates from 1 down to 0.1.
Table 2 shows the BM values of AVAs, VAE, and GAN with nine learning rates: 𝛾 = 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01.
𝛾 | AVA1 | AVA2 | AVA3 | AVA4 | AVA5 | VAE | GAN
---|---|---|---|---|---|---|---
𝛾 = 0.09 | 0.2038 | 0.2015 | 0.1319 | 0.1328 | 0.2026 | 0.1338 | 0.2031 |
𝛾 = 0.08 | 0.1924 | 0.1938 | 0.1417 | 0.1446 | 0.1978 | 0.1435 | 0.1916 |
𝛾 = 0.07 | 0.1842 | 0.1826 | 0.1566 | 0.1574 | 0.1834 | 0.1555 | 0.1818 |
𝛾 = 0.06 | 0.1685 | 0.1772 | 0.1662 | 0.1659 | 0.1785 | 0.1676 | 0.1699 |
𝛾 = 0.05 | 0.1664 | 0.1617 | 0.1792 | 0.1785 | 0.1621 | 0.1805 | 0.1628 |
𝛾 = 0.04 | 0.1675 | 0.1655 | 0.1918 | 0.1906 | 0.1662 | 0.1924 | 0.1665 |
𝛾 = 0.03 | 0.1845 | 0.1832 | 0.2017 | 0.2014 | 0.1855 | 0.2021 | 0.1857 |
𝛾 = 0.02 | 0.2047 | 0.2032 | 0.2098 | 0.2098 | 0.2028 | 0.2099 | 0.2046 |
𝛾 = 0.01 | 0.2147 | 0.2146 | 0.2147 | 0.2147 | 0.2146 | 0.2147 | 0.2148 |
BM regarding learning rates from 0.09 down to 0.01.
Table 3 shows BM means, BM maxima, BM minima, and BM standard deviations (SDs) of AVAs, VAE, and GAN.
Metric | AVA1 | AVA2 | AVA3 | AVA4 | AVA5 | VAE | GAN
---|---|---|---|---|---|---|---
Mean | 0.2093 | 0.2092 | 0.1202 | 0.1225 | 0.2096 | 0.1207 | 0.2089 |
Maximum | 0.2322 | 0.2329 | 0.2147 | 0.2147 | 0.2326 | 0.2147 | 0.2325 |
Minimum | 0.1664 | 0.1617 | 0.0546 | 0.0546 | 0.1621 | 0.0568 | 0.1628 |
SD | 0.0249 | 0.0251 | 0.0606 | 0.0586 | 0.0244 | 0.0606 | 0.0252 |
Evaluation of AVAs, VAE, and GAN.
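The summary statistics in Table 3 can be reproduced from the per-learning-rate BM values in Tables 1 and 2. As a check, the snippet below recomputes the mean, sample standard deviation, maximum, and minimum for the AVA5 column:

```python
import statistics

# BM values of AVA5 over the 19 learning rates (read off Tables 1 and 2).
ava5 = [0.2301, 0.2293, 0.2301, 0.2305, 0.2318, 0.2313, 0.2322, 0.2326,
        0.2302, 0.2114, 0.2026, 0.1978, 0.1834, 0.1785, 0.1621, 0.1662,
        0.1855, 0.2028, 0.2146]

mean = statistics.mean(ava5)    # ≈ 0.2096, matching Table 3
sd = statistics.stdev(ava5)     # sample SD, ≈ 0.0244, matching Table 3
hi, lo = max(ava5), min(ava5)   # 0.2326 and 0.1621
```

Note that the reported SD (0.0244) corresponds to the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes.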
Note that VAE and GAN represent a pole of similarity quality and a pole of balance quality, respectively. From the experimental results shown in Table 3, AVA5 is the best DGM because it gains the highest BM mean (0.2096), which is also larger than the BM mean (0.2089) of the pole GAN. This result is easy to explain: AVA5 improves both the decoding task and the encoding task because it embeds both the decoder discriminator and the encoder discriminator as well as both the leaning decoder and the leaning encoder. Moreover, both AVA1 and AVA2 are better than GAN because their BM means (0.2093, 0.2092) are larger than the BM mean (0.2089) of GAN. If similarity quality is considered, AVA3 is the best DGM because it gains the lowest BM mean (0.1202), which is also smaller than the BM mean (0.1207) of the pole VAE. Again, this result is easy to explain: AVA3 improves the encoding task because it embeds the encoder discriminator. Moreover, AVA1, which is a fair AVA because it embeds the decoder discriminator but does not support the leaning decoder, is better than the pole GAN, whereas AVA3, which is a fair AVA because it embeds the encoder discriminator but does not support the leaning encoder, is better than the pole VAE. This result is important because the best variant, AVA5, is not a fair one, as it supports both the leaning decoder and the leaning encoder. Therefore, with regard to the BM mean, which is the most important metric, all AVA variants are better than the traditional DGMs VAE and GAN in terms of both similarity quality and balance quality.
Although the BM mean is the most important metric, it is necessary to check the metrics related to extreme values, the BM maximum and BM minimum, where the BM maximum implies the best balance quality and the BM minimum implies the best similarity quality. Note from the experimental results shown in Table 3 that the decoder improvement in AVA1 and AVA2 aims to improve balance quality with high BM, the encoder improvement in AVA3 and AVA4 aims to improve similarity quality with low BM, and AVA5 improves both the decoder and the encoder. AVA2 and AVA5 are better DGMs in terms of extreme balance quality because their BM maxima (0.2329, 0.2326) are larger than the BM maximum (0.2325) of GAN. Similarly, AVA3 and AVA4 are better DGMs in terms of extreme similarity quality because their BM minima (0.0546, 0.0546) are smaller than the BM minimum (0.0568) of VAE. Therefore, with regard to BM extreme values, the AVA variants are better than the traditional DGMs VAE and GAN in terms of both similarity quality and balance quality.
Because each AVA includes functions from both VAE and GAN, it is more complicated than either of them, so the two poles VAE and GAN should in theory be more stable than the AVAs; it is therefore necessary to check the SD of BM, which reflects the stability of a DGM. The smaller the SD, the more stable the DGM. AVA1 and AVA2 are more stable than GAN, as their SDs (0.0249, 0.0251) are smaller than the SD (0.0252) of GAN. AVA3 and AVA4 are slightly more stable than VAE, as their SDs (0.0606, 0.0586) are smaller than or equal to the SD (0.0606) of VAE. Moreover, AVA5 is the best in terms of stability, as its SD (0.0244) is the smallest. Therefore, the AVA variants are more stable than the traditional DGMs VAE and GAN.
Figure 6 depicts the BM means, BM maxima, BM minima, and BM standard deviations of AVAs, VAE, and GAN as charts.
Evaluation of AVAs, VAE, and GAN.
It is concluded that the combination of GAN and VAE, which produces AVA in this research, results in better encoding and decoding performance of the DGM, as metrics such as the BM means, BM maxima, BM minima, and BM standard deviations of the AVAs are better with regard to both balance quality and similarity quality. Moreover, AVA5, which includes the full set of functions (the decoder discriminator, decoder leaning, encoder discriminator, and encoder leaning), produces the best results, with the highest balance quality given the largest BM mean (0.2096) and the highest stability given the smallest SD (0.0244).
AVA improves on the traditional VAE and GAN thanks to the support of the Kullback–Leibler divergence that establishes the encoder, as well as the built-in discriminator function of GAN that assesses the reliability of data. VAE and GAN may be considered solid models in both theory and practice, in the sense that their mathematical foundations cannot be changed or transformed; however, it is still possible to improve them by modifications or combinations, as well as by applying them to specific tools where their strong points are brought into play. In applications related to raster data like images, VAE has the drawback of consuming much memory because one probabilistic distribution represents the entire image, whereas some other DGMs instead represent the product of many conditional probabilistic distributions over pixels. Although this approach of modeling pixels with a recurrent neural network may consume less memory, it is especially useful for filling in or recovering small damaged areas within a bigger image. In the future, we will try to apply the pixel approach to AVA; for instance, AVA could process a big image block by block, with every block modeled by a conditional probability distribution via a recurrent neural network or a long short-term memory network.
The authors declare no conflict of interest.
© The Author(s) 2024. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.