JPEG is a widely used lossy compression standard method specifically designed for images. It was developed by the Joint Photographic Experts Group. JPEG compression can be roughly divided into five steps: 1) Color space transformation, 2) Downsampling, 3) Discrete Cosine Transform (DCT), 4) Quantization, and 5) Entropy coding. Here, I will mainly analyze the third step, "Discrete Cosine Transform," and the fourth step, "Quantization," and provide the corresponding Python code implementation.
Among them, F F F represents the original image, T T T represents the image after DCT. Assuming the matrix size is N × N N \times N N × N , the DCT matrix C C C is as follows:
cos 2 N ( 2 v + 1 ) u π u = 0 , 0 ≤ v ≤ N − 1 1 ≤ u ≤ N − 1 , 0 ≤ v ≤ N − 1
If the horizontal axis is u u u , the vertical axis is v v v , the matrix representation of C C C is:
C ( u , v ) = [ 1 N 1 N ⋯ 1 N 2 N c o s π 2 N 2 N c o s 3 π 2 N ⋯ 2 N c o s ( 2 N − 1 ) π 2 N ⋮ ⋮ ⋱ ⋮ 2 N c o s ( N − 1 ) π 2 N 2 N c o s 3 ( N − 1 ) π 2 N ⋯ 2 N c o s ( 2 N − 1 ) ( N − 1 ) π 2 N ] C(u,v)=\begin
cos 2 N π ⋮ N 2
cos 2 N ( N − 1 ) π N 1
cos 2 N 3 π ⋮ N 2
cos 2 N 3 ( N − 1 ) π ⋯ ⋯ ⋱ ⋯ N 1
cos 2 N ( 2 N − 1 ) π ⋮ N 2
cos 2 N ( 2 N − 1 ) ( N − 1 ) π ⎦
C T ( u , v ) = [ 1 N 2 N c o s π 2 N ⋯ 2 N c o s ( N − 1 ) π 2 N 1 N 2 N c o s 3 π 2 N ⋯ 2 N c o s 3 ( N − 1 ) π 2 N ⋮ ⋮ ⋱ ⋮ 1 N 2 N c o s ( 2 N − 1 ) π 2 N ⋯ 2 N c o s ( 2 N − 1 ) ( N − 1 ) π 2 N ] C^T(u,v)=\begin
cos 2 N 3 π ⋮ N 2
cos 2 N ( 2 N − 1 ) π ⋯ ⋯ ⋱ ⋯ N 2
cos 2 N ( N − 1 ) π N 2
cos 2 N 3 ( N − 1 ) π ⋮ N 2
cos 2 N ( 2 N − 1 ) ( N − 1 ) π ⎦
Basis functions above can be illustrated as images, known as the basis image. The specific method is as follows: For an NxN DCT matrix, by multiplying the nth column vector of C C C with the mth row vector of C T C^T C T , a total of N 2 N^2 N 2 matrices can be obtained. These matrices are then concatenated according to their positions (m,n), resulting in a basis image of size N 2 × N 2 N^2 \times N^2 N 2 × N 2 . Any image can be composed using these cosine transform combinations. In fact, DCT is to calculate the contribution of each cosine waves and get what we call "coefficient".
import numpy as np import matplotlib.pyplot as plt import math import cv2 import skimage.io as io # DCT matrix N=8 # matrix shape: 8*8 DCT=np.zeros((N,N)) for m in range(N): for n in range(N): if m==0: DCT[m][n]=math.sqrt(1/N) else: DCT[m][n]=math.sqrt(2/N)*math.cos((2*n+1)*math.pi*m/(2*N)) # DCT basis image basis=np.zeros((N*N,N*N)) for m in range(N): for n in range(N): pos_m=m*N pos_n=n*N DCT_v=DCT[m,:].reshape(-1,1) DCT_T_h=DCT.T[:,n].reshape(-1,N) basis[pos_m:pos_m+N,pos_n:pos_n+N]=np.matmul(DCT_v,DCT_T_h) # Center values basis+=np.absolute(np.amin(basis)) scale=np.around(1/np.amax(basis),decimals=3) for m in range(basis.shape[0]): for n in range(basis.shape[1]): basis[m][n]=np.around(basis[m][n]*scale,decimals=3) # Show basis image plt.figure(figsize=(4,4)) plt.gray() plt.axis('off') plt.title('DCT Basis Image') plt.imshow(basis,vmin=0)
Enter fullscreen mode
Exit fullscreen mode
This is the basis image of an 8x8 DCT matrix, with a size of 64x64. By observing the image, we can notice that the horizontal frequencies increase from left to right, while the vertical frequencies increase from top to bottom. The constant basis function in the top left corner is commonly referred to as the DC (Direct Current) basis function, and the corresponding DCT coefficient is known as the DC coefficient. Any image can be composed by overlaying 64 cosine transformations from this reference image. Since we have set the size of the DCT matrix to be 8x8, we need to divide the image into several small 8x8 squares, referred to as blocks. The image is processed in units of blocks. Below, we will demonstrate 8 sets of DCT matrices along with their corresponding blocks.
# DCT matrix blocks=np.zeros((8*8,8)) for i in range(8): blocks[i*8][i]=1 # IDCT -> Original images blocks_idct=np.zeros((8*8,8)) for i in range(8): block=blocks[i*8:i*8+8][:] data=cv2.idct(block) blocks_idct[i*8:i*8+8][:]=data # Show DCT matrix plt.figure(figsize=(16,3)) for i in range(8): pos='18'+str(i+1) pos=int(pos) plt.subplot(pos) block=blocks[i*8:i*8+8][:] plt.gray() plt.axis('off') plt.imshow(block,vmin=0)
Enter fullscreen mode
Exit fullscreen mode
# Show original images plt.figure(figsize=(16,3)) for i in range(8): pos='18'+str(i+1) pos=int(pos) plt.subplot(pos) block_idct=blocks_idct[i*8:i*8+8][:] plt.gray() plt.xticks([]) plt.yticks([]) plt.imshow(block_idct,vmin=0)
Enter fullscreen mode
Exit fullscreen mode
Let's assume that the original images above are numbered from left to right as 1-8. We can see that Image 1 is exactly the same as the top-left small square in the DCT basis image, which means its DCT matrix only has weights in the top-left corner, and the weights are 0 everywhere else. Image 2 is identical to the small square in the first row and second column of the DCT basis image, so its DCT matrix shows weights only at (0,1), and the weights are 0 elsewhere. This pattern continues for the other images. This is the essence of the DCT.
DCT itself is lossless, and its purpose is to transform the image from the spatial domain to the frequency domain. To achieve compression, we need to introduce a step between DCT and IDCT. The human eye is less sensitive to high-frequency information, so the compression method involves discarding high-frequency information from the image. In the DCT matrix, the information in the top-left corner is high-frequency, while the information in the bottom-right corner is low-frequency. Below, I will demonstrate the basic idea of JPEG compression by selectively retaining 1, 3, and 10 low-frequency components.
# Read image img=io.imread("img3.jpg",as_gray=True) # Obtaining a mask through zigzag scanning def z_scan_mask(C,N): mask=np.zeros((N,N)) start=0 mask_m=start mask_n=start for i in range(C): if i==0: mask[mask_m,mask_n]=1 else: # If even, move upward to the right if (mask_m+mask_n)%2==0: mask_m-=1 mask_n+=1 # If it exceeds the upper boundary, move downward if mask_m=N: mask_n-=1 # If odd, move downward to the left else: mask_m+=1 mask_n-=1 # If it exceeds the lower boundary, move upward if mask_m>=N: mask_m-=1 # If it exceeds the left boundary, move right if mask_n
Enter fullscreen mode
Exit fullscreen mode
For the images above, the more coefficients you retain, the better the image quality. In the DCT matrix, the top-left corner's DC component contains most of the information. If you only keep the DC component from the top-left corner, the entire block will be represented by the DC component alone, resulting in the entire block having a single value. By retaining more coefficient information, more cosine components are overlaid, resulting in more details in the image.
The concept of quantization is similar to the second step. The quantization matrix is user-defined, where smaller quantization coefficients retain more information from the image, while larger quantization coefficients retain less information. For example, consider a 2x2 DCT matrix:
If the quantization matrix is:
If we divide C C C by Q Q Q , round the resulting values, and then multiply them back by Q Q Q , we can obtain the quantized matrix C q C_q C q .
Indeed, we can observe that the value 5 in C C C becomes 0 after quantization. If we reduce Q Q Q , for example, to 5, the value 5 will not be discarded. Conversely, if we increase Q Q Q , it is possible that higher values, such as 13, may also be discarded. In software applications like Photoshop and other image processing tools, adjusting the "quality" setting corresponds to modifying the quantization coefficients. The quantization matrix can have different values at each position.
# Define quantization matrix q_mat=np.array([[16,11,10,16,24,40,51,61], [12,12,14,19,26,58,60,55], [14,13,16,24,40,57,69,56], [14,17,22,29,51,87,80,62], [18,22,37,56,68,109,103,77], [24,35,55,64,81,104,113,92], [49,64,78,87,103,121,120,101], [72,92,95,98,112,100,103,99]]) # Quantization def Compress_2(img,N): img_dct=np.zeros((img.shape[0]//N*N,img.shape[1]//N*N)) for m in range(0,img_dct.shape[0],N): for n in range(0,img_dct.shape[1],N): block=img[m:m+N,n:n+N] # DCT coeff=cv2.dct(block) # Quantization q_coeff=np.around(coeff*255/q_mat) # Inverse quantization iq_coeff=q_coeff*q_mat # IDCT iblock=cv2.idct(iq_coeff) img_dct[m:m+N,n:n+N]=iblock return img_dct # Show Images plt.figure(figsize=(8,4)) plt.gray() plt.subplot(121) plt.title('Original') plt.imshow(img) plt.axis('off') plt.subplot(122) plt.title('Quantified') plt.imshow(Compress_2(img,8)) plt.axis('off') plt.show