A study of few-shot node classification based on graph data augmentation
Keywords:
classification; Graph Convolutional Network (GCN); data augmentation; meta-learning; Few-Shot Learning (FSL)
Abstract
Graph data is ubiquitous in the real world, yet practical applications often suffer from a shortage of labeled data. Many few-shot learning methods on graphs aim to classify nodes with only a small number of labeled samples. Although good performance has been achieved on few-shot node classification tasks, several problems remain: high-quality labeled data is difficult to obtain; the parameter initialization process of Few-Shot Node Classification (FSNC) methods generalizes poorly; and existing FSNC methods do not fully exploit the topological information in the graph. To address these problems, a new Few-Shot Node Classification model based on Graph Data Augmentation (GDA-FSNC) was proposed. GDA-FSNC consists of four main modules: a graph data preprocessing module based on structural similarity, a parameter initialization module, a parameter fine-tuning module, and an adaptive pseudo-label generation module. In the graph data preprocessing module, an adjacency matrix enhancement method based on structural similarity was used to extract more graph structural information. To enhance the diversity of information during training, a mutual teaching data augmentation method was used in the parameter initialization module, in which each model learned different patterns and features from the other. In the adaptive pseudo-label generation module, an appropriate pseudo-label generation technique was automatically selected according to the characteristics of each dataset, producing high-quality pseudo-label data. Experiments were conducted on seven real-world datasets. The results show that the proposed model outperformed state-of-the-art few-shot learning models such as Meta-GNN, GPN, and IA-FSNC in classification accuracy.
On small datasets, it achieved an average improvement of 3.40 percentage points over the baseline IA-FSNC model; on large datasets, the average improvement was 2.47 percentage points. These results indicate that GDA-FSNC offers better classification performance and generalization ability than state-of-the-art methods in few-shot learning scenarios.
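To make the adjacency matrix enhancement idea concrete, the following is a minimal sketch of structural-similarity-based augmentation. It assumes cosine similarity between adjacency rows as the structural-similarity measure and a simple threshold for adding edges; the abstract does not specify the paper's exact measure or edge-insertion rule, so both are illustrative assumptions.

```python
import numpy as np

def augment_adjacency(A, threshold=0.9):
    """Add edges between node pairs with structurally similar neighborhoods.

    Structural similarity is taken here as the cosine similarity between
    rows of the adjacency matrix (an illustrative choice, not necessarily
    the paper's measure). Pairs whose similarity meets the threshold get
    an edge; all original edges are kept.
    """
    A = np.asarray(A, dtype=float)
    # Row-normalize adjacency rows; guard against isolated nodes.
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    A_hat = A / norms
    S = A_hat @ A_hat.T          # pairwise cosine similarities
    np.fill_diagonal(S, 0.0)     # no self-loops
    # Insert an edge where similarity is high, otherwise keep the original entry.
    return np.where(S >= threshold, 1.0, A)

# Toy 4-node graph: nodes 0 and 3 share the neighbor set {1, 2},
# as do nodes 1 and 2, so augmentation links each such pair.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
A_new = augment_adjacency(A, threshold=0.9)
```

In this toy example, the augmented matrix gains the edges (0, 3) and (1, 2) while preserving all original edges, giving a GCN access to second-order structural relations that the raw adjacency matrix does not encode directly.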