Abstract: Although the cross-modal text-to-image synthesis task has achieved great success, most of the latest works in this field are based on network architectures proposed by predecessors, such ...