Understanding and characterizing cancer patient outcomes is challenging and involves multiple clinical measurements (e.g., imaging and genomic biomarkers). Multimodal analytics promises to reveal novel predictive patterns that no single data modality can provide. In particular, jointly exploring histopathological images and genomic sequencing data offers a path toward deeper insight into cancer biology. In this dissertation, we first present a graph neural network (GNN) framework that models multi-region spatial connections among tissue tiles to predict molecular profile status in colorectal cancer. We demonstrate the validity of spatial connections among tumor tiles built from the geometric coordinates derived from the raw histopathological images. These findings capture the interaction between histopathological characteristics and a panel of treatment-relevant molecular profiles. Second, we propose a multimodal transformer (PathOmics) that integrates pathology and genomics insights for colorectal cancer survival prediction. The proposed unsupervised pretraining captures the intrinsic interaction between tissue microenvironments in whole-slide images (WSIs) and a wide range of genomics data (e.g., miRNA sequencing, copy number variation, and methylation). After the multimodal knowledge aggregation in pretraining, task-specific fine-tuning broadens data utility, making the model applicable to both multimodal and single-modal inputs. Finally, we introduce a contrastive pathology-and-genomics pretraining scheme that enhances patient survival prediction by capturing the multimodal interactions within each patient while distinguishing among patients. Together, these methods provide a suite of solutions to the challenges of multimodal disease data understanding and improve the overall performance of patient outcome prediction in colorectal cancer.
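
To make the notion of spatial tile connectivity concrete, the minimal sketch below builds a k-nearest-neighbor graph over tile centroids taken from a WSI grid. The function name `build_tile_graph`, the neighborhood size `k`, and the use of Euclidean k-NN are illustrative assumptions, not the dissertation's exact construction.

```python
import numpy as np

def build_tile_graph(coords: np.ndarray, k: int = 8) -> np.ndarray:
    """Connect each tile to its k nearest spatial neighbors.

    coords: (N, 2) array of tile (x, y) centroids from the WSI grid.
    Returns an edge list of shape (num_edges, 2).
    Hypothetical helper for illustration only.
    """
    # Pairwise Euclidean distances between tile centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-loops

    edges = []
    for i in range(coords.shape[0]):
        # The k closest tiles define the local spatial neighborhood.
        for j in np.argsort(dist[i])[:k]:
            edges.append((i, j))
    return np.asarray(edges)

# Example: a 4x4 grid of 256-pixel tiles.
xs, ys = np.meshgrid(np.arange(4), np.arange(4))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1) * 256
edge_index = build_tile_graph(coords, k=4)
print(edge_index.shape)  # (64, 2): 16 tiles x 4 neighbors each
```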
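
Likewise, the contrastive pathology-and-genomics pretraining can be pictured as a symmetric InfoNCE-style objective that treats each patient's pathology and genomics embeddings as a positive pair and the other patients in the batch as negatives. The function `contrastive_pairing_loss`, the temperature value, and the cosine-similarity formulation below are assumptions for illustration, not the dissertation's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_pairing_loss(path_emb: torch.Tensor,
                             gen_emb: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style loss over a batch of patients.

    path_emb, gen_emb: (B, D) pathology and genomics embeddings, one row
    per patient. Matched rows (same patient) are positives; all other rows
    in the batch serve as negatives. Hypothetical sketch only.
    """
    # Cosine similarity between every pathology/genomics pair in the batch.
    path_emb = F.normalize(path_emb, dim=-1)
    gen_emb = F.normalize(gen_emb, dim=-1)
    logits = path_emb @ gen_emb.t() / temperature  # (B, B)

    # The diagonal holds the matched (same-patient) pairs.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_p2g = F.cross_entropy(logits, targets)      # pathology -> genomics
    loss_g2p = F.cross_entropy(logits.t(), targets)  # genomics -> pathology
    return 0.5 * (loss_p2g + loss_g2p)

# Toy usage with random embeddings for a batch of 8 patients.
loss = contrastive_pairing_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```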