Pathview for KEGG
Cordeliers Artificial Intelligence and Bioinformatics
Source:vignettes/pathview.Rmd
An additional analysis is provided by the Pathview package.
Pathview is an R package that maps gene expression data onto KEGG biological pathways. It helps show how the genes of interest are involved in known cellular processes such as the cell cycle or apoptosis.
To create the cell cycle graph, we use the pathview function. The expected input for this function is a named numeric vector of log2FoldChange values from the differential expression results, with Entrez gene IDs as names.
If you want to use this notebook for your projects, it is available here
Pre-processing
This page shows only the code specific to this graph. The first steps are the same for every visualization: before running the cells below, you need to complete all the pre-processing analyses, up to and including the differential expression step (diffexp). To see that code, you can copy it from the other pages.
This is what you get after the diffexp step:
head(diffexp_mac)
## baseMean log2FoldChange lfcSE pvalue padj
## A1BG 185.361814 0.75051222 0.3099124 4.533451e-03 2.242446e-02
## A1BG-AS1 315.155738 -0.32048380 0.1677420 4.293224e-02 1.375238e-01
## A1CF 1.604499 0.04585155 0.4781308 8.266114e-01 NA
## A2M 71665.137761 -1.19356380 0.2310016 3.375187e-08 7.203143e-07
## A2M-AS1 39.045426 0.88104208 0.5735779 1.961794e-02 7.405728e-02
## A2ML1 2.457721 -0.04618380 0.4598400 8.400364e-01 9.379652e-01
Conversion
The next cell combines the log2FoldChange values with the Entrez IDs to build the object expected by the pathview function.
#|message: false
library(org.Hs.eg.db)
library(dplyr)
# Retrieve the gene symbols from the row names of the differential expression data
gene_symbols <- rownames(diffexp_mac)
# Convert gene symbols to Entrez IDs using the org.Hs.eg.db annotation database
conversion <- AnnotationDbi::select(
  org.Hs.eg.db,              # Human gene annotation package
  keys = gene_symbols,       # List of gene symbols to convert
  columns = c("ENTREZID"),   # We want to retrieve Entrez IDs
  keytype = "SYMBOL"         # Input key type is gene symbol
)
# Add gene symbols as a new column in the expression data
diffexp_mac$SYMBOL <- rownames(diffexp_mac)
# Merge the differential expression data with the Entrez ID conversion table
diffexp_mac <- dplyr::left_join(diffexp_mac, conversion, by = "SYMBOL")
# Create a named vector of log2 fold changes using Entrez IDs as names
gene.data <- diffexp_mac$log2FoldChange
names(gene.data) <- diffexp_mac$ENTREZID
# Remove entries with missing Entrez ID names
gene.data <- gene.data[!is.na(names(gene.data))]
# Display the first few elements of the vector
head(gene.data)
## 1 503538 29974 2 144571 144568
## 0.75051222 -0.32048380 0.04585155 -1.19356380 0.88104208 -0.04618380
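The symbol-to-Entrez conversion is not always one-to-one, so the named vector can end up with repeated names. It may be worth checking for this before plotting. The sketch below uses a small made-up table (the toy data frame and its values are illustrative, not taken from the macrophage dataset) and shows one possible policy: for each Entrez ID, keep the entry with the largest absolute log2FoldChange. Other choices, such as averaging or keeping the first match, are equally defensible.

```r
# Toy stand-in for the joined table; values are invented for illustration
toy <- data.frame(
  ENTREZID       = c("1", "2", "2", NA, "9"),
  log2FoldChange = c(0.75, -1.19, 0.40, 0.05, -0.32),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID
toy <- toy[!is.na(toy$ENTREZID), ]

# Order by decreasing absolute fold change, then keep the first row per ID
toy <- toy[order(-abs(toy$log2FoldChange)), ]
toy <- toy[!duplicated(toy$ENTREZID), ]

# Build the named vector in the shape described above
gene_data_unique <- setNames(toy$log2FoldChange, toy$ENTREZID)
gene_data_unique
```

The same three steps can be applied to diffexp_mac after the join, before gene.data is created.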
Graph
Once done, we can call the function with specific parameters that match our dataset.
#|message: false
library(pathview)
pathview(
  gene.data = gene.data,                # Named vector of log2FoldChange values, named by Entrez ID
  pathway.id = "hsa04110",              # KEGG pathway ID (hsa04110 is the cell cycle pathway)
  species = "hsa",                      # Species code for Homo sapiens
  gene.idtype = "entrez",               # The type of gene IDs used in gene.data
  out.suffix = "macrophage_cell_cycle", # Suffix added to the output file name
  low = "lightgreen",                   # Color for low log2 fold change values
  high = "pink",                        # Color for high log2 fold change values
  na.col = "gray",                      # Color for genes not mapped or without data
  kegg.native = TRUE                    # Draw on the native KEGG pathway image (PNG output)
)
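With kegg.native = TRUE, pathview does not draw on the active graphics device. It writes a PNG into the working directory, and to our knowledge the file name follows the pattern pathway ID, then out.suffix, then "png", separated by dots. The helper below is hypothetical (it is not part of pathview) and "my_suffix" is a placeholder; it is only a small sketch for locating the expected file.

```r
# Hypothetical helper (not part of pathview): build the file name that a
# native KEGG rendering is expected to have, "<pathway.id>.<out.suffix>.png"
expected_pathview_png <- function(pathway_id, out_suffix, dir = ".") {
  file.path(dir, paste(pathway_id, out_suffix, "png", sep = "."))
}

# Substitute the pathway.id and out.suffix used in your own pathview() call
png_path <- expected_pathview_png("hsa04110", "my_suffix")
png_path

# TRUE only once pathview() has run in this directory with those settings
file.exists(png_path)
```

If the file exists, it can be included in a report like any other image.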
Using the pathview package, we visualized the impact of differentially expressed genes from the macrophage dataset on the KEGG cell cycle pathway (hsa04110). The pathway diagram colors genes according to their log2FoldChange values: upregulated genes are shown in pink (Cdc25a, CycA, CycB, …), while downregulated genes appear in light green (GADD45, Cip1, CycD, …). Genes with no expression data or missing values are colored gray. This visualization allows us to assess how the IFN-γ treatment influences components of the cell cycle machinery, helping to identify specific regulatory points potentially activated or suppressed in response to the stimulus.
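When reading the colors, keep in mind that pathview caps the color scale at the value given by its limit argument, which defaults to 1 for gene data. Any gene with an absolute log2FoldChange above that limit is drawn in the same extreme color. The sketch below uses invented values (gene_data_toy is not the macrophage data) to show which genes would saturate and one possible way to pick a wider symmetric limit.

```r
# Toy log2 fold changes named by Entrez ID; values are invented
gene_data_toy <- c("1" = 0.75, "2" = -1.19, "595" = 2.40, "1026" = -3.10)

# With a limit of 1, everything beyond +/-1 shares the same extreme color
default_limit <- 1
saturated <- abs(gene_data_toy) > default_limit
names(gene_data_toy)[saturated]

# One possible alternative: a symmetric limit covering the observed range
gene_limit <- ceiling(max(abs(gene_data_toy), na.rm = TRUE))
gene_limit

# This could then be passed to pathview() as:
#   limit = list(gene = gene_limit, cpd = 1)
```

A wider limit separates strong changes from moderate ones, at the cost of paler colors for small fold changes.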