图片在线影院在线影院
弁言Hello小伙伴们群众好,我是生信妙技树的小学徒”我才不吃蛋黄“。今天是胃癌单细胞数据集GSE163558复现系列第六期。第五期咱们通过绘画饼图、堆积柱状图、气泡图等,相比不同分组之间细胞比例各异。本期,咱们将下载胃癌bulk数据(TCGA-STAD),筛选胃癌相干各异抒发基因,笔据各异基因list,使用AddModuleScore_UCell函数给上皮细胞亚群Seurat对象添加恶性评分,以此协助永诀普通上皮和恶性上皮。
1.配景先容尽人皆知,胃癌细胞是胃上皮细胞发生了恶性病变。在第三期对细胞分群注目时,咱们并未对上皮细胞进行细分,这内部就包括非恶性上皮和恶性上皮细胞。那么该若何识别恶性上皮细胞?本文作家笔据TCGA数据库中胃癌和普通胃组织之间的各异抒发基因,界说了每个上皮细胞的恶性和非恶性评分,从而永诀恶性上皮和非恶性上皮细胞。虽然,咱们还不错诈欺copykat和infercnv从单细胞转录组数据估量肿瘤拷贝数变异,从而永诀上皮恶性与否(后期更新,敬请期待)。
2.数据下载在素雅分析之前,咱们先从TCGA数据库赢得胃癌患者的抒发数据和临床数据。 领先加载责任目次,创建新的文献夹,加载R包:
rm(list=ls())getwd()setwd("/home/data/t020505/GSE163558-GC代码公众号复现版")dir.create("6-TCGA_STAD")setwd('6-TCGA_STAD/')source('../scRNA_scripts/mycolors.R')library(tidyverse)library(data.table) #要是需要下载TCGA其他癌种,请更换projproj = "TCGA-STAD"dir.create("input")
在R中诈欺代码一键下载TCGA数据库胃癌患者的抒发数据及临床数据,并将下载好了数据存放在input文献夹:
if(T){ download.file(url = paste0("https://gdc.xenahubs.net/download/",proj, ".htseq_counts.tsv.gz"),destfile = paste0("input/",proj,".htseq_counts.tsv.gz")) ##抒发数据 download.file(url = paste0("https://gdc.xenahubs.net/download/",proj, ".GDC_phenotype.tsv.gz"),destfile = paste0("input/",proj,".GDC_phenotype.tsv.gz")) ##临床数据 download.file(url = paste0("https://gdc.xenahubs.net/download/",proj, ".survival.tsv"),destfile = paste0("input/",proj,".survival.tsv")) ##生计数据}clinical = read.delim(paste0("input/",proj,".GDC_phenotype.tsv.gz"),fill = T,header = T,sep = "\t")surv = read.delim(paste0("input/",proj,".survival.tsv"),header = T) head(surv) #生计数据os和os.time
生计数据包括os和os.time:
图片
3.数据整理鄙人载完数据之后,咱们需要对数据进行初步的整理。
3.1 措置抒发矩阵领先措置抒发矩阵:
dat = read.table(paste0("input/",proj,".htseq_counts.tsv.gz"),check.names = F,row.names = 1,header = T)dat <- data.table::fread(paste0("input/",proj,".htseq_counts.tsv.gz"), data.table = F) #一起是symbola = dat[60483:60488,]dat = dat[1:60483,]rownames(dat) = dat$Ensembl_IDa = data = a[,-1]exp = a
去除在所有样本里抒发量皆为零的基因:
exp = exp[rowSums(exp)>0,]nrow(exp)
仅保留在一半以上样本里抒发的基因:
exp = exp[apply(exp, 1, function(x) sum(x > 0) > 0.5*ncol(exp)), ]nrow(exp)
抒发矩阵行名ID调度:
library(stringr)head(rownames(exp))library(AnnoProbe)rownames(exp) = substr(rownames(exp), 1, 15)##rownames(exp) = str_split(rownames(exp),"\\.",simplify = T)[,1];head(rownames(exp))#annoGene(rownames(exp),ID_type = "ENSEMBL") re = annoGene(rownames(exp),ID_type = "ENSEMBL");head(re)#if(!require(tinyarray))devtools::install_github("xjsun1221/tinyarray",upgrade = FALSE,dependencies = TRUE)library(tinyarray)exp = trans_array(exp,ids = re,from = "ENSEMBL",to = "SYMBOL")exp[1:4,1:4]3.2 整理分组信息
然后整理分组信息:
笔据样本ID的第14-15位,给样天职组(tumor和normal):
01A:癌症组织01B:福尔马林浸泡样本02A:复发组织06A:转机组织11A:癌旁组织table(str_sub(colnames(exp),14,15)) table(str_sub(colnames(exp),14,16)) Group = ifelse(as.numeric(str_sub(colnames(exp),14,15)) < 10,'tumor','normal') Group = factor(Group,levels = c("normal","tumor"))table(Group)group = make_tcga_group(exp)table(group)
保存数据:
save(exp,Group,proj,clinical,surv,file = paste0(proj,".Rdata"))4.数据分析4.1 TCGA各异分析
领先拆除系统环境变量,加载TCGA数据,加载各异分析R包“edgeR”:
rm(list = ls())load("TCGA-STAD.Rdata")table(Group)library(edgeR)
运转进行胃癌和癌旁组织各异基因分析:
dge <- DGEList(counts=exp,group=Group)dge$samples$lib.size <- colSums(dge$counts)dge <- calcNormFactors(dge) design <- model.matrix(~Group)dge <- estimateGLMCommonDisp(dge, design)dge <- estimateGLMTrendedDisp(dge, design)dge <- estimateGLMTagwiseDisp(dge, design)fit <- glmFit(dge, design)fit <- glmLRT(fit) DEG2=topTags(fit, n=Inf)class(DEG2)DEG2=as.data.frame(DEG2)head(DEG2)
图片
添加change列记号基因上调下调,在这里,咱们对各异分析筛选的阈值为:“| log2 fold change |>1, pvalue <0.05”,最终在胃癌中筛选到了2213个上调基因,142个下调基因。
logFC_t = 1pvalue_t = 0.05k1 = (DEG2$PValue < pvalue_t)&(DEG2$logFC < -logFC_t)k2 = (DEG2$PValue < pvalue_t)&(DEG2$logFC > logFC_t)DEG2$change = ifelse(k1,"DOWN",ifelse(k2,"UP","NOT"))head(DEG2)table(DEG2$change)
图片
图片
保存各异分析成果:
save(DEG2,Group,file = paste0(proj,"_DEG.Rdata"))4.2.各异基因可视化
各异基因可视化聘请了热图和火山图:
library(ggplot2)library(tinyarray)dat = log2(cpm(exp)+1)pca.plot = draw_pca(dat,Group);pca.plotsave(pca.plot,file = paste0(proj,"_pcaplot.Rdata"))cg2 = rownames(DEG2)[DEG2$change !="NOT"]h2 = draw_heatmap(dat[cg2,],Group)v2 = draw_volcano(DEG2,pkg = 2,logFC_cutoff = logFC_t)library(patchwork)h2 + v2 +plot_layout(guides = 'collect') &theme(legend.position = "none")ggsave(paste0(proj,"_heat_vo.png"),width = 16,height = 9)
图片
4.3 索求上皮细胞亚群缔造立时种子,读取第三期细胞分群注目后的单细胞文献:
set.seed(12345)sce.all=readRDS( "../3-Celltype/sce_celltype.rds")table(sce.all$celltype)
索求出其中的上皮细胞:
sce1 = sce.all[, sce.all$celltype %in% c( 'Epithelial' )]
然后对上皮细胞亚群走Seurat V5圭臬进程(同第二期):
names(sce1@assays$RNA@layers)sce1[["RNA"]]$counts LayerData(sce1, assay = "RNA", layer = "counts")dim(sce1[["RNA"]]$counts )as.data.frame(sce1@assays$RNA$counts[1:10, 1:2])head(sce1@meta.data, 10)table(sce1$orig.ident) sce = sce1sce <- NormalizeData(sce, normalization.method = "LogNormalize", scale.factor = 1e4)GetAssay(sce,assay = "RNA")sce <- FindVariableFeatures(sce, selection.method = "vst", nfeatures = 2000) sce <- ScaleData(sce) sce <- RunPCA(object = sce, pc.genes = VariableFeatures(sce)) DimHeatmap(sce, dims = 1:12, cells = 100, balanced = TRUE)ElbowPlot(sce)
图片
DimHeatmap图片
ElbowPlot分辨率缔造为0.1:
sce <- FindNeighbors(sce, dims = 1:15)sce <- FindClusters(sce, resolution = 0.1)table(sce@meta.data$RNA_snn_res.0.1)
数据降维:
set.seed(321)sce <- RunTSNE(object = sce, dims = 1:15, do.fast = TRUE)sce <- RunUMAP(object = sce, dims = 1:5, do.fast = TRUE)
分群可视化:
p = DimPlot(sce,reduction = "tsne",label=T,cols = mycolors)
图片
然后保存上皮细胞rds文献:
saveRDS(sce,file = "Epithelial.rds")4.4 AddModuleScore_UCell函数恶性评分
AddModuleScore_UCell函数不错达成基因齐集打分,通过评估单细胞数据齐集特定基因特征的抒发形状,咱们不错量化生物学信号的强度,并震动细胞的亚型和景色。在胃癌组织中取上调排行前50的各异基因用犯法性评分,下调排行前50的各异基因用作普通评分。
#sce = readRDS('Epithelial.rds')head(DEG2)table(DEG2$change)UP = DEG2[DEG2$change == 'UP' & DEG2$FDR <0.01,]DOWN = DEG2[DEG2$change == 'DOWN' & DEG2$FDR <0.01,]##用犯法性评分sorted_up = UP[order(UP$FDR),]top_50_upgene = rownames(sorted_up[1:50, ]) ##用作普通评分sorted_down = DOWN[order(DOWN$FDR),]top_50_downgene = rownames(sorted_down[1:50, ])library(UCell)marker <- list(top_50_upgene,top_50_downgene)#将基因整成listnames(marker)[1] <- 'tumor'names(marker)[2] <- 'normal'score <- AddModuleScore_UCell(sce,features=marker)
预备每个上皮细胞的恶性评分减去非恶性评分,并将评分从小到大排序以拟合滋长弧线。然后,笔据滋长弧线拐点隔邻的最大流弊,将细胞分为得分较高的恶性细胞(≥0.02)或得分较低的非恶性细胞(≤0.02)。
26xesce2 = scoredf = sce2@meta.datadf$score = df$tumor_UCell - df$normal_UCelltable(df$score)dim(sce)df$score1 = 'unknown'df[df$score > -0.02,]$score1 = 'malignant'table(df$score1)df[df$score < -0.02,]$score1 = 'nonmalignant'table(df$score1)sce2@meta.data = dfDimPlot(sce2, reduction = "tsne",cols = mycolors,pt.size = 0.8, group.by = "score1",label = T,label.box = T)
图片
然后绘画气泡图和堆积性柱状图可视化分组细胞比例(同第五期 胃癌单细胞数据集GSE163558复现(五):细胞比例):
tb=table(sce2$tissue, sce2$score1)head(tb)library (gplots) balloonplot(tb)
图片
bar_data <- as.data.frame(tb)bar_per <- bar_data %>% group_by(Var1) %>% mutate(sum(Freq)) %>% mutate(percent = Freq / `sum(Freq)`)head(bar_per) #write.csv(bar_per,file = "celltype_by_group_percent.csv")col =c("#3176B7","#F78000","#3FA116","#CE2820","#9265C1", "#885649","#DD76C5","#BBBE00","#41BED1")colnames(bar_per)library(ggthemes)p1 = ggplot(bar_per, aes(x = percent, y = Var1)) + geom_bar(aes(fill = Var2) , stat = "identity") + coord_flip() + theme(axis.ticks = element_line(linetype = "blank"), legend.position = "top", panel.grid.minor = element_line(colour = NA,linetype = "blank"), panel.background = element_rect(fill = NA), plot.background = element_rect(colour = NA)) + labs(y = " ", fill = NULL)+labs(x = 'Relative proportion(%)')+ scale_fill_manual(values=col)+ theme_few()+ theme(plot.title = element_text(size=12,hjust=0.5))p1
图片
p1tb=table(sce2$sample, sce2$score1)head(tb) malignant nonmalignant adjacent_nontumor 84 92 GC 2805 4 GC_liver_metastasis 39 0 GC_lymph_metastasis 82 0 GC_ovary_metastasis 578 0 GC_peritoneum_metastasis 258 0library (gplots) balloonplot(tb)
图片
bar_data <- as.data.frame(tb)bar_per <- bar_data %>% group_by(Var1) %>% mutate(sum(Freq)) %>% mutate(percent = Freq / `sum(Freq)`)head(bar_per) colnames(bar_per)library(ggthemes)p2 = ggplot(bar_per, aes(x = percent, y = Var1)) + geom_bar(aes(fill = Var2) , stat = "identity") + coord_flip() + theme(axis.ticks = element_line(linetype = "blank"), legend.position = "top", panel.grid.minor = element_line(colour = NA,linetype = "blank"), panel.background = element_rect(fill = NA), plot.background = element_rect(colour = NA)) + labs(y = " ", fill = NULL)+labs(x = 'Relative proportion(%)')+ scale_fill_manual(values=col)+ theme_few()+ theme(plot.title = element_text(size=12,hjust=0.5)) + theme(axis.text.x = element_text(angle = 45, hjust = 1))p2p1+p2ggsave(filename="prop.pdf",width = 12,height = 8)saveRDS(sce2,file = "tumorEpithelial.rds")
图片
p2图片
p1+p24.5 索求恶性上皮亚群在添加恶性评分之后,咱们不错在从上皮细胞中索求恶性上皮亚群,持续走Seurat V5圭臬进程:
table(sce2$score1)tumor = sce2[, sce2$score1 %in% c( 'malignant' )]set.seed(1234567)tumor[["RNA"]]$counts LayerData(tumor, assay = "RNA", layer = "counts")tumor <- JoinLayers(tumor)dim(tumor[["RNA"]]$counts )as.data.frame(tumor@assays$RNA$counts[1:10, 1:2])head(tumor@meta.data, 10)table(tumor$orig.ident) tumor <- NormalizeData(tumor, normalization.method = "LogNormalize", scale.factor = 1e4)GetAssay(tumor,assay = "RNA")tumor <- FindVariableFeatures(tumor, selection.method = "vst", nfeatures = 2000) tumor <- ScaleData(tumor) tumor <- RunPCA(object = tumor, pc.genes = VariableFeatures(tumor)) DimHeatmap(tumor, dims = 1:12, cells = 100, balanced = TRUE)ElbowPlot(tumor)
图片
tumor <- FindNeighbors(tumor, dims = 1:15)tumor <- FindClusters(tumor, resolution = 0.1)table(tumor@meta.data$RNA_snn_res.0.1) set.seed(321)tumor <- RunTSNE(object = tumor, dims = 1:15, do.fast = TRUE)tumor <- RunUMAP(object = tumor, dims = 1:5, do.fast = TRUE)mycolors <-c('#E64A35','#4DBBD4' ,'#01A187' ,'#3C5588' ,'#F29F80' , '#8491B6','#91D0C1','#7F5F48','#AF9E85','#4F4FFF','#CE3D33', '#739B57','#EFE685','#446983','#BB6239','#5DB1DC','#7F2268','#6BD66B','#800202','#D8D8CD','pink')p2 = DimPlot(tumor,reduction = "tsne",label=T,cols = mycolors)+ theme_few()p2
图片
p2保存恶性上皮细胞rds文献
table(tumor$seurat_clusters) 0 1 2 3 4 2275 579 482 314 196 tumor$celltype = paste0('G',tumor$seurat_clusters)table(tumor$celltype) G0 G1 G2 G3 G4 2275 579 482 314 196 saveRDS(tumor,file = "malignant.rds")4.6 富集分析
索求恶性上皮细胞之后,咱们接着从单细胞层面筛选出了特征基因,并在此基础上进行了富集分析。
领先读取恶性上皮细胞rds文献,加载R包:
#sce =readRDS('malignant.rds') sce = tumor###热图示意GO富集分析library(grid)library(ggplot2)library(gridExtra)library(clusterProfiler)library(org.Hs.eg.db)library(enrichplot)library(ggplot2)library(org.Hs.eg.db)library(GOplot)
然后使用Seurat内置函数FindAllMarkers筛选特征基因:
markers <- FindAllMarkers(object = sce, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)write.csv(markers,file='markers.csv')#markers = read.csv('markers.csv',row.names = 1)table(markers$cluster)library(dplyr) gene = markers[markers$p_val <0.01 & markers$avg_log2FC >2,]$geneentrezIDs = bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb= "org.Hs.eg.db", drop = TRUE)gene<- entrezIDs$ENTREZIDmarker_new = markers[markers$gene %in% entrezIDs$SYMBOL,]marker_new = marker_new[!duplicated(marker_new$gene),]identical(marker_new$gene, entrezIDs$SYMBOL)p = identical(marker_new$gene, entrezIDs$SYMBOL);pif(!p) entrezIDs = entrezIDs[match(marker_new$gene,entrezIDs$SYMBOL),]marker_new$ID = entrezIDs$ENTREZID
GO富集分析:
gcSample=split(marker_new$ID, marker_new$cluster) ###参数不错改换,望望?compareClusterxx <- compareCluster(gcSample, fun = "enrichGO", OrgDb = "org.Hs.eg.db", ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05)p <- dotplot(xx)library(ggthemes)p + theme(axis.text.x = element_text( angle = 45, vjust = 0.5, hjust = 0.5))+theme_few()
图片
还不错使用热图展示GO富集:
a = xx@compareClusterResultcolnames(a)a$p = -log10(a$pvalue)b = a[,c(1,3,11)]library(reshape2)colnames(b)library(tidyr)wide_data <- pivot_wider(b, names_from = Cluster, values_from = p)wide_data = as.data.frame(wide_data)rownames(wide_data) = wide_data$Descriptionwide = wide_data[,-1]wide = wide[c(1:13),]rownames(wide) <- sapply(rownames(wide), function(x) paste0(substr(x, 1, 40), "..."))wide = as.matrix(wide)library(pheatmap)B <- pheatmap(wide, show_rownames = T, show_colnames = T, col = colorRampPalette(c("white","firebrick3"))(255), annotation_names_row = T,#不主张行注目信息 annotation_names_col = T ,#不主张列注目信息 column_title = NULL,#不主张列标题 cluster_rows = F, cluster_cols = F, scale = 'column', row_title = NULL)#不主张行标题scale = "row"B
图片
B结语本期,咱们笔据TCGA数据库中胃癌和普通胃组织之间的各异抒发基因,界说了每个上皮细胞的恶性和非恶性评分。下一期,咱们将在本期基础上,分析恶性上皮细胞G0-G4的Marker基因并绘画热图和小提琴图,此外,咱们还将使用AddModuleScore_UCell函数预备细胞的增殖和搬动评分。干货满满,接待群众捏续追更,谢谢!
图片
本站仅提供存储干事,所有实质均由用户发布,如发现存害或侵权实质,请点击举报。