在线影院胃癌单细胞数据集GSE163558复现(六)：调和TCGA-STAD数据添加恶性评分

第四色播室 NEWS

你的位置：调教小说 > 第四色播室 > 在线影院胃癌单细胞数据集GSE163558复现(六)：调和TCGA-STAD数据添加恶性评分

栏目分类

热点资讯

: 动漫英文湿毒：50天仅7天不下雨：湿毒达到顶峰的一个月，得多灸这仨穴位，绝顶是脾虚的东谈主

: 在线影院柔弱仙女白幼薇，竟成玩偶游戏通关王，与沈墨共赴生涯挑战！

: 在线影院天了噜，我思搭这座桥！

: 國產av 肛交两宋四川货币区（上）

: 亚洲色图偷拍自拍办公东说念主员必会汉字输入才智技艺不虞志的字何如打快速输入钱数日历# 汉字输入技

在线影院胃癌单细胞数据集GSE163558复现(六)：调和TCGA-STAD数据添加恶性评分

发布日期：2025-03-18 03:31 点击次数：54

在线影院胃癌单细胞数据集GSE163558复现(六)：调和TCGA-STAD数据添加恶性评分

图片在线影院在线影院

弁言

Hello小伙伴们群众好，我是生信妙技树的小学徒”我才不吃蛋黄“。今天是胃癌单细胞数据集GSE163558复现系列第六期。第五期咱们通过绘画饼图、堆积柱状图、气泡图等，相比不同分组之间细胞比例各异。本期，咱们将下载胃癌bulk数据（TCGA-STAD），筛选胃癌相干各异抒发基因，笔据各异基因list，使用AddModuleScore_UCell函数给上皮细胞亚群Seurat对象添加恶性评分，以此协助永诀普通上皮和恶性上皮。

1.配景先容

尽人皆知，胃癌细胞是胃上皮细胞发生了恶性病变。在第三期对细胞分群注目时，咱们并未对上皮细胞进行细分，这内部就包括非恶性上皮和恶性上皮细胞。那么该若何识别恶性上皮细胞？本文作家笔据TCGA数据库中胃癌和普通胃组织之间的各异抒发基因，界说了每个上皮细胞的恶性和非恶性评分，从而永诀恶性上皮和非恶性上皮细胞。虽然，咱们还不错诈欺copykat和infercnv从单细胞转录组数据估量肿瘤拷贝数变异，从而永诀上皮恶性与否（后期更新，敬请期待）。

2.数据下载

在素雅分析之前，咱们先从TCGA数据库赢得胃癌患者的抒发数据和临床数据。领先加载责任目次，创建新的文献夹，加载R包：

rm(list=ls())getwd()setwd("/home/data/t020505/GSE163558-GC代码公众号复现版")dir.create("6-TCGA_STAD")setwd('6-TCGA_STAD/')source('../scRNA_scripts/mycolors.R')library(tidyverse)library(data.table) #要是需要下载TCGA其他癌种，请更换projproj = "TCGA-STAD"dir.create("input")

在R中诈欺代码一键下载TCGA数据库胃癌患者的抒发数据及临床数据，并将下载好了数据存放在input文献夹：

if(T){  download.file(url = paste0("https://gdc.xenahubs.net/download/"，proj， ".htseq_counts.tsv.gz")，destfile = paste0("input/"，proj，".htseq_counts.tsv.gz"))  ##抒发数据  download.file(url = paste0("https://gdc.xenahubs.net/download/"，proj， ".GDC_phenotype.tsv.gz")，destfile = paste0("input/"，proj，".GDC_phenotype.tsv.gz")) ##临床数据  download.file(url = paste0("https://gdc.xenahubs.net/download/"，proj， ".survival.tsv")，destfile = paste0("input/"，proj，".survival.tsv")) ##生计数据}clinical = read.delim(paste0("input/"，proj，".GDC_phenotype.tsv.gz")，fill = T，header = T，sep = "\t")surv = read.delim(paste0("input/"，proj，".survival.tsv")，header = T) head(surv) #生计数据os和os.time

生计数据包括os和os.time：

图片

3.数据整理

鄙人载完数据之后，咱们需要对数据进行初步的整理。

3.1 措置抒发矩阵

领先措置抒发矩阵：

dat = read.table(paste0("input/"，proj，".htseq_counts.tsv.gz")，check.names = F，row.names = 1，header = T)dat <- data.table::fread(paste0("input/"，proj，".htseq_counts.tsv.gz")，                         data.table = F) #一起是symbola = dat[60483:60488，]dat = dat[1:60483，]rownames(dat) = dat$Ensembl_IDa = data = a[，-1]exp = a

去除在所有样本里抒发量皆为零的基因：

exp = exp[rowSums(exp)>0，]nrow(exp)

仅保留在一半以上样本里抒发的基因：

exp = exp[apply(exp， 1， function(x) sum(x > 0) > 0.5*ncol(exp))， ]nrow(exp)

抒发矩阵行名ID调度：

library(stringr)head(rownames(exp))library(AnnoProbe)rownames(exp) = substr(rownames(exp)， 1， 15)##rownames(exp) = str_split(rownames(exp)，"\\."，simplify = T)[，1];head(rownames(exp))#annoGene(rownames(exp)，ID_type = "ENSEMBL") re = annoGene(rownames(exp)，ID_type = "ENSEMBL");head(re)#if(!require(tinyarray))devtools::install_github("xjsun1221/tinyarray"，upgrade = FALSE，dependencies = TRUE)library(tinyarray)exp = trans_array(exp，ids = re，from = "ENSEMBL"，to = "SYMBOL")exp[1:4，1:4]

3.2 整理分组信息

然后整理分组信息:

笔据样本ID的第14-15位，给样天职组（tumor和normal）：

01A：癌症组织01B：福尔马林浸泡样本02A：复发组织06A：转机组织11A：癌旁组织

table(str_sub(colnames(exp)，14，15))  table(str_sub(colnames(exp)，14，16)) Group = ifelse(as.numeric(str_sub(colnames(exp)，14，15)) < 10，'tumor'，'normal')  Group = factor(Group，levels = c("normal"，"tumor"))table(Group)group = make_tcga_group(exp)table(group)

保存数据：

save(exp，Group，proj，clinical，surv，file = paste0(proj，".Rdata"))

4.数据分析4.1 TCGA各异分析

领先拆除系统环境变量，加载TCGA数据，加载各异分析R包“edgeR”：

rm(list = ls())load("TCGA-STAD.Rdata")table(Group)library(edgeR)

运转进行胃癌和癌旁组织各异基因分析：

dge <- DGEList(counts=exp，group=Group)dge$samples$lib.size <- colSums(dge$counts)dge <- calcNormFactors(dge) design <- model.matrix(~Group)dge <- estimateGLMCommonDisp(dge， design)dge <- estimateGLMTrendedDisp(dge， design)dge <- estimateGLMTagwiseDisp(dge， design)fit <- glmFit(dge， design)fit <- glmLRT(fit) DEG2=topTags(fit， n=Inf)class(DEG2)DEG2=as.data.frame(DEG2)head(DEG2)

图片

添加change列记号基因上调下调，在这里，咱们对各异分析筛选的阈值为：“| log2 fold change |>1， pvalue <0.05”，最终在胃癌中筛选到了2213个上调基因，142个下调基因。

logFC_t = 1pvalue_t = 0.05k1 = (DEG2$PValue < pvalue_t)&(DEG2$logFC < -logFC_t)k2 = (DEG2$PValue < pvalue_t)&(DEG2$logFC > logFC_t)DEG2$change = ifelse(k1，"DOWN"，ifelse(k2，"UP"，"NOT"))head(DEG2)table(DEG2$change)

图片

保存各异分析成果：

save(DEG2，Group，file = paste0(proj，"_DEG.Rdata"))

4.2.各异基因可视化

各异基因可视化聘请了热图和火山图：

library(ggplot2)library(tinyarray)dat = log2(cpm(exp)+1)pca.plot = draw_pca(dat，Group);pca.plotsave(pca.plot，file = paste0(proj，"_pcaplot.Rdata"))cg2 = rownames(DEG2)[DEG2$change !="NOT"]h2 = draw_heatmap(dat[cg2，]，Group)v2 = draw_volcano(DEG2，pkg = 2，logFC_cutoff = logFC_t)library(patchwork)h2 + v2 +plot_layout(guides = 'collect') &theme(legend.position = "none")ggsave(paste0(proj，"_heat_vo.png")，width = 16，height = 9)

图片

4.3 索求上皮细胞亚群

缔造立时种子，读取第三期细胞分群注目后的单细胞文献：

set.seed(12345)sce.all=readRDS( "../3-Celltype/sce_celltype.rds")table(sce.all$celltype)

索求出其中的上皮细胞：

sce1 = sce.all[， sce.all$celltype %in% c( 'Epithelial' )]

然后对上皮细胞亚群走Seurat V5圭臬进程（同第二期）：

names(sce1@assays$RNA@layers)sce1[["RNA"]]$counts LayerData(sce1， assay = "RNA"， layer = "counts")dim(sce1[["RNA"]]$counts )as.data.frame(sce1@assays$RNA$counts[1:10， 1:2])head(sce1@meta.data， 10)table(sce1$orig.ident) sce = sce1sce <- NormalizeData(sce， normalization.method =  "LogNormalize"，               scale.factor = 1e4)GetAssay(sce，assay = "RNA")sce <- FindVariableFeatures(sce， selection.method = "vst"，                            nfeatures = 2000)  sce <- ScaleData(sce) sce <- RunPCA(object = sce， pc.genes = VariableFeatures(sce)) DimHeatmap(sce， dims = 1:12， cells = 100， balanced = TRUE)ElbowPlot(sce)

图片

DimHeatmap

图片

ElbowPlot

分辨率缔造为0.1：

sce <- FindNeighbors(sce， dims = 1:15)sce <- FindClusters(sce， resolution = 0.1)table(sce@meta.data$RNA_snn_res.0.1)

数据降维：

set.seed(321)sce <- RunTSNE(object = sce， dims = 1:15， do.fast = TRUE)sce <- RunUMAP(object = sce， dims = 1:5， do.fast = TRUE)

分群可视化：

p = DimPlot(sce，reduction = "tsne"，label=T，cols = mycolors)

图片

然后保存上皮细胞rds文献：

saveRDS(sce，file = "Epithelial.rds")

4.4 AddModuleScore_UCell函数恶性评分

AddModuleScore_UCell函数不错达成基因齐集打分，通过评估单细胞数据齐集特定基因特征的抒发形状，咱们不错量化生物学信号的强度，并震动细胞的亚型和景色。在胃癌组织中取上调排行前50的各异基因用犯法性评分，下调排行前50的各异基因用作普通评分。

#sce = readRDS('Epithelial.rds')head(DEG2)table(DEG2$change)UP = DEG2[DEG2$change == 'UP' & DEG2$FDR <0.01，]DOWN = DEG2[DEG2$change == 'DOWN' & DEG2$FDR <0.01，]##用犯法性评分sorted_up = UP[order(UP$FDR)，]top_50_upgene = rownames(sorted_up[1:50， ]) ##用作普通评分sorted_down = DOWN[order(DOWN$FDR)，]top_50_downgene = rownames(sorted_down[1:50， ])library(UCell)marker <- list(top_50_upgene，top_50_downgene)#将基因整成listnames(marker)[1] <- 'tumor'names(marker)[2] <- 'normal'score <- AddModuleScore_UCell(sce，features=marker)

预备每个上皮细胞的恶性评分减去非恶性评分，并将评分从小到大排序以拟合滋长弧线。然后，笔据滋长弧线拐点隔邻的最大流弊，将细胞分为得分较高的恶性细胞（≥0.02）或得分较低的非恶性细胞（≤0.02）。

26xe

sce2 = scoredf = sce2@meta.datadf$score = df$tumor_UCell - df$normal_UCelltable(df$score)dim(sce)df$score1 = 'unknown'df[df$score > -0.02，]$score1 = 'malignant'table(df$score1)df[df$score < -0.02，]$score1 = 'nonmalignant'table(df$score1)sce2@meta.data = dfDimPlot(sce2， reduction = "tsne"，cols = mycolors，pt.size = 0.8，              group.by = "score1"，label = T，label.box = T)

图片

然后绘画气泡图和堆积性柱状图可视化分组细胞比例（同第五期胃癌单细胞数据集GSE163558复现(五)：细胞比例）：

tb=table(sce2$tissue， sce2$score1)head(tb)library (gplots) balloonplot(tb)

图片

bar_data <- as.data.frame(tb)bar_per <- bar_data %>%   group_by(Var1) %>%  mutate(sum(Freq)) %>%  mutate(percent = Freq / `sum(Freq)`)head(bar_per) #write.csv(bar_per，file = "celltype_by_group_percent.csv")col =c("#3176B7"，"#F78000"，"#3FA116"，"#CE2820"，"#9265C1"，       "#885649"，"#DD76C5"，"#BBBE00"，"#41BED1")colnames(bar_per)library(ggthemes)p1 = ggplot(bar_per， aes(x = percent， y = Var1)) +  geom_bar(aes(fill = Var2) ， stat = "identity") + coord_flip() +  theme(axis.ticks = element_line(linetype = "blank")，        legend.position = "top"，        panel.grid.minor = element_line(colour = NA，linetype = "blank")，         panel.background = element_rect(fill = NA)，        plot.background = element_rect(colour = NA)) +  labs(y = " "， fill = NULL)+labs(x = 'Relative proportion(%)')+  scale_fill_manual(values=col)+  theme_few()+  theme(plot.title = element_text(size=12，hjust=0.5))p1

图片

tb=table(sce2$sample， sce2$score1)head(tb)                                                     malignant nonmalignant  adjacent_nontumor               84           92  GC                            2805            4  GC_liver_metastasis             39            0  GC_lymph_metastasis             82            0  GC_ovary_metastasis            578            0  GC_peritoneum_metastasis       258            0library (gplots) balloonplot(tb)

图片

bar_data <- as.data.frame(tb)bar_per <- bar_data %>%   group_by(Var1) %>%  mutate(sum(Freq)) %>%  mutate(percent = Freq / `sum(Freq)`)head(bar_per) colnames(bar_per)library(ggthemes)p2 = ggplot(bar_per， aes(x = percent， y = Var1)) +  geom_bar(aes(fill = Var2) ， stat = "identity") + coord_flip() +  theme(axis.ticks = element_line(linetype = "blank")，        legend.position = "top"，        panel.grid.minor = element_line(colour = NA，linetype = "blank")，         panel.background = element_rect(fill = NA)，        plot.background = element_rect(colour = NA)) +  labs(y = " "， fill = NULL)+labs(x = 'Relative proportion(%)')+  scale_fill_manual(values=col)+  theme_few()+  theme(plot.title = element_text(size=12，hjust=0.5)) +   theme(axis.text.x = element_text(angle = 45， hjust = 1))p2p1+p2ggsave(filename="prop.pdf"，width = 12，height = 8)saveRDS(sce2，file = "tumorEpithelial.rds")

图片

p1+p24.5 索求恶性上皮亚群

在添加恶性评分之后，咱们不错在从上皮细胞中索求恶性上皮亚群，持续走Seurat V5圭臬进程：

table(sce2$score1)tumor = sce2[， sce2$score1 %in% c( 'malignant' )]set.seed(1234567)tumor[["RNA"]]$counts LayerData(tumor， assay = "RNA"， layer = "counts")tumor <- JoinLayers(tumor)dim(tumor[["RNA"]]$counts )as.data.frame(tumor@assays$RNA$counts[1:10， 1:2])head(tumor@meta.data， 10)table(tumor$orig.ident) tumor <- NormalizeData(tumor， normalization.method =  "LogNormalize"，                         scale.factor = 1e4)GetAssay(tumor，assay = "RNA")tumor <- FindVariableFeatures(tumor，                               selection.method = "vst"， nfeatures = 2000)  tumor <- ScaleData(tumor) tumor <- RunPCA(object = tumor， pc.genes = VariableFeatures(tumor)) DimHeatmap(tumor， dims = 1:12， cells = 100， balanced = TRUE)ElbowPlot(tumor)

图片

tumor <- FindNeighbors(tumor， dims = 1:15)tumor <- FindClusters(tumor， resolution = 0.1)table(tumor@meta.data$RNA_snn_res.0.1)  set.seed(321)tumor <- RunTSNE(object = tumor， dims = 1:15， do.fast = TRUE)tumor <- RunUMAP(object = tumor， dims = 1:5， do.fast = TRUE)mycolors <-c('#E64A35'，'#4DBBD4' ，'#01A187'  ，'#3C5588'  ，'#F29F80'  ，             '#8491B6'，'#91D0C1'，'#7F5F48'，'#AF9E85'，'#4F4FFF'，'#CE3D33'，             '#739B57'，'#EFE685'，'#446983'，'#BB6239'，'#5DB1DC'，'#7F2268'，'#6BD66B'，'#800202'，'#D8D8CD'，'pink')p2 = DimPlot(tumor，reduction = "tsne"，label=T，cols = mycolors)+  theme_few()p2

图片

保存恶性上皮细胞rds文献

table(tumor$seurat_clusters)   0    1    2    3    4 2275  579  482  314  196 tumor$celltype = paste0('G'，tumor$seurat_clusters)table(tumor$celltype)  G0   G1   G2   G3   G4 2275  579  482  314  196 saveRDS(tumor，file = "malignant.rds")

4.6 富集分析

索求恶性上皮细胞之后，咱们接着从单细胞层面筛选出了特征基因，并在此基础上进行了富集分析。

领先读取恶性上皮细胞rds文献，加载R包：

#sce =readRDS('malignant.rds') sce = tumor###热图示意GO富集分析library(grid)library(ggplot2)library(gridExtra)library(clusterProfiler)library(org.Hs.eg.db)library(enrichplot)library(ggplot2)library(org.Hs.eg.db)library(GOplot)

然后使用Seurat内置函数FindAllMarkers筛选特征基因：

markers <- FindAllMarkers(object = sce， only.pos = TRUE，                           min.pct = 0.25，                           thresh.use = 0.25)write.csv(markers，file='markers.csv')#markers = read.csv('markers.csv'，row.names = 1)table(markers$cluster)library(dplyr) gene = markers[markers$p_val <0.01 & markers$avg_log2FC >2，]$geneentrezIDs = bitr(gene， fromType = "SYMBOL"， toType = "ENTREZID"， OrgDb= "org.Hs.eg.db"， drop = TRUE)gene<- entrezIDs$ENTREZIDmarker_new = markers[markers$gene %in% entrezIDs$SYMBOL，]marker_new = marker_new[!duplicated(marker_new$gene)，]identical(marker_new$gene， entrezIDs$SYMBOL)p = identical(marker_new$gene， entrezIDs$SYMBOL);pif(!p) entrezIDs = entrezIDs[match(marker_new$gene，entrezIDs$SYMBOL)，]marker_new$ID = entrezIDs$ENTREZID

GO富集分析：

gcSample=split(marker_new$ID， marker_new$cluster) ###参数不错改换，望望?compareClusterxx <- compareCluster(gcSample，                     fun = "enrichGO"，                     OrgDb = "org.Hs.eg.db"，                     ont = "BP"，                     pAdjustMethod = "BH"，                     pvalueCutoff = 0.01，                     qvalueCutoff = 0.05)p <- dotplot(xx)library(ggthemes)p + theme(axis.text.x = element_text(  angle = 45，  vjust = 0.5， hjust = 0.5))+theme_few()

图片

还不错使用热图展示GO富集：

a = xx@compareClusterResultcolnames(a)a$p = -log10(a$pvalue)b = a[，c(1，3，11)]library(reshape2)colnames(b)library(tidyr)wide_data <- pivot_wider(b， names_from = Cluster， values_from = p)wide_data = as.data.frame(wide_data)rownames(wide_data) = wide_data$Descriptionwide = wide_data[，-1]wide = wide[c(1:13)，]rownames(wide) <- sapply(rownames(wide)， function(x) paste0(substr(x， 1， 40)， "..."))wide = as.matrix(wide)library(pheatmap)B <- pheatmap(wide， show_rownames = T，              show_colnames = T，              col = colorRampPalette(c("white"，"firebrick3"))(255)，              annotation_names_row = T，#不主张行注目信息              annotation_names_col = T ，#不主张列注目信息              column_title = NULL，#不主张列标题              cluster_rows = F，              cluster_cols = F，              scale = 'column'，              row_title = NULL)#不主张行标题scale = "row"B

图片

B结语

本期，咱们笔据TCGA数据库中胃癌和普通胃组织之间的各异抒发基因，界说了每个上皮细胞的恶性和非恶性评分。下一期，咱们将在本期基础上，分析恶性上皮细胞G0-G4的Marker基因并绘画热图和小提琴图，此外，咱们还将使用AddModuleScore_UCell函数预备细胞的增殖和搬动评分。干货满满，接待群众捏续追更，谢谢！

图片

本站仅提供存储干事，所有实质均由用户发布，如发现存害或侵权实质，请点击举报。

上一篇：在线影院 240003

下一篇：伪娘拳交怎样寻找企业软文发稿渠谈？全目的的渠谈保举与学习指南

第四色播室