Guidance - scHNDB

Overview

Welcome to the user guide for scHNDB (Single-cell and Spatial Transcriptomic RNA-seq Database of Head and Neck Squamous Cell Carcinoma). It explains how to navigate the database, search for data, and download datasets for your research.

scHNDB provides comprehensive single-cell RNA-seq and spatial transcriptomics data from head and neck cancer research. The database is organized into multiple sections to facilitate easy access to data based on your research needs.

The website is a static portal: pages are pre-built; interactive charts and search tools run in your browser and load supporting JSON or images from the site. You need a modern browser and a stable network connection. Large matrix downloads are hosted externally on Zenodo (open access); follow the Zenodo links on each Resource page.

Database Structure

scHNDB is organized into the following main areas:

1. Search

The Search section provides interactive exploration and visualization. It includes these sub-pages:

Dataset Information: Browse sample-level metadata (demographics, staging, tissue, virus status, etc.), filter the table, and view summary charts (static or interactive).
Celltype Annotation: Explore cell type annotations and UMAP-style views across integrated data.
Malignant Trajectory: Browse pseudotime-related figures and resources for malignant cell trajectories (including cell-lineage subpages where available).
Treatment Target: Explore treatment-related targets and supporting views derived from single-cell analysis.
Spatial Interaction: Access spatial deconvolution–related views and cell–cell interaction (CCI) resources where provided.

2. Resource

The Resource section is the main entry for downloading processed 10X-style packages and related files. From the Resource landing page you can also open the combined 10X Zenodo package (when applicable). Subsections:

Dataset: Per-sample or per-study entries with descriptions and download links to raw/processed packages.
Tissue Type: Data aggregated by tissue source (e.g., Tumor, Normal, OPMD, PBMC, Lymph node, Cell line).
Cell Type: Data aggregated by major cell populations: Epithelial, Fibroblast, Endothelial, Myeloid, NK, B/Plasma, and T cells—each has its own detail page.
Treatment: Treatment-stratified or treatment-associated processed resources, where available.

Typical downloads include a 10X raw package (ZIP) plus separate barcodes, features, matrix, and metadata files where listed. Exact filenames are shown on each page.

3. Guidance

This page—usage orientation, download workflow, and analysis tips.

4. About Us

Project information, team, and contact details.

How to Search Data

Using Dataset Information

1. Navigate to Search → Dataset Information

2. Use the filter controls at the top of the page to narrow down datasets:

Select tissue type (Tumor, Normal, OPMD, etc.)
Filter by patient age range
Choose gender
Select anatomical position
Filter by virus status (HPV, EBV, etc.)
Choose clinical stage or TNM staging

3. View the filtered results in the data table below

4. Switch between visualization tabs to see data distributions

5. Toggle between static charts and interactive charts using the mode selector

💡 Tip: Interactive charts allow you to zoom, pan, and download visualizations as PNG images for your publications.

Exploring Cell Type Annotations

1. Navigate to Search → Celltype Annotation

2. View UMAP visualizations showing how the different cell types separate in the two-dimensional embedding (UMAP coordinates reflect transcriptional similarity, not physical tissue position)

3. Examine the cell type composition across different samples

4. Use this information to identify datasets with specific cell populations of interest

Malignant Trajectory, Treatment Target & Spatial Interaction

Open Search → Malignant Trajectory, Search → Treatment Target, or Search → Spatial Interaction. Use the on-page controls (tabs, filters, cards, or gene/search fields where available) to navigate subviews. These tools rely on pre-generated images and JSON manifests served from the site, so allow time for assets to load on the first visit.

How to Download Data

Understanding Data Files

Each dataset or category in the Resource section provides four types of files:

Barcode file (barcodes.tsv.gz): Contains unique cell barcode identifiers
Feature file (features.tsv.gz): Contains gene names and annotations
Matrix file (matrix.mtx.gz): Contains the gene expression matrix in sparse MTX format
Metadata file (metadata.csv.gz): Contains cell-level annotations including cell types, sample information, and quality metrics
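
Before loading anything, it can help to confirm that all four files ended up in one directory. The following is a minimal sketch, assuming the files keep the names listed above (exact filenames are shown on each Resource page and may differ). The helper name missing_files and the example path are illustrative and not part of scHNDB.

```python
from pathlib import Path

# File names as listed in this guide; actual names may differ per page.
EXPECTED_FILES = (
    "barcodes.tsv.gz",
    "features.tsv.gz",
    "matrix.mtx.gz",
    "metadata.csv.gz",
)


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path; replace with your own download directory.
    absent = missing_files("path/to/downloaded/files")
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All four files found.")
```

An empty list means every expected file was found; otherwise the list names what still needs to be downloaded.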

Download by Dataset

1. Navigate to Resource → Dataset

2. Browse the available datasets and read their descriptions

3. Click on a dataset card to view detailed information

4. Download the required files (barcode, feature, matrix, and metadata)

5. Unzip any ZIP package into a local directory. The individual .gz files can stay compressed, since Seurat and Scanpy read them directly

💡 Recommended: Download all four files for each dataset to ensure you have complete information for analysis.

Download by Tissue Type

If you're interested in specific tissue types:

1. Navigate to Resource → Tissue Type

2. Select your tissue type of interest (e.g., Tumor, Normal, PBMC)

3. Each tissue type page shows the number of cells and samples available

4. Download the integrated data files for that tissue type

Download by Cell Type

If you're focusing on specific cell populations:

1. Navigate to Resource → Cell Type

2. Choose your cell type of interest (e.g., T Cell, Myeloid Cell)

3. Review the subtypes included in each category

4. Download the processed data for your cell type of interest. The B/Plasma cell type has a dedicated page alongside T, NK, Myeloid, Epithelial, Endothelial, and Fibroblast.

⚠️ Note: Ensure you have sufficient storage space before downloading. Matrix files can be large (several hundred MB to GB).
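
One way to check storage before starting a large download is to query the free space on the target drive. This is a small sketch using only the Python standard library. The 5 GB figure is an assumed example, not a size published by scHNDB, and has_free_space is an illustrative helper name.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Assumed figure for illustration: 5 GB of headroom.
    needed = 5 * 1024**3
    print("Enough space:", has_free_space(".", needed))
```

Remember that decompressed matrices take more room than the downloaded archives, so leave headroom beyond the listed file size.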

Combined 10X package

On the main Resource landing page, use the highlighted link to the combined 10X Zenodo record when you need a single entry point for a full combined matrix and metadata package (see on-page title and description).

Marker Gene Tables

Marker tables (typically .xls) and, for cell types, published reference gene sets (.zip on Zenodo), are not bundled inside the 10X ZIP on the Resource pages. They are published as separate open datasets:

Tissue-stratified markers: Zenodo record scHNDB-Tissue Type-Marker Genes (DOI 10.5281/zenodo.19535702). Download the files that match your tissue (e.g., Tumor, Normal, PBMC) from the record file list.
Cell-type markers & reference gene sets: Zenodo record scHNDB-Cell Type-Marker Genes (DOI 10.5281/zenodo.19535753). Filenames follow the pattern for each lineage (e.g., T_cell_*, B_Plasma_*, Fibroblast_Myocyte_* where applicable).

💡 Tip: Always use the Zenodo landing page linked from the Resource subpage so you pick the correct file name and version for your tissue or cell type.
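
If you copy the file list from a Zenodo record into a script, shell-style patterns such as T_cell_* can pick out the files for one lineage. The sketch below assumes that naming pattern holds. The file names in the example are hypothetical placeholders, and the real names should be taken from the record itself.

```python
from fnmatch import fnmatchcase


def select_files(file_names, pattern):
    """Return the file names that match a shell-style pattern such as 'T_cell_*'."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return [name for name in file_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical file names; the real ones are listed on the Zenodo record.
    names = ["T_cell_markers.xls", "B_Plasma_markers.xls", "T_cell_gene_sets.zip"]
    print(select_files(names, "T_cell_*"))
```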

Data Analysis

Loading Data in Seurat (R)

Once you've downloaded the data files, you can load them into Seurat for analysis:

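# Load required library
library(Seurat)

# Read 10X format data (directory with barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz)
data <- Read10X(data.dir = "path/to/downloaded/files")

# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = data, project = "scHNDB")

# Load metadata; this assumes the first column holds the cell barcodes
metadata <- read.csv("path/to/metadata.csv.gz", row.names = 1)

# AddMetaData matches metadata rows to cells by row name
seurat_obj <- AddMetaData(seurat_obj, metadata)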

Loading Data in Scanpy (Python)

For Python users, use Scanpy to load the data:

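# Load required libraries
import scanpy as sc
import pandas as pd

# Read 10X format data (directory with barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz)
adata = sc.read_10x_mtx('path/to/downloaded/files')

# Load metadata; this assumes the first column holds the cell barcodes
metadata = pd.read_csv('path/to/metadata.csv.gz', index_col=0)

# Align metadata rows to the cells in adata by barcode instead of overwriting adata.obs
adata.obs = adata.obs.join(metadata)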

Understanding Metadata

The metadata file contains important information about each cell:

Cell barcode: Unique identifier for each cell
Sample ID: Which sample/patient the cell comes from
Cell type: Annotated cell type (e.g., T cell, Epithelial cell)
Tissue type: Source tissue (e.g., Tumor, Normal)
Quality metrics: nCount_RNA, nFeature_RNA, percent.mt, etc.
Additional annotations: May include subtype information, clustering results, etc.
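
To get a quick feel for a metadata file before loading the full matrix, you can tally cells per category. This is a minimal sketch using only the Python standard library. The column name passed in (for example cell_type) is an assumption, so check the header of your own file for the actual names.

```python
import csv
import gzip
from collections import Counter


def count_by_column(metadata_path, column):
    """Count cells per distinct value of `column` in a gzipped CSV metadata file."""
    with gzip.open(metadata_path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(
                f"Column {column!r} not found; available: {reader.fieldnames}"
            )
        # The Counter consumes the reader while the file is still open.
        return Counter(row[column] for row in reader)
```

Calling count_by_column("path/to/metadata.csv.gz", "cell_type") would return a mapping from each cell type label to its number of cells, provided that column exists.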

Best Practices

Before Downloading

Explore the data first: Use the Search section to understand the available datasets and their characteristics
Check sample sizes: Review the number of cells and samples to ensure they meet your research needs
Read descriptions: Understand what each dataset contains before downloading
Plan storage: Ensure you have adequate disk space for large files

During Analysis

Quality control: Always perform quality control on downloaded data, even though it’s pre-processed
Batch effects: Be aware of potential batch effects when combining multiple datasets
Metadata utilization: Make full use of the provided metadata for your analysis
Normalization: Consider re-normalizing data if combining with your own datasets
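
As an illustration of the quality-control step, the sketch below applies simple thresholds to the nFeature_RNA and percent.mt metadata columns. The cutoffs (at least 200 detected genes, at most 20% mitochondrial reads) are common illustrative defaults, not values recommended by scHNDB, and the function names are hypothetical.

```python
def passes_qc(row, min_features=200, max_percent_mt=20.0):
    """Return True if a metadata row meets the (illustrative) QC thresholds."""
    # Values read from CSV arrive as strings, so convert before comparing.
    n_features = float(row["nFeature_RNA"])
    percent_mt = float(row["percent.mt"])
    return n_features >= min_features and percent_mt <= max_percent_mt


def qc_pass_fraction(rows, **thresholds):
    """Fraction of rows passing QC; rows is any iterable of dict-like records."""
    rows = list(rows)
    if not rows:
        return 0.0
    passed = sum(passes_qc(row, **thresholds) for row in rows)
    return passed / len(rows)
```

A pass fraction well below 1.0 on pre-processed data suggests your thresholds are stricter than those used upstream, which is worth knowing before combining datasets.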

Citation

When using scHNDB data in your research, please cite the scHNDB / Peng Lab publication when available, and also cite the Zenodo DOI for any downloaded package (combined 10X, tissue-level, cell-type, or marker-gene deposit) so file version and access date are clear.

[Insert full paper citation and Zenodo DOIs as finalized by the project.]

Troubleshooting

Common Issues

Q: The downloaded files won’t extract

A: Ensure you have appropriate decompression software (e.g., gzip, 7-Zip)
Try the command line: gunzip filename.gz
If decompression still fails, the download may be incomplete; download the file again
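
A truncated or corrupted download is a frequent cause of extraction errors. The sketch below tests a .gz file by decompressing it end to end without writing anything to disk. It uses only the Python standard library, and gzip_is_intact is an illustrative helper name.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream at `path` can be decompressed."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so large matrices are never held in memory at once.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a missing file and gzip.BadGzipFile (bad header or
        # failed checksum); EOFError signals a truncated file.
        return False
    return True
```

If this returns False for a file that exists, downloading it again is usually the quickest fix.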

Q: Data won’t load in Seurat/Scanpy

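A: Verify all three files (barcodes, features, matrix) are in the same directory
Check that file names match the expected format (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz)
Keep these three files gzipped as downloaded: by default, Read10X and sc.read_10x_mtx look for the .gz names, so unzip only the outer ZIP package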
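
If the files are all present but loading still fails, a mismatch between the matrix dimensions and the barcode or feature lists is worth ruling out. The sketch below reads the size line of the Matrix Market file and compares it with the line counts of the other two files. It assumes the usual 10x layout (rows are features, columns are barcodes, no header lines) and the file names used in this guide. The function names are illustrative.

```python
import gzip
from pathlib import Path


def mtx_shape(matrix_path):
    """Return (n_rows, n_cols, n_entries) from a gzipped Matrix Market file."""
    with gzip.open(matrix_path, "rt") as handle:
        for line in handle:
            # Lines starting with '%' are the banner and comments.
            if line.startswith("%") or not line.strip():
                continue
            n_rows, n_cols, n_entries = (int(x) for x in line.split())
            return n_rows, n_cols, n_entries
    raise ValueError(f"No size line found in {matrix_path}")


def count_lines(path):
    """Count lines in a gzipped text file (one barcode or feature per line)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)


def dimensions_agree(data_dir):
    """Check that matrix rows match features and matrix columns match barcodes."""
    data_dir = Path(data_dir)
    n_rows, n_cols, _ = mtx_shape(data_dir / "matrix.mtx.gz")
    return (
        n_rows == count_lines(data_dir / "features.tsv.gz")
        and n_cols == count_lines(data_dir / "barcodes.tsv.gz")
    )
```

A False result usually means the three files come from different packages or one of them is incomplete.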

Q: Metadata doesn’t match the expression matrix

A: Verify you downloaded all files from the same dataset/category
Check that cell barcodes in metadata match those in the barcode file
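
The barcode comparison can be scripted. The sketch below is one way to do it with the Python standard library, assuming the metadata CSV has a header row and stores the cell barcode in its first column (check your file, since this layout is an assumption). The function names are illustrative.

```python
import csv
import gzip


def read_barcodes(barcodes_path):
    """Read one barcode per line from a gzipped barcodes file."""
    with gzip.open(barcodes_path, "rt") as handle:
        return [line.strip() for line in handle if line.strip()]


def read_metadata_barcodes(metadata_path):
    """Read the first column of a gzipped CSV, assumed to hold the barcodes."""
    with gzip.open(metadata_path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def compare_barcodes(barcodes_path, metadata_path):
    """Summarize the overlap between matrix barcodes and metadata barcodes."""
    matrix_ids = set(read_barcodes(barcodes_path))
    metadata_ids = set(read_metadata_barcodes(metadata_path))
    return {
        "shared": len(matrix_ids & metadata_ids),
        "only_in_matrix": sorted(matrix_ids - metadata_ids),
        "only_in_metadata": sorted(metadata_ids - matrix_ids),
    }
```

If "shared" is zero while both files are non-empty, the two files may use different barcode formats (for example, a sample prefix or suffix), which is a different problem from mixing up datasets.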

Q: Download links are not working

A: Most packages are hosted on Zenodo; ensure your network allows access to zenodo.org (and check institutional firewalls).
Open the Zenodo page in a new tab and download manually if the in-page button is blocked by the browser.
For mirror options (e.g., China OSS), follow updates on the Resource page when available.
Contact us if the issue persists: cainzhi@foxmail.com

Q: Interactive Search page is slow or images missing

A: Clear cache, try another browser, and check that scripts and JSON paths are not blocked. Large manifests may take a few seconds on first load.

Getting Help

If you encounter issues not covered in this guide:

Check our FAQ section (if available)
Contact us via email: cainzhi@foxmail.com
Report issues on our GitHub repository

Updates and Maintenance

scHNDB is regularly updated with new datasets and features. Check back periodically for:

New sample datasets
Additional cell type annotations
Enhanced visualization tools
Updated analysis pipelines

📅 Last Updated: April 2026
