📚 User Guidance
Complete guide to using scHNDB database and downloading data
Overview
Welcome to the scHNDB (Single-cell and Spatial Transcriptomic RNA-seq Database of Head and Neck Squamous Cell Carcinoma) user guidance. This guide will help you navigate the database, search for data, and download datasets for your research.
scHNDB provides comprehensive single-cell RNA-seq and spatial transcriptomics data from head and neck cancer research. The database is organized into multiple sections to facilitate easy access to data based on your research needs.
The website is a static portal: pages are pre-built; interactive charts and search tools run in your browser and load supporting JSON or images from the site. You need a modern browser and a stable network connection. Large matrix downloads are hosted externally on Zenodo (open access); follow the Zenodo links on each Resource page.
Database Structure
scHNDB is organized into the following main areas:
1. Search
The Search section provides interactive exploration and visualization. It includes these sub-pages:
- Dataset Information: Browse sample-level metadata (demographics, staging, tissue, virus status, etc.), filter the table, and view summary charts (static or interactive).
- Celltype Annotation: Explore cell type annotations and UMAP-style views across integrated data.
- Malignant Trajectory: Browse pseudotime-related figures and resources for malignant cell trajectories (including cell-lineage subpages where available).
- Treatment Target: Explore treatment-related targets and supporting views derived from single-cell analysis.
- Spatial Interaction: Access spatial deconvolution–related views and cell–cell interaction (CCI) resources where provided.
2. Resource
The Resource section is the main entry for downloading processed 10X-style packages and related files. From the Resource landing page you can also open the combined 10X Zenodo package (when applicable). Subsections:
- Dataset: Per-sample or per-study entries with descriptions and download links to raw/processed packages.
- Tissue Type: Data aggregated by tissue source (e.g., Tumor, Normal, OPMD, PBMC, Lymph node, Cell line).
- Cell Type: Data aggregated by major cell populations: Epithelial, Fibroblast, Endothelial, Myeloid, NK, B/Plasma, and T cells—each has its own detail page.
- Treatment: Treatment-stratified or treatment-associated processed resources, where available.
Typical downloads include a 10X raw package (ZIP) plus separate barcodes, features, matrix, and metadata files where listed. Exact filenames are shown on each page.
3. Guidance
This page—usage orientation, download workflow, and analysis tips.
4. About Us
Project information, team, and contact details.
How to Search Data
Using Dataset Information
1 Navigate to Search → Dataset Information
2 Use the filter controls at the top of the page to narrow down datasets:
- Select tissue type (Tumor, Normal, OPMD, etc.)
- Filter by patient age range
- Choose gender
- Select anatomical position
- Filter by virus status (HPV, EBV, etc.)
- Choose clinical stage or TNM staging
3 View the filtered results in the data table below
4 Switch between visualization tabs to see data distributions
5 Toggle between static charts and interactive charts using the mode selector
Exploring Cell Type Annotations
1 Navigate to Search → Celltype Annotation
2 View UMAP visualizations showing how cells of different types group in the low-dimensional embedding
3 Examine the cell type composition across different samples
4 Use this information to identify datasets with specific cell populations of interest
Malignant Trajectory, Treatment Target & Spatial Interaction
Open Search → Malignant Trajectory, Search → Treatment Target, or Search → Spatial Interaction. Use the on-page controls (tabs, filters, cards, or gene/search fields where available) to navigate subviews. These tools rely on pre-generated images and JSON manifests under the site; allow time for assets to load on first visit.
How to Download Data
Understanding Data Files
Each dataset or category in the Resource section provides four types of files:
- Barcode file (barcodes.tsv.gz): Contains unique cell barcode identifiers
- Feature file (features.tsv.gz): Contains gene names and annotations
- Matrix file (matrix.mtx.gz): Contains the gene expression matrix in sparse MTX format
- Metadata file (metadata.csv.gz): Contains cell-level annotations including cell types, sample information, and quality metrics
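A quick consistency check on the three matrix files can catch truncated downloads before loading. The sketch below is a minimal illustration using only the Python standard library; it assumes the standard 10X layout (matrix rows are features, columns are cells) and that the barcode and feature files have no header row. The function names are hypothetical helpers, not part of scHNDB.

```python
import gzip
from pathlib import Path


def count_lines(path):
    """Count the lines in a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)


def read_mtx_shape(path):
    """Return (rows, cols, nonzero) from a gzipped Matrix Market file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Lines starting with '%' are the banner and comments.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols, nonzero = (int(x) for x in line.split())
            return rows, cols, nonzero
    raise ValueError(f"no size line found in {path}")


def check_10x_dir(data_dir):
    """Compare the matrix dimensions with the barcode and feature counts."""
    data_dir = Path(data_dir)
    n_features = count_lines(data_dir / "features.tsv.gz")
    n_barcodes = count_lines(data_dir / "barcodes.tsv.gz")
    rows, cols, _ = read_mtx_shape(data_dir / "matrix.mtx.gz")
    # Assumed 10X layout: rows are features (genes), columns are cells.
    return {
        "features": n_features,
        "barcodes": n_barcodes,
        "consistent": (rows, cols) == (n_features, n_barcodes),
    }
```

Calling `check_10x_dir("path/to/downloaded/files")` returns the two counts and a flag that is false when the matrix header disagrees with them.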
Download by Dataset
1 Navigate to Resource → Dataset
2 Browse the available datasets and read their descriptions
3 Click on a dataset card to view detailed information
4 Download the required files (barcode, feature, matrix, and metadata)
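5 Extract any ZIP package to a local directory; the individual .gz files can stay compressed, because Seurat, Scanpy, read.csv, and pandas read them directly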
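The extraction step can also be scripted. The following is a minimal sketch with the standard-library `zipfile` module; `extract_package` is a hypothetical helper, and the archive name in the usage note is a placeholder, since exact filenames are shown on each Resource page. Only the outer ZIP is unpacked; any `.gz` members are left compressed.

```python
import zipfile
from pathlib import Path


def extract_package(zip_path, out_dir):
    """Extract a ZIP package and return the extracted file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        names = archive.namelist()
    # Directory entries end with '/'; keep regular files only.
    return sorted(out_dir / name for name in names if not name.endswith("/"))
```

For example, `extract_package("downloaded_package.zip", "data/")` unpacks the archive into `data/` and lists what was written.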
Download by Tissue Type
If you're interested in specific tissue types:
1 Navigate to Resource → Tissue Type
2 Select your tissue type of interest (e.g., Tumor, Normal, PBMC)
3 Each tissue type page shows the number of cells and samples available
4 Download the integrated data files for that tissue type
Download by Cell Type
If you're focusing on specific cell populations:
1 Navigate to Resource → Cell Type
2 Choose your cell type of interest (e.g., T Cell, Myeloid Cell)
3 Review the subtypes included in each category
4 Download the processed data for your cell type of interest. The B/Plasma cell type has a dedicated page alongside T, NK, Myeloid, Epithelial, Endothelial, and Fibroblast.
Combined 10X package
On the main Resource landing page, use the highlighted link to the combined 10X Zenodo record when you need a single entry point for a full combined matrix and metadata package (see on-page title and description).
Marker Gene Tables
Marker tables (typically .xls) and, for cell types, published reference gene sets (.zip on Zenodo), are not bundled inside the 10X ZIP on the Resource pages. They are published as separate open datasets:
- Tissue-stratified markers: Zenodo record scHNDB-Tissue Type-Marker Genes (DOI 10.5281/zenodo.19535702). Download the files that match your tissue (e.g., Tumor, Normal, PBMC) from the record file list.
- Cell-type markers & reference gene sets: Zenodo record scHNDB-Cell Type-Marker Genes (DOI 10.5281/zenodo.19535753). Filenames follow the pattern for each lineage (e.g., T_cell_*, B_Plasma_*, Fibroblast_Myocyte_*, where applicable).
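Once a record's file list has been downloaded, the lineage patterns above can be used to pick out the files for one cell type. This is a small sketch using the standard-library `fnmatch` module; the helper name and the filenames in the usage note are hypothetical, since the actual names are listed on the Zenodo record.

```python
from fnmatch import fnmatchcase


def files_for_lineage(filenames, prefix):
    """Return the filenames that match '<prefix>_*' (case-sensitive)."""
    pattern = f"{prefix}_*"
    return [name for name in filenames if fnmatchcase(name, pattern)]
```

For instance, `files_for_lineage(["T_cell_markers.xls", "B_Plasma_markers.xls"], "T_cell")` keeps only the first (made-up) filename.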
Data Analysis
Loading Data in Seurat (R)
Once you've downloaded the data files, you can load them into Seurat for analysis:
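# Load required library
library(Seurat)
# Read 10X format data (directory with barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz)
data <- Read10X(data.dir = "path/to/downloaded/files")
# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = data, project = "scHNDB")
# Load metadata; this assumes the first column holds the cell barcodes
metadata <- read.csv("path/to/metadata.csv.gz", row.names = 1)
# AddMetaData matches metadata rows to cells by row name
seurat_obj <- AddMetaData(seurat_obj, metadata)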
Loading Data in Scanpy (Python)
For Python users, use Scanpy to load the data:
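# Load required libraries
import scanpy as sc
import pandas as pd
# Read 10X format data
adata = sc.read_10x_mtx('path/to/downloaded/files')
# Load metadata; this assumes the first column holds the cell barcodes
metadata = pd.read_csv('path/to/metadata.csv.gz', index_col=0)
# Join by barcode so rows stay aligned with the cells in adata
adata.obs = adata.obs.join(metadata)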
Understanding Metadata
The metadata file contains important information about each cell:
- Cell barcode: Unique identifier for each cell
- Sample ID: Which sample/patient the cell comes from
- Cell type: Annotated cell type (e.g., T cell, Epithelial cell)
- Tissue type: Source tissue (e.g., Tumor, Normal)
- Quality metrics: nCount_RNA, nFeature_RNA, percent.mt, etc.
- Additional annotations: May include subtype information, clustering results, etc.
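To get a first impression of a metadata file without loading the expression matrix, you can tally cells per annotation value. The sketch below uses only the standard library; `count_by_column` is a hypothetical helper, and the column name `cell_type` in the usage note is an assumption, so check the header of your downloaded file for the actual name.

```python
import csv
import gzip
from collections import Counter


def count_by_column(metadata_path, column):
    """Count cells per value of one column in a gzipped CSV file."""
    with gzip.open(metadata_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not in {reader.fieldnames}")
        return Counter(row[column] for row in reader)
```

`count_by_column("path/to/metadata.csv.gz", "cell_type").most_common()` then lists the annotated populations from largest to smallest.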
Best Practices
Before Downloading
- Explore the data first: Use the Search section to understand the available datasets and their characteristics
- Check sample sizes: Review the number of cells and samples to ensure they meet your research needs
- Read descriptions: Understand what each dataset contains before downloading
- Plan storage: Ensure you have adequate disk space for large files
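The storage check can be done programmatically before a large download. This is a minimal sketch with the standard-library `shutil.disk_usage`; `has_room` and its `safety_factor` default are assumptions meant to leave space for extracted copies, not a recommendation from scHNDB.

```python
import shutil


def has_room(path, needed_bytes, safety_factor=2.0):
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    # Reserve extra room for the extracted copy of a ZIP package.
    return free >= needed_bytes * safety_factor
```

For example, `has_room(".", 5 * 1024**3)` asks whether the current directory's filesystem can hold a 5 GiB download plus the same amount again.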
During Analysis
- Quality control: Always perform quality control on downloaded data, even though it’s pre-processed
- Batch effects: Be aware of potential batch effects when combining multiple datasets
- Metadata utilization: Make full use of the provided metadata for your analysis
- Normalization: Consider re-normalizing data if combining with your own datasets
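As an illustration of the quality-control point, the sketch below counts how many cells would survive simple thresholds on the quality metrics listed in the metadata (nFeature_RNA and percent.mt). The threshold values are placeholders chosen for the example, not values recommended by scHNDB, and percent.mt is assumed to be on a 0 to 100 scale.

```python
def passes_qc(cell, min_features=200, max_percent_mt=20.0):
    """Check one metadata row (a dict) against illustrative QC thresholds."""
    return (
        float(cell["nFeature_RNA"]) >= min_features
        and float(cell["percent.mt"]) <= max_percent_mt
    )


def qc_summary(cells, **thresholds):
    """Return how many cells in a list of metadata rows pass the thresholds."""
    kept = [cell for cell in cells if passes_qc(cell, **thresholds)]
    return {"total": len(cells), "kept": len(kept)}
```

The rows can come from `csv.DictReader` over metadata.csv.gz; string values are converted to numbers inside `passes_qc`.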
Citation
When using scHNDB data in your research, please cite the scHNDB / Peng Lab publication when available, and also cite the Zenodo DOI for any downloaded package (combined 10X, tissue-level, cell-type, or marker-gene deposit) so file version and access date are clear.
Troubleshooting
Common Issues
Q: The downloaded files won’t extract
- A: Ensure you have the appropriate decompression software (e.g., gzip, 7-Zip)
- Try using the command line: gunzip filename.gz
Q: Data won’t load in Seurat/Scanpy
- A: Verify all three files (barcodes, features, matrix) are in the same directory
- Check that file names match the expected format
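- If you downloaded a ZIP package, extract it, but keep barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz gzip-compressed: by default, Read10X and sc.read_10x_mtx look for the .gz names in this three-file layout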
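A short check for missing or misnamed files can narrow down loading errors. The sketch below assumes the three filenames listed under Understanding Data Files; `missing_10x_files` is a hypothetical helper.

```python
from pathlib import Path

# Filenames as listed in this guide for the 10X-style matrix files.
EXPECTED = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")


def missing_10x_files(data_dir):
    """Return the expected 10X filenames that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED if not (data_dir / name).is_file()]
```

An empty list means all three files are present under the expected names; anything else names the files to re-download or rename.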
Q: Metadata doesn’t match the expression matrix
- A: Verify you downloaded all files from the same dataset/category
- Check that cell barcodes in metadata match those in the barcode file
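The barcode comparison can be sketched as follows, using only the standard library. It assumes that the first column of metadata.csv.gz holds the cell barcodes and that the file has a header row; both are assumptions to verify against your download. The function names are hypothetical.

```python
import csv
import gzip


def read_barcodes(path):
    """Read barcodes.tsv.gz as a list of barcode strings."""
    with gzip.open(path, "rt") as handle:
        return [line.strip() for line in handle if line.strip()]


def read_metadata_ids(path):
    """Read the first column of metadata.csv.gz (assumed to hold barcodes)."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def compare_barcodes(barcodes_path, metadata_path):
    """Report which barcodes are shared and which appear on one side only."""
    matrix_ids = set(read_barcodes(barcodes_path))
    meta_ids = set(read_metadata_ids(metadata_path))
    return {
        "shared": len(matrix_ids & meta_ids),
        "only_in_matrix": sorted(matrix_ids - meta_ids),
        "only_in_metadata": sorted(meta_ids - matrix_ids),
    }
```

If `shared` is zero, the two files probably come from different datasets, or the barcodes differ by a prefix or suffix that needs to be reconciled first.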
Q: Download links are not working
- A: Most packages are hosted on Zenodo; ensure your network allows access to zenodo.org (and check institutional firewalls).
- Open the Zenodo page in a new tab and download manually if the in-page button is blocked by the browser.
- For mirror options (e.g., China OSS), follow updates on the Resource page when available.
- Contact us if the issue persists: cainzhi@foxmail.com
Q: Interactive Search page is slow or images missing
- A: Clear cache, try another browser, and check that scripts and JSON paths are not blocked. Large manifests may take a few seconds on first load.
Getting Help
If you encounter issues not covered in this guide:
- Check our FAQ section (if available)
- Contact us via email: cainzhi@foxmail.com
- Report issues on our GitHub repository
Updates and Maintenance
scHNDB is regularly updated with new datasets and features. Check back periodically for:
- New sample datasets
- Additional cell type annotations
- Enhanced visualization tools
- Updated analysis pipelines
