CLI Guide
Use the scmags -h command to see the CLI arguments and their descriptions.
[1]:
%%bash
scmags -h
usage: scmags [-h] [-genann] [-v] [-sep] [-head] [-incol] [-thr] [-im] [-nsel]
              [-nolnorm] [-nmark] [-ncore] [-dyn] [-dot] [-tsne] [-heat]
              [-knn] [-t] [-wrt]
              data labels
|---- Arguments ----|
positional arguments:
  data                  Input data must be in .csv format. Also rows should
                        correspond to cells and columns to genes. If the
                        reveerse is true, use the transpose (-t --transpose)
                        option
  labels                Cluster labels must be in .csv format and match the
                        numberof cells.
optional arguments:
  -h, --help            show this help message and exit
  -genann               Gene Annotation file (default: None)
  -v, --verbose         Transpose input matrix (default: False)
  -sep , --readseperator
                        Delimiter to use. (default: , )
  -head , --header      If header are on line 1, set it to 0. Set to None if
                        headers are not available. (default: 0))
  -incol , --indexcol   Set to 0 if the row indexes are in the 1st column. If
                        the index does not exist, set it to None. (default:
                        0))
  -thr , --expthres     Intra-cluster expression threshold to be used in gene
                        filtering (default: None)
  -im , --imexp         Significance of out-of-cluster expression rate
                        (default: 10))
  -nsel , --nofsel      Number of genes remaining for each cluster after
                        filtering (default: 10))
  -nolnorm, --nolognorm
                        Log normalization status in gene filtering (default:
                        True)
  -nmark , --nofmarkers
                        Number of markers to be selected for each cluster
                        (default: 5))
  -ncore , --nofcores   Number of cores to use (default: -2))
  -dyn, --dynprog       Dynamic programming option for gene selection
                        (default: False)
  -dot, --dotplot       Dot plotting status (default: False)
  -tsne, --marktsne     T-SNE plotting status (default: False)
  -heat, --markheat     Heatmap plotting status (default: False)
  -knn, --knnconf       K-NN classiification and confusion matrix plotting
                        status (default: False)
  -t, --transpose       If the data matrix is gene*cell, use (default: False)
  -wrt, --writeres      If you want to print the results to the directory
                        where the data is, use (default: False)
Data and labels must be in .csv format. If you want to provide gene names (the -genann option), the gene annotation file must also be in .csv format.
[2]:
%%bash
cd Pollen
ls
Pollen_Data.csv
Pollen_Data_markers_res_ann.csv
Pollen_Data_markers_res_ind.csv
Pollen_Gene_Ann.csv
Pollen_Labels.csv
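If you are preparing your own files, the script below is a sketch of the layout scmags reads by default (-sep ",", -head 0, -incol 0): cells as rows, genes as columns, gene names on line 1 and cell names in the first column. It writes a made-up 3-cell, 4-gene matrix and a labels file to a temporary directory, then checks that there is one label per cell. The file names are hypothetical, and the labels layout (one label per line, no header) is an assumption; compare it with Pollen_Labels.csv before relying on the check.

```bash
#!/usr/bin/env bash
# Sketch of the default input layout (assumed: -sep ",", -head 0, -incol 0):
# rows = cells, columns = genes, gene names on line 1, cell names in column 1.
set -euo pipefail

workdir=$(mktemp -d)

# Hypothetical toy expression matrix: 3 cells x 4 genes, made-up counts.
cat > "$workdir/toy_data.csv" <<'EOF'
,Gene_1,Gene_2,Gene_3,Gene_4
Cell_1,0,5,1,0
Cell_2,3,0,0,2
Cell_3,0,4,2,1
EOF

# Hypothetical labels: one cluster label per cell, in the same order as the rows.
# ASSUMPTION: one label per line and no header line.
cat > "$workdir/toy_labels.csv" <<'EOF'
A
B
A
EOF

# Cells = data lines minus the header line; $(( )) also strips the padding
# that some wc implementations add to the count.
n_cells=$(( $(wc -l < "$workdir/toy_data.csv") - 1 ))
n_labels=$(( $(wc -l < "$workdir/toy_labels.csv") ))

if [ "$n_cells" -eq "$n_labels" ]; then
  echo "OK: $n_cells cells, $n_labels labels"
else
  echo "Mismatch: $n_cells cells but $n_labels labels" >&2
  exit 1
fi
```

Line counting assumes every row ends with a newline; a file whose last line lacks one is undercounted by one.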
scmags expects rows to correspond to cells and columns to genes. The Pollen data set is stored the other way round (genes as rows, cells as columns), which is why we use the -t argument here.
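The -t flag handles the transposition for you. If you would rather keep a transposed copy of a gene*cell file, the sketch below does the same reshaping with awk: the header row (cell names) becomes the first column and the first column (gene names) becomes the header. It assumes a plain comma-separated file without quoted fields, and it holds the whole matrix in memory, so for large data sets -t is likely the simpler choice. The function and file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: transpose a gene x cell CSV into the cell x gene layout.
# ASSUMPTION: plain comma-separated values, no quoted fields containing commas.
set -euo pipefail

# Read every field into a 2-D awk array, then print it column by column.
transpose_csv() {
  awk -F',' '
    {
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > max_nf) max_nf = NF
    }
    END {
      for (i = 1; i <= max_nf; i++) {
        row = cell[1, i]
        for (j = 2; j <= NR; j++) row = row "," cell[j, i]
        print row
      }
    }' "$1"
}

workdir=$(mktemp -d)

# Hypothetical toy gene x cell matrix (made-up values).
cat > "$workdir/genes_by_cells.csv" <<'EOF'
,Cell_1,Cell_2
Gene_1,0,3
Gene_2,5,0
Gene_3,1,0
EOF

transpose_csv "$workdir/genes_by_cells.csv" > "$workdir/cells_by_genes.csv"
cat "$workdir/cells_by_genes.csv"
# ,Gene_1,Gene_2,Gene_3
# Cell_1,0,5,1
# Cell_2,3,0,0
```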
[3]:
%%bash
cd Pollen
scmags Pollen_Data.csv Pollen_Labels.csv -t -v
-> Eliminating low expression genes
-> Selecting cluster-specific candidate marker genes
-> Selecting markers for each cluster
Marker_1 Marker_2 Marker_3 Marker_4 Marker_5
C_Hi_2338 Gene_9590 Gene_9587 Gene_18197 Gene_6765 Gene_7636
C_Hi_2339 Gene_5452 Gene_3074 Gene_8163 Gene_4461 Gene_3121
C_Hi_BJ Gene_3870 Gene_4560 Gene_7639 Gene_21444 Gene_11528
C_Hi_GW16 Gene_14484 Gene_3258 Gene_18602 Gene_6788 Gene_3325
C_Hi_GW21 Gene_7646 Gene_16255 Gene_11768 Gene_10727 Gene_14025
C_Hi_HL60 Gene_13612 Gene_16874 Gene_16683 Gene_3298 Gene_13747
C_Hi_K562 Gene_7897 Gene_7008 Gene_7898 Gene_7074 Gene_17624
C_Hi_Kera Gene_6529 Gene_9572 Gene_17013 Gene_804 Gene_805
C_Hi_NPC Gene_3855 Gene_11381 Gene_3878 Gene_15372 Gene_21221
C_Hi_iPS Gene_4022 Gene_9819 Gene_10525 Gene_17332 Gene_19200
If you are not working in the directory that contains the data, you can pass the files as paths instead.
[4]:
%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -v
-> Eliminating low expression genes
-> Selecting cluster-specific candidate marker genes
-> Selecting markers for each cluster
Marker_1 Marker_2 Marker_3 Marker_4 Marker_5
C_Hi_2338 Gene_9590 Gene_9587 Gene_18197 Gene_6765 Gene_7636
C_Hi_2339 Gene_5452 Gene_3074 Gene_8163 Gene_4461 Gene_3121
C_Hi_BJ Gene_3870 Gene_4560 Gene_7639 Gene_21444 Gene_11528
C_Hi_GW16 Gene_14484 Gene_3258 Gene_18602 Gene_6788 Gene_3325
C_Hi_GW21 Gene_7646 Gene_16255 Gene_11768 Gene_10727 Gene_14025
C_Hi_HL60 Gene_13612 Gene_16874 Gene_16683 Gene_3298 Gene_13747
C_Hi_K562 Gene_7897 Gene_7008 Gene_7898 Gene_7074 Gene_17624
C_Hi_Kera Gene_6529 Gene_9572 Gene_17013 Gene_804 Gene_805
C_Hi_NPC Gene_3855 Gene_11381 Gene_3878 Gene_15372 Gene_21221
C_Hi_iPS Gene_4022 Gene_9819 Gene_10525 Gene_17332 Gene_19200
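If you want, you can save the results in .csv format with the -wrt option. The results are written to the directory that contains the data as two files: one holds the selected gene names and the other holds their matrix indices (Pollen_Data_markers_res_ann.csv and Pollen_Data_markers_res_ind.csv in this example).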
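To check that a -wrt run produced both files, the sketch below derives the expected result names from the data path. The naming pattern (data name followed by _markers_res_ann.csv or _markers_res_ind.csv, next to the data file) is inferred from the Pollen listing in this guide rather than from the scmags source, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list and check the result files expected from "scmags ... -wrt".
# ASSUMPTION: names follow the pattern seen in the Pollen example:
#   <data name>_markers_res_ann.csv and <data name>_markers_res_ind.csv
set -euo pipefail

# Print the two expected result paths for a given data .csv path.
result_files() {
  local stem=${1%.csv}   # drop the .csv extension, keep the directory part
  printf '%s\n' "${stem}_markers_res_ann.csv" "${stem}_markers_res_ind.csv"
}

# Return 0 if both expected result files exist, 1 otherwise.
check_results() {
  local missing=0 f
  while IFS= read -r f; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f" >&2
      missing=1
    fi
  done < <(result_files "$1")
  return "$missing"
}

# Usage, after: scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -wrt
#   ./check_results.sh Pollen/Pollen_Data.csv
if [ "$#" -gt 0 ]; then
  check_results "$1"
fi
```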
[5]:
%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -wrt
ls Pollen
Pollen_Data.csv
Pollen_Data_markers_res_ann.csv
Pollen_Data_markers_res_ind.csv
Pollen_Gene_Ann.csv
Pollen_Labels.csv
You can also change the number of genes remaining after filtering (-nsel) and the number of markers selected for each cluster (-nmark).
Note
If you increase the number of markers, make sure that the number of genes remaining after filtering (-nsel) is larger than the number of markers to be selected (-nmark).
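A small guard like the one below can catch this before a long run. It encodes only the rule from the note (genes kept after filtering, -nsel, must exceed the markers selected, -nmark); the function name is hypothetical, and the scmags call is left commented out so the sketch runs on its own.

```bash
#!/usr/bin/env bash
# Sketch: refuse -nsel/-nmark combinations where -nsel is not larger than -nmark.
set -euo pipefail

# Hypothetical helper: succeed only if nsel > nmark.
check_marker_counts() {
  local nsel=$1 nmark=$2
  if [ "$nsel" -gt "$nmark" ]; then
    return 0
  fi
  echo "error: -nsel ($nsel) must be larger than -nmark ($nmark)" >&2
  return 1
}

nsel=20
nmark=10
if check_marker_counts "$nsel" "$nmark"; then
  echo "ok: -nsel $nsel -nmark $nmark"
  # scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -v -nsel "$nsel" -nmark "$nmark"
fi
```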
[6]:
%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -v -nsel 20 -nmark 10
-> Eliminating low expression genes
-> Selecting cluster-specific candidate marker genes
-> Selecting markers for each cluster
Marker_1 Marker_2 ... Marker_9 Marker_10
C_Hi_2338 Gene_9590 Gene_17102 ... Gene_3616 Gene_9537
C_Hi_2339 Gene_5452 Gene_13740 ... Gene_3105 Gene_8165
C_Hi_BJ Gene_3870 Gene_4560 ... Gene_6719 Gene_1584
C_Hi_GW16 Gene_14484 Gene_3258 ... Gene_17020 Gene_22028
C_Hi_GW21 Gene_7646 Gene_16255 ... Gene_18836 Gene_3482
C_Hi_HL60 Gene_13612 Gene_16874 ... Gene_9083 Gene_466
C_Hi_K562 Gene_7897 Gene_7008 ... Gene_12811 Gene_5127
C_Hi_Kera Gene_6529 Gene_9536 ... Gene_805 Gene_803
C_Hi_NPC Gene_10156 Gene_3855 ... Gene_9888 Gene_16873
C_Hi_iPS Gene_4022 Gene_9819 ... Gene_601 Gene_21753
[10 rows x 10 columns]
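You can also visualize the selected markers. The example below draws a dot plot with -dot; -tsne, -heat and -knn produce t-SNE, heatmap and k-NN classification/confusion matrix plots.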
[7]:
%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -dot
Figure(600x1400)
Warning
The CLI part of the package is under development. If the data differs from the expected format, errors may occur. A few arguments for reading data (-sep, -head and -incol) have been added, but errors may still occur.