CLI Guide

The scmags package also provides a command line interface (CLI) for users who prefer to work from the terminal.

This tutorial shows you how to use the scmags package from the terminal.

Run scmags -h to list the CLI arguments and their descriptions.

The data and label files are mandatory arguments; all remaining arguments are optional.

%%bash
scmags -h

usage: scmags [-h] [-genann] [-v] [-sep] [-head] [-incol] [-thr] [-im] [-nsel]
              [-nolnorm] [-nmark] [-ncore] [-dyn] [-dot] [-tsne] [-heat]
              [-knn] [-t] [-wrt]
              data labels

|---- Arguments ----|

positional arguments:
  data                  Input data must be in .csv format. Also rows should
                        correspond to cells and columns to genes. If the
                        reveerse is true, use the transpose (-t --transpose)
                        option
  labels                Cluster labels must be in .csv format and match the
                        numberof cells.

optional arguments:
  -h, --help            show this help message and exit
  -genann               Gene Annotation file (default: None)
  -v, --verbose         Transpose input matrix (default: False)
  -sep , --readseperator
                        Delimiter to use. (default: , )
  -head , --header      If header are on line 1, set it to 0. Set to None if
                        headers are not available. (default: 0))
  -incol , --indexcol   Set to 0 if the row indexes are in the 1st column. If
                        the index does not exist, set it to None. (default:
                        0))
  -thr , --expthres     Intra-cluster expression threshold to be used in gene
                        filtering (default: None)
  -im , --imexp         Significance of out-of-cluster expression rate
                        (default: 10))
  -nsel , --nofsel      Number of genes remaining for each cluster after
                        filtering (default: 10))
  -nolnorm, --nolognorm
                        Log normalization status in gene filtering (default:
                        True)
  -nmark , --nofmarkers
                        Number of markers to be selected for each cluster
                        (default: 5))
  -ncore , --nofcores   Number of cores to use (default: -2))
  -dyn, --dynprog       Dynamic programming option for gene selection
                        (default: False)
  -dot, --dotplot       Dot plotting status (default: False)
  -tsne, --marktsne     T-SNE plotting status (default: False)
  -heat, --markheat     Heatmap plotting status (default: False)
  -knn, --knnconf       K-NN classiification and confusion matrix plotting
                        status (default: False)
  -t, --transpose       If the data matrix is gene*cell, use (default: False)
  -wrt, --writeres      If you want to print the results to the directory
                        where the data is, use (default: False)
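
Because only the data and label files are required, a thin shell wrapper can check them before scmags starts. The sketch below is an assumption, not part of scmags: run_scmags is a hypothetical function name, and it only checks that both files exist and end in .csv before forwarding every other argument to scmags unchanged.

```bash
# Hypothetical wrapper (not part of scmags): check the two mandatory
# positional arguments, then pass everything else to scmags unchanged.
run_scmags() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_scmags DATA.csv LABELS.csv [scmags options...]" >&2
    return 2
  fi
  local data="$1" labels="$2"
  shift 2
  local f
  for f in "$data" "$labels"; do
    case "$f" in
      *.csv) ;;                                  # scmags expects .csv input
      *) echo "error: $f is not a .csv file" >&2; return 1 ;;
    esac
    if [ ! -f "$f" ]; then
      echo "error: $f not found" >&2
      return 1
    fi
  done
  scmags "$data" "$labels" "$@"                  # optional arguments follow
}
# Example: run_scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -v
```

When both checks pass, the wrapper behaves exactly like calling scmags directly with the same arguments.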

Now, let’s perform marker selection on the Pollen data set. First, change the working directory to the folder that contains the data set.

The data and labels must be in .csv format. If you want to supply gene names with -genann, the gene annotation file must also be in .csv format.

%%bash
cd Pollen
ls

Pollen_Data.csv
Pollen_Data_markers_res_ann.csv
Pollen_Data_markers_res_ind.csv
Pollen_Gene_Ann.csv
Pollen_Labels.csv
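
Before running scmags, it can help to check which way round the matrix is stored. The hypothetical csv_shape helper below prints the matrix dimensions, assuming the scmags reading defaults (comma separator, header on line 1, row index in the first column) and no quoted fields that contain the separator. If the number of rows matches the number of cell labels, the matrix is cells × genes; if the number of columns matches, it is genes × cells and needs -t.

```bash
# Hypothetical helper: print "<data rows> x <data columns>" for a delimited
# matrix, assuming a header on line 1 and row names in the first column
# (the scmags defaults -head 0 and -incol 0) and no quoted separators.
# Usage: csv_shape FILE [DELIMITER]   (DELIMITER defaults to ",")
csv_shape() {
  awk -F"${2:-,}" '
    NR == 1 { ncol = NF - 1 }                     # header minus the index field
    END     { printf "%d x %d\n", NR - 1, ncol }  # lines minus the header line
  ' "$1"
}
# Example: csv_shape Pollen/Pollen_Data.csv
```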

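scmags expects rows to correspond to cells and columns to genes. The Pollen data matrix is stored the other way round (genes × cells), which is why we use the -t argument here. The -v argument prints the progress messages and the selected markers shown below; note that its help text above mistakenly describes it as transposing the input matrix.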
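
If you would rather store the matrix as cells × genes and omit -t, the file can be transposed beforehand. The awk sketch below is one way to do this, assuming a plain delimited file with no quoted fields; transpose_csv is a hypothetical name. Because it holds the whole matrix in memory, the -t option is usually the simpler choice for large data sets.

```bash
# Hypothetical helper: transpose a delimited matrix (rows <-> columns).
# The top-left header cell stays in place, so the header/index layout
# expected by scmags is preserved. Holds the whole file in memory.
# Usage: transpose_csv FILE [DELIMITER]   (DELIMITER defaults to ",")
transpose_csv() {
  awk -F"${2:-,}" '
    { for (i = 1; i <= NF; i++) cell[i, NR] = $i   # store field i of line NR
      if (NF > maxnf) maxnf = NF }
    END {
      for (i = 1; i <= maxnf; i++) {               # column i becomes line i
        row = cell[i, 1]
        for (j = 2; j <= NR; j++) row = row FS cell[i, j]
        print row
      }
    }
  ' "$1"
}
# Example (hypothetical output name):
# transpose_csv Pollen/Pollen_Data.csv > Pollen/Pollen_Data_cxg.csv
```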

%%bash
cd Pollen
scmags Pollen_Data.csv Pollen_Labels.csv -t -v

-> Eliminating low expression genes
-> Selecting cluster-specific candidate marker genes
-> Selecting  markers for each cluster
             Marker_1    Marker_2    Marker_3    Marker_4    Marker_5
C_Hi_2338   Gene_9590   Gene_9587  Gene_18197   Gene_6765   Gene_7636
C_Hi_2339   Gene_5452   Gene_3074   Gene_8163   Gene_4461   Gene_3121
C_Hi_BJ     Gene_3870   Gene_4560   Gene_7639  Gene_21444  Gene_11528
C_Hi_GW16  Gene_14484   Gene_3258  Gene_18602   Gene_6788   Gene_3325
C_Hi_GW21   Gene_7646  Gene_16255  Gene_11768  Gene_10727  Gene_14025
C_Hi_HL60  Gene_13612  Gene_16874  Gene_16683   Gene_3298  Gene_13747
C_Hi_K562   Gene_7897   Gene_7008   Gene_7898   Gene_7074  Gene_17624
C_Hi_Kera   Gene_6529   Gene_9572  Gene_17013    Gene_804    Gene_805
C_Hi_NPC    Gene_3855  Gene_11381   Gene_3878  Gene_15372  Gene_21221
C_Hi_iPS    Gene_4022   Gene_9819  Gene_10525  Gene_17332  Gene_19200

If you are not working in the directory that contains the data, pass the files as paths instead.

%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv  -t -v

-> Eliminating low expression genes
-> Selecting cluster-specific candidate marker genes
-> Selecting  markers for each cluster
             Marker_1    Marker_2    Marker_3    Marker_4    Marker_5
C_Hi_2338   Gene_9590   Gene_9587  Gene_18197   Gene_6765   Gene_7636
C_Hi_2339   Gene_5452   Gene_3074   Gene_8163   Gene_4461   Gene_3121
C_Hi_BJ     Gene_3870   Gene_4560   Gene_7639  Gene_21444  Gene_11528
C_Hi_GW16  Gene_14484   Gene_3258  Gene_18602   Gene_6788   Gene_3325
C_Hi_GW21   Gene_7646  Gene_16255  Gene_11768  Gene_10727  Gene_14025
C_Hi_HL60  Gene_13612  Gene_16874  Gene_16683   Gene_3298  Gene_13747
C_Hi_K562   Gene_7897   Gene_7008   Gene_7898   Gene_7074  Gene_17624
C_Hi_Kera   Gene_6529   Gene_9572  Gene_17013    Gene_804    Gene_805
C_Hi_NPC    Gene_3855  Gene_11381   Gene_3878  Gene_15372  Gene_21221
C_Hi_iPS    Gene_4022   Gene_9819  Gene_10525  Gene_17332  Gene_19200

To save the results in .csv format, use the -wrt argument. The results are written to the directory that contains the data as two files: Pollen_Data_markers_res_ann.csv holds the selected gene names and Pollen_Data_markers_res_ind.csv holds their matrix indices.

%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv  -t -wrt
ls Pollen

Pollen_Data.csv
Pollen_Data_markers_res_ann.csv
Pollen_Data_markers_res_ind.csv
Pollen_Gene_Ann.csv
Pollen_Labels.csv
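
The listing above suggests that -wrt names its result files after the data file: Pollen_Data.csv yields Pollen_Data_markers_res_ann.csv and Pollen_Data_markers_res_ind.csv in the same directory. The hypothetical result_files helper below derives those names from any data path under that assumption, which is convenient for inspecting the output from a script.

```bash
# Hypothetical helper: print the result file names that -wrt appears to
# produce, inferred from the directory listing in this guide
# (DATA.csv -> DATA_markers_res_ann.csv and DATA_markers_res_ind.csv).
result_files() {
  local stem="${1%.csv}"                         # strip the .csv extension
  printf '%s\n' "${stem}_markers_res_ann.csv" "${stem}_markers_res_ind.csv"
}
# Example: result_files Pollen/Pollen_Data.csv | xargs head -n 3
```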

You can also change the number of genes kept for each cluster after filtering (-nsel) and the number of markers to be selected for each cluster (-nmark).

Note

If you increase the number of markers, make sure that the number of genes remaining after filtering (-nsel) is greater than the number of markers to be selected (-nmark).
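
The condition in the Note is easy to check before a run. The guard below is a hypothetical helper, not an scmags feature; its defaults of 10 and 5 mirror the -nsel and -nmark defaults reported by scmags -h.

```bash
# Hypothetical guard: require more genes after filtering (-nsel) than
# markers to select (-nmark). Defaults mirror `scmags -h` (10 and 5).
check_selection_sizes() {
  local nsel="${1:-10}" nmark="${2:-5}"
  if [ "$nsel" -le "$nmark" ]; then
    echo "error: -nsel ($nsel) must exceed -nmark ($nmark)" >&2
    return 1
  fi
}
# Example:
# check_selection_sizes 20 10 && scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv -t -v -nsel 20 -nmark 10
```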

%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv  -t -v -nsel 20 -nmark 10

-> Eliminating low expression genes
-> Selecting cluster-specific candidate marker genes
-> Selecting  markers for each cluster
             Marker_1    Marker_2  ...    Marker_9   Marker_10
C_Hi_2338   Gene_9590  Gene_17102  ...   Gene_3616   Gene_9537
C_Hi_2339   Gene_5452  Gene_13740  ...   Gene_3105   Gene_8165
C_Hi_BJ     Gene_3870   Gene_4560  ...   Gene_6719   Gene_1584
C_Hi_GW16  Gene_14484   Gene_3258  ...  Gene_17020  Gene_22028
C_Hi_GW21   Gene_7646  Gene_16255  ...  Gene_18836   Gene_3482
C_Hi_HL60  Gene_13612  Gene_16874  ...   Gene_9083    Gene_466
C_Hi_K562   Gene_7897   Gene_7008  ...  Gene_12811   Gene_5127
C_Hi_Kera   Gene_6529   Gene_9536  ...    Gene_805    Gene_803
C_Hi_NPC   Gene_10156   Gene_3855  ...   Gene_9888  Gene_16873
C_Hi_iPS    Gene_4022   Gene_9819  ...    Gene_601  Gene_21753

[10 rows x 10 columns]

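You can also visualize the selected markers. The -dot argument draws a dot plot; the -tsne, -heat and -knn arguments enable t-SNE plots, heatmaps, and k-NN classification with a confusion matrix plot, respectively.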

%%bash
scmags Pollen/Pollen_Data.csv Pollen/Pollen_Labels.csv  -t -dot

Figure(600x1400)

Warning

The CLI is still under development. Errors may occur if the data deviate from the required format. A few arguments for reading data (-sep, -head and -incol) have been added, but errors may still occur.
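
A common way for a file to deviate from the required format is a ragged matrix, where some lines have more or fewer fields than the header. The hypothetical check_rectangular sketch below reports such lines. It assumes that the header line includes a (possibly empty) field for the index column and that no field contains a quoted separator. Its optional second argument sets the delimiter, analogous to -sep.

```bash
# Hypothetical check: report lines whose field count differs from the header.
# Usage: check_rectangular FILE [DELIMITER]   (DELIMITER defaults to ",")
check_rectangular() {
  awk -F"${2:-,}" '
    NR == 1 { nf = NF; next }                   # header sets the expected width
    NF != nf {
      printf "line %d has %d fields, expected %d\n", NR, NF, nf
      bad = 1
    }
    END { exit (bad ? 1 : 0) }                  # non-zero exit if any line was ragged
  ' "$1"
}
# Example: check_rectangular Pollen/Pollen_Data.csv
```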