This tutorial gives step-by-step instructions on using TrackSOM to cluster and track time-series data. For simplicity, we use the synthetic dataset described in our manuscript uploaded to bioRxiv. The dataset files are provided in the package’s inst directory.
If you have not installed the TrackSOM package, please use devtools to install it from the TrackSOM GitHub repo. For devtools, the repo parameter is ghar1821/TrackSOM.
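A minimal installation sketch, assuming an internet connection and a CRAN mirror are available. The requireNamespace guards are an addition here so that nothing is reinstalled unnecessarily:

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install TrackSOM from GitHub only if it is missing.
if (!requireNamespace("TrackSOM", quietly = TRUE)) {
  devtools::install_github("ghar1821/TrackSOM")
}
```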
The following code loads the TrackSOM package:
library(TrackSOM)
TrackSOM supports datasets stored as CSV files, FCS files, or data.table objects (an enhanced version of R’s native data.frame; see the data.table vignette for more details). The following sections show how to read in those files and pass them to the TrackSOM function.
Here, we assume that each file contains data belonging to one time-point, so you should have more than one CSV file. If this is not the case, please reformat your data files.
To import a dataset stored as CSV files, TrackSOM needs to know where the files are stored. The files’ locations must be stored in a vector, which gets passed to the TrackSOM function.
Important: the vector must be organised such that the first element is the data for the very first time-point, the 2nd element for the 2nd time-point, and so on.
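One possible way to build such an ordered vector for your own files is sketched here. It assumes the files sit in a single directory and carry the day number in their names. The directory and the day_N.csv naming are hypothetical, so adapt the pattern to your own files:

```r
# Hypothetical directory holding one CSV file per time-point.
data.dir <- file.path(tempdir(), "my_timeseries")
dir.create(data.dir, showWarnings = FALSE)

# Create empty placeholder files, including a two-digit day, to show why
# plain alphabetical sorting is not enough.
invisible(file.create(
  file.path(data.dir, c("day_10.csv", "day_2.csv", "day_0.csv"))
))

# List the CSV files with their full paths.
csv.files <- list.files(data.dir, pattern = "\\.csv$", full.names = TRUE)

# Extract the day number from each file name and order the paths by it.
day.number <- as.integer(
  sub("^day_([0-9]+)\\.csv$", "\\1", basename(csv.files))
)
csv.files <- csv.files[order(day.number)]

print(basename(csv.files))
```

Sorting on the extracted number rather than on the file name keeps day 10 after day 2.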
First, we specify where the CSV files are. In this example, the dataset files are already stored within the package, so all you need to do is look up their paths:
data.files.fullpath <- c(
  system.file("extdata", "synthetic_d0.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d1.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d2.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d3.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d4.csv", package = "TrackSOM")
)
For your own dataset, replace the content of the vector with the absolute paths of your dataset files.
Let’s inspect the content of data.files.fullpath:
print(data.files.fullpath)
## [1] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d0.csv"
## [2] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d1.csv"
## [3] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d2.csv"
## [4] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d3.csv"
## [5] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d4.csv"
You can see it contains the absolute paths of multiple CSV files, each belonging to a time-point.
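Since system.file returns an empty string when a file cannot be found, it may be worth validating the paths before clustering. The helper below is a hypothetical sketch and not part of TrackSOM:

```r
# Hypothetical helper (not part of TrackSOM): stop early when any path is
# empty or does not point to an existing file.
check_input_files <- function(paths) {
  missing.paths <- paths[paths == "" | !file.exists(paths)]
  if (length(missing.paths) > 0) {
    stop("Files not found: ", paste(shQuote(missing.paths), collapse = ", "))
  }
  invisible(TRUE)
}

# Usage with a temporary file standing in for a real dataset file.
existing.file <- tempfile(fileext = ".csv")
writeLines("x,y,z", existing.file)
check_input_files(existing.file)
```

Calling check_input_files(data.files.fullpath) in the same way would raise an error if any of the paths were wrong.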
The procedure is very similar to reading CSV files above, except that we store the paths of FCS files rather than CSV files.
Again, we assume that each file contains data belonging to one time-point, so you should have more than one FCS file. If this is not the case, please reformat your data files.
To import a dataset stored as FCS files, TrackSOM needs to know where the files are stored. The files’ locations must be stored in a vector, which gets passed to the TrackSOM function. Important: the vector must be organised such that the first element is the data for the very first time-point, the 2nd element for the 2nd time-point, and so on.
First, we specify where the FCS files are. In this example, the dataset files are already stored within the package, so all you need to do is look up their paths:
data.files.fullpath.fcs <- c(
  system.file("extdata", "synthetic_d0.fcs", package = "TrackSOM"),
  system.file("extdata", "synthetic_d1.fcs", package = "TrackSOM"),
  system.file("extdata", "synthetic_d2.fcs", package = "TrackSOM"),
  system.file("extdata", "synthetic_d3.fcs", package = "TrackSOM"),
  system.file("extdata", "synthetic_d4.fcs", package = "TrackSOM")
)
For your own dataset, replace the content of the vector with the absolute paths of your dataset files.
Let’s inspect the content of data.files.fullpath.fcs:
print(data.files.fullpath.fcs)
## [1] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d0.fcs"
## [2] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d1.fcs"
## [3] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d2.fcs"
## [4] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d3.fcs"
## [5] "/private/var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T/RtmpJya44y/temp_libpath36ab332710b9/TrackSOM/extdata/synthetic_d4.fcs"
You can see it contains the absolute paths of multiple FCS files, each belonging to a time-point.
data.table object
Sometimes it is convenient to have the dataset stored as data.table objects, e.g. when you need to run some R code to preprocess your data. As an example, suppose the synthetic dataset CSV files were already read in as data.table objects prior to running TrackSOM (say you did some preliminary clean-up or filtering). What you need to do is organise them in a list such that each element is a data.table object holding the dataset for one time-point. Important: the list must be organised such that the first element is the data for the very first time-point, the 2nd element for the 2nd time-point, and so on.
library(data.table)
data.files <- c(
  system.file("extdata", "synthetic_d0.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d1.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d2.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d3.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d4.csv", package = "TrackSOM")
)
dat <- lapply(data.files, function(f) fread(f))
Let’s do a quick preview of the data:
dat
## [[1]]
## x y z timepoint
## 1: 9.598217 11.728198 11.382064 Mock
## 2: 8.937549 8.653750 11.107314 Mock
## 3: 9.632932 9.657278 9.115804 Mock
## 4: 9.770393 12.478038 11.451082 Mock
## 5: 30.426699 31.290929 28.901204 Mock
## ---
## 7096: 10.533174 9.462175 9.867354 Mock
## 7097: 10.544833 10.426697 9.571881 Mock
## 7098: 10.140938 8.727880 10.247086 Mock
## 7099: 30.967131 30.261241 30.469085 Mock
## 7100: 10.063984 10.289172 10.382876 Mock
##
## [[2]]
## x y z timepoint
## 1: 33.850857 30.653891 29.257884 SYN-1
## 2: 10.429818 13.880273 10.878985 SYN-1
## 3: 10.009247 10.327127 10.994237 SYN-1
## 4: 9.751578 8.555277 8.914872 SYN-1
## 5: 10.456213 10.972776 10.011441 SYN-1
## ---
## 7096: 9.237145 10.203747 12.475563 SYN-1
## 7097: 12.957000 9.147637 8.807634 SYN-1
## 7098: 12.297580 11.034141 9.828642 SYN-1
## 7099: 9.393480 11.766797 11.137156 SYN-1
## 7100: 11.684874 8.864774 11.771160 SYN-1
##
## [[3]]
## x y z timepoint
## 1: 10.107081 10.371958 8.862552 SYN-2
## 2: 12.271113 28.710774 13.945666 SYN-2
## 3: 10.248482 10.249500 7.918076 SYN-2
## 4: 9.341533 10.228882 12.209994 SYN-2
## 5: 10.116334 9.805188 9.742291 SYN-2
## ---
## 7196: 10.682697 13.643521 10.637965 SYN-2
## 7197: 27.598520 30.275747 29.691414 SYN-2
## 7198: 10.728449 10.020652 9.517445 SYN-2
## 7199: 10.167862 9.754739 9.416342 SYN-2
## 7200: 9.554067 11.296156 12.535182 SYN-2
##
## [[4]]
## x y z timepoint
## 1: 13.864150 8.884197 8.742413 SYN-3
## 2: 25.420014 40.961726 26.453685 SYN-3
## 3: 23.968729 31.313815 31.314857 SYN-3
## 4: 10.538730 9.855319 9.061674 SYN-3
## 5: 8.814686 12.068781 10.729973 SYN-3
## ---
## 7196: 10.358157 8.856995 10.440222 SYN-3
## 7197: 10.093926 11.400187 10.429873 SYN-3
## 7198: 14.720125 10.599644 9.599185 SYN-3
## 7199: 24.545254 29.734070 29.035897 SYN-3
## 7200: 9.967504 14.351193 11.619231 SYN-3
##
## [[5]]
## x y z timepoint
## 1: 10.088854 10.171541 10.122280 SYN-4
## 2: 14.776449 10.189092 10.621575 SYN-4
## 3: 9.652116 12.567544 9.282672 SYN-4
## 4: 10.436809 10.725664 10.016877 SYN-4
## 5: 10.235657 10.966951 9.371363 SYN-4
## ---
## 7096: 21.234807 30.522073 30.590687 SYN-4
## 7097: 9.703690 8.319128 9.579323 SYN-4
## 7098: 17.654003 30.891041 29.390515 SYN-4
## 7099: 9.089960 9.744563 12.483508 SYN-4
## 7100: 36.910728 29.551236 29.926378 SYN-4
As you can see, there are 5 elements in the list, each containing a dataset belonging to a time-point.
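The kind of preliminary filtering mentioned earlier could be applied to such a list as sketched here. The toy data and the threshold of 40 are made up for illustration. The point is that lapply preserves the list order, so the time-point ordering survives the preprocessing:

```r
library(data.table)

# Toy stand-in for the list of per-time-point data.table objects.
dat.example <- list(
  data.table(x = c(1, 50, 3), y = c(2, 60, 4), timepoint = "Mock"),
  data.table(x = c(5, 6, 70), y = c(7, 8, 90), timepoint = "SYN-1")
)

# Apply the same (arbitrary) filter to every time-point. lapply keeps the
# list order, so the first element is still the first time-point afterwards.
dat.filtered <- lapply(dat.example, function(d) d[x < 40 & y < 40])

# Number of cells remaining per time-point.
print(sapply(dat.filtered, nrow))
```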
Depending on how your dataset is stored, the inputFiles parameter is either a vector of absolute paths to your CSV or FCS files, or a list of data.table objects. Additionally, you need to specify the type through the dataFileType parameter. It can be .csv, .fcs, or data.frame, depending on how your dataset is stored.
Note: examples here assume datasets are stored as CSV files.
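For the other two storage types, the call might look like the sketch below. It uses only the inputFiles, colsToUse and dataFileType parameters named in this tutorial and leaves everything else at its default. It assumes that the TrackSOM package is installed and that the bundled FCS files use the same x, y and z column names as the CSV files:

```r
library(TrackSOM)
library(data.table)

# Paths to the synthetic files shipped with the package (days 0 to 4).
fcs.files <- system.file("extdata", paste0("synthetic_d", 0:4, ".fcs"),
                         package = "TrackSOM")
csv.files <- system.file("extdata", paste0("synthetic_d", 0:4, ".csv"),
                         package = "TrackSOM")

# Variant 1: FCS files, passed as a vector of paths.
# Assumption: the FCS files name their columns x, y and z.
result.fcs <- TrackSOM(
  inputFiles = fcs.files,
  colsToUse = c("x", "y", "z"),
  dataFileType = ".fcs"
)

# Variant 2: a list of data.table objects, one element per time-point.
dat.list <- lapply(csv.files, fread)
result.dt <- TrackSOM(
  inputFiles = dat.list,
  colsToUse = c("x", "y", "z"),
  dataFileType = "data.frame"
)
```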
The TrackSOM function has various parameters. Of them, only inputFiles and colsToUse have no default values. The inputFiles parameter has been explained in the previous section.
The colsToUse parameter specifies the columns in your dataset to be used for clustering and tracking. This must be a vector. For the synthetic dataset, the columns are denoted as x, y, and z. For cytometry data, this should be a vector of markers.
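Before clustering, it can help to confirm that every column named in colsToUse is present at every time-point. The following sketch uses toy data and is not part of TrackSOM. The second toy dataset deliberately lacks the z column:

```r
library(data.table)

# Toy stand-in for per-time-point datasets; the second one lacks column z.
dat.example <- list(
  data.table(x = 1:2, y = 3:4, z = 5:6),
  data.table(x = 1:2, y = 3:4)
)
cols.to.use <- c("x", "y", "z")

# For each time-point, list the requested columns that are absent.
missing.cols <- lapply(dat.example, function(d) setdiff(cols.to.use, names(d)))
print(missing.cols)

# Indices of the time-points that have at least one missing column.
print(which(lengths(missing.cols) > 0))
```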
The remaining TrackSOM parameters are pre-filled with default values.
The function also accepts parameters that are built into FlowSOM’s ReadInput, BuildSOM and BuildMST functions. See FlowSOM’s vignette for specific parameter information.
Let’s run TrackSOM with the following settings:
noMerge = TRUE
tracking = TRUE
The remaining parameters will be set to the default values.
tracksom.result <- TrackSOM(inputFiles = data.files.fullpath,
                            colsToUse = c('x', 'y', 'z'),
                            tracking = TRUE,
                            noMerge = TRUE,
                            nClus = c(3,3,9,7,15),
                            dataFileType = ".csv"
)
## Building SOM
## Mapping data to SOM
## Building MST
## Extracting SOM nodes for each time point
## Running meta clustering
## Meta clustering time point 1 with 82 SOM nodes
## Meta clustering time point 2 with 93 SOM nodes
## Meta clustering time point 3 with 90 SOM nodes
## Meta clustering time point 4 with 95 SOM nodes
## Meta clustering time point 5 with 95 SOM nodes
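The nClus vector in the call has five entries, matching the five time-points. It therefore presumably sets the number of meta-clusters per time-point, but that reading is an assumption, since this tutorial does not spell it out. Under that assumption, pairing the values with the time-point labels makes the setting easier to review:

```r
# Time-point labels and the nClus values used in the call above.
timepoints <- c("Mock", "SYN-1", "SYN-2", "SYN-3", "SYN-4")
n.clus <- c(3, 3, 9, 7, 15)

# Assumption: one nClus entry per time-point, so the lengths must agree.
stopifnot(length(n.clus) == length(timepoints))

# Name each entry after its time-point for readability.
n.clus.by.timepoint <- setNames(n.clus, timepoints)
print(n.clus.by.timepoint)
```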
The TrackSOM result is stored in an object. To facilitate the extraction of the meta-cluster ID and SOM node for each cell, we provide the following functions:
ConcatenateClusteringDetails: assuming your dataset is stored as a data.table object, the function attaches the meta-cluster ID and SOM nodes as separate columns.
ExportClusteringDetailsOnly: this function simply extracts the meta-cluster ID and SOM node for each cell as a data.table object. The cells are ordered based on the ordering of your dataset.
To use ConcatenateClusteringDetails, you first need to read in all your datasets as one giant data.table.
library(data.table)
data.files <- c(
  system.file("extdata", "synthetic_d0.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d1.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d2.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d3.csv", package = "TrackSOM"),
  system.file("extdata", "synthetic_d4.csv", package = "TrackSOM")
)
dat <- lapply(data.files, function(f) fread(f))
dat <- rbindlist(dat)
Double check the data is read in properly:
head(dat)
## x y z timepoint
## 1: 9.598217 11.728198 11.382064 Mock
## 2: 8.937549 8.653750 11.107314 Mock
## 3: 9.632932 9.657278 9.115804 Mock
## 4: 9.770393 12.478038 11.451082 Mock
## 5: 30.426699 31.290929 28.901204 Mock
## 6: 30.789497 30.069909 30.767145 Mock
tail(dat)
## x y z timepoint
## 1: 7.954781 9.241713 9.429919 SYN-4
## 2: 21.234807 30.522073 30.590687 SYN-4
## 3: 9.703690 8.319128 9.579323 SYN-4
## 4: 17.654003 30.891041 29.390515 SYN-4
## 5: 9.089960 9.744563 12.483508 SYN-4
## 6: 36.910728 29.551236 29.926378 SYN-4
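Beyond head and tail, a quick check of the combined table is to list the time-point labels in order of first appearance and count the cells in each. The sketch below uses toy data in place of the real dataset:

```r
library(data.table)

# Toy stand-in for the per-time-point datasets, already in time order.
dat.list <- list(
  data.table(x = c(1, 2), timepoint = "Mock"),
  data.table(x = c(3, 4, 5), timepoint = "SYN-1")
)

# rbindlist stacks the list elements in the order they are given.
dat.all <- rbindlist(dat.list)

# unique() returns labels in order of first appearance, so this shows the
# time-point order as it occurs in the combined data.
print(unique(dat.all$timepoint))

# Number of cells per time-point.
print(dat.all[, .N, by = timepoint])
```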
To run ConcatenateClusteringDetails, you need to pass the following parameters:
timepoint.col: which column in your data defines the time-point.
timepoints: what the time-points are (in order). This must be a vector.
Now let’s attach the meta-cluster ID and the SOM nodes:
dat.clust <- ConcatenateClusteringDetails(
  tracksom.result = tracksom.result,
  dat = dat,
  timepoint.col = "timepoint",
  timepoints = c('Mock', 'SYN-1', 'SYN-2', 'SYN-3', 'SYN-4')
)
Inspect the content:
head(dat.clust)
## x y z timepoint TrackSOM_cluster
## 1: 9.598217 11.728198 11.382064 Mock 62
## 2: 8.937549 8.653750 11.107314 Mock 53
## 3: 9.632932 9.657278 9.115804 Mock 86
## 4: 9.770393 12.478038 11.451082 Mock 61
## 5: 30.426699 31.290929 28.901204 Mock 16
## 6: 30.789497 30.069909 30.767145 Mock 6
## TrackSOM_metacluster TrackSOM_metacluster_lineage_tracking
## 1: 2 B
## 2: 2 B
## 3: 2 B
## 4: 2 B
## 5: 1 A
## 6: 1 A
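A per-time-point summary is often more informative than the first few rows. The sketch below counts cells per time-point and lineage ID on a toy table that mimics two of the columns in dat.clust. The values are made up:

```r
library(data.table)

# Toy stand-in for dat.clust, with made-up values.
dat.clust.example <- data.table(
  timepoint = c("Mock", "Mock", "Mock", "SYN-1", "SYN-1"),
  TrackSOM_metacluster_lineage_tracking = c("A", "B", "B", "A", "A")
)

# Count cells per time-point and lineage ID.
cell.counts <- dat.clust.example[
  , .N, by = .(timepoint, TrackSOM_metacluster_lineage_tracking)
]
print(cell.counts)
```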
The function attaches 3 extra columns:
TrackSOM_cluster: The SOM node of each cell.
TrackSOM_metacluster: The meta-cluster assignment produced by FlowSOM’s meta-clustering.
TrackSOM_metacluster_lineage_tracking: The tracking of meta-clusters’ evolution produced by TrackSOM. This gives you the changes undergone by the meta-clusters over time.
The function returns a data.table object, which you can export as a CSV file using data.table’s fwrite function.
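A minimal export sketch follows, using a toy table in place of dat.clust and a hypothetical file name inside a temporary directory. Reading the file back with fread is just a sanity check:

```r
library(data.table)

# Toy stand-in for dat.clust, with made-up values.
dat.clust.example <- data.table(
  x = c(9.6, 30.4),
  TrackSOM_cluster = c(62L, 16L),
  TrackSOM_metacluster = c(2L, 1L),
  TrackSOM_metacluster_lineage_tracking = c("B", "A")
)

# Hypothetical output location; replace with your own path.
out.file <- file.path(tempdir(), "clustered_data.csv")
fwrite(dat.clust.example, out.file)

# Read the file back to confirm the export.
dat.reloaded <- fread(out.file)
print(dat.reloaded)
```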
TrackSOM offers 2 visualisation options: network plots and timeseries heatmaps.
To draw network plots, we need to call the DrawNetworkPlot function. Using the clustered and tracked data from previous sections, the following is an example of how to use the function:
DrawNetworkPlot(dat = dat.clust,
                timepoint.col = "timepoint",
                timepoints = c('Mock', 'SYN-1', 'SYN-2', 'SYN-3', 'SYN-4'),
                cluster.col = 'TrackSOM_metacluster_lineage_tracking',
                marker.cols = c('x', 'y', 'z'))
## Calculating edges
## Computing node details
## Calculating marker's average per node
## Saving node and edge details
## Start drawing plots
## Warning: Existing variables `x`, `y` overwritten by layout variables
## Drawing plots coloured by time point
## Drawing plots coloured by origin
## Drawing plots coloured by x
## Drawing plots coloured by y
## Drawing plots coloured by z
The marker.cols parameter determines the markers whose mean/median expression will be drawn on the network plots. There is no need to specify all the markers in the dataset, just the ones that you want the network plots to be coloured by.
The function won’t preview any plots, but it will instead store the plots as image files and some extra information (median/mean expression of markers) as CSV files:
list.files()
## [1] "network_colBy_origin.pdf" "network_colBy_timepoints.pdf"
## [3] "network_colBy_x.pdf" "network_colBy_y.pdf"
## [5] "network_colBy_z.pdf" "network_plot_edge_details.csv"
## [7] "network_plot_node_details.csv" "TrackSOM-workflow.R"
## [9] "TrackSOM-workflow.Rmd"
In this example, the plots are saved as PDF files. This can be changed, e.g. to save the plots as PNG files, by specifying the desired file format as the file.format parameter.
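The list.files() output above suggests that the files are written to the current working directory. Under that assumption, one way to keep them together is to switch to a dedicated directory before plotting and switch back afterwards. The directory name here is hypothetical:

```r
# Hypothetical output directory for the plots and CSV files.
out.dir <- file.path(tempdir(), "network_plots")
dir.create(out.dir, showWarnings = FALSE)

# Switch to the output directory, remembering the previous one.
old.wd <- setwd(out.dir)

# ... call DrawNetworkPlot() here, as in the example above ...

# Switch back, then list what was written.
setwd(old.wd)
print(list.files(out.dir))
```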
To draw a timeseries heatmap, we need to call the DrawTimeseriesHeatmap function. Using the clustered and tracked data from previous sections, the following is an example of how to use the function:
DrawTimeseriesHeatmap(dat = dat.clust,
                      timepoint.col = "timepoint",
                      timepoints = c('Mock', 'SYN-1', 'SYN-2', 'SYN-3', 'SYN-4'),
                      cluster.col = 'TrackSOM_metacluster_lineage_tracking',
                      marker.cols = c('x', 'y', 'z'))
## Computing node details
## Computing edge details
## Saving node and edge details
## Drawing timeseries heatmap coloured by x
## Drawing timeseries heatmap coloured by y
## Drawing timeseries heatmap coloured by z
The marker.cols parameter determines the markers whose mean/median expression will be drawn on the timeseries heatmaps. There is no need to specify all the markers in the dataset, just the ones that you want the heatmaps to be coloured by.
The function won’t preview any plots, but it will instead store the plots as image files and some extra information (median/mean expression of markers) as CSV files:
list.files()
## [1] "network_colBy_origin.pdf"
## [2] "network_colBy_timepoints.pdf"
## [3] "network_colBy_x.pdf"
## [4] "network_colBy_y.pdf"
## [5] "network_colBy_z.pdf"
## [6] "network_plot_edge_details.csv"
## [7] "network_plot_node_details.csv"
## [8] "Timeseries_heatmap_by_x.pdf"
## [9] "Timeseries_heatmap_by_y.pdf"
## [10] "Timeseries_heatmap_by_z.pdf"
## [11] "timeseries_heatmap_edges_details.csv"
## [12] "timeseries_heatmap_node_details.csv"
## [13] "TrackSOM-workflow.R"
## [14] "TrackSOM-workflow.Rmd"
In this example, the plots are saved as PDF files. This can be changed, e.g. to save the plots as PNG files, by specifying the desired file format as the file.format parameter.
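Once both plotting functions have run, the output directory holds a mix of files. The sketch below picks out the heatmap plots and the CSV detail files by name pattern. It recreates a few of the file names from the listing above as empty placeholders in a temporary directory, so no plotting is needed to try it:

```r
# Temporary directory with empty placeholder files named after some of the
# outputs listed above.
out.dir <- file.path(tempdir(), "tracksom_outputs")
dir.create(out.dir, showWarnings = FALSE)
invisible(file.create(file.path(out.dir, c(
  "network_colBy_x.pdf",
  "Timeseries_heatmap_by_x.pdf",
  "Timeseries_heatmap_by_y.pdf",
  "timeseries_heatmap_node_details.csv"
))))

# Heatmap plots only.
heatmap.pdfs <- list.files(out.dir, pattern = "^Timeseries_heatmap_by_.*\\.pdf$")
print(heatmap.pdfs)

# CSV detail files only.
detail.csvs <- list.files(out.dir, pattern = "\\.csv$")
print(detail.csvs)
```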
That is pretty much it, folks! We’re actively updating TrackSOM and welcome feedback!
Thank you for your interest!