Repair & Accessory Functions • flowdex

If you have not already done so, set up the tutorial data as described in ‘Get Started’ and ‘Workflow 2’.

Set the working directory to ‘tap_water_home’:

td <- tempdir() # the tutorial data live in the session's temporary directory
setwd(paste0(td, "/tap_water_home"))

Check and Repair fcs Files

As has happened to the author, the FCM machine occasionally seems to write a somewhat ‘faulty’ fcs file, which results in the error message

The HEADER and the TEXT segment define different starting point ... to read the data

when trying to add a gate to the previously read-in fcs files.
Reading in those fcs files via flowCore::read.FCS() was not the problem, at least in the author's case.

After some testing, the author concluded that the error appears to be caused by multiplied (duplicated) keywords in the affected fcs files. The function checkRepairFcsFiles() was therefore written to remove all but one of each multiplied keyword.

checkRepairFcsFiles() is executed automatically whenever fcs files are read in.
By default, however, it does not repair the affected fcs files; it merely lists the possibly faulty files and stops.

If desired, checkRepairFcsFiles() can be called manually for more options: the multiplied keywords can be viewed, and you can select which of the multiples of each keyword to keep.
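The deduplication rule can be illustrated with a minimal base-R sketch. It is not the code of checkRepairFcsFiles(): the keywords are assumed to be a named character vector, and both dedupKeywords() and the keyword values are hypothetical, made up for illustration. By default the last occurrence of each multiplied keyword is kept, matching the repair message shown further below; keep = "first" keeps the first occurrence instead.

```r
# Hypothetical helper, NOT flowdex code: drop all but one occurrence of each
# multiplied keyword in a named character vector.
dedupKeywords <- function(kw, keep = c("last", "first")) {
  keep <- match.arg(keep)
  # duplicated(..., fromLast = TRUE) flags every occurrence except the last one;
  # with fromLast = FALSE it flags every occurrence except the first one
  kw[!duplicated(names(kw), fromLast = (keep == "last"))]
}

# toy keywords (made up): '$TOT' occurs twice, i.e. it is "2 fold"
kw <- c("$TOT" = "1000", "$PAR" = "4", "$TOT" = "1001", "$BTIM" = "14:19:30")

# which keywords are multiplied, and how often?
multiplied <- table(names(kw))
multiplied[multiplied > 1]

dedupKeywords(kw)                 # keeps the last '$TOT'
dedupKeywords(kw, keep = "first") # keeps the first '$TOT'
```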

For the purpose of this demonstration, some fcs files provoking the error as described above are included in the tutorial dataset. First, copy the faulty fcs files to a folder in the experiment-home directory:

td <- tempdir()
from <- list.files(paste0(td, "/flowdex_tutorial/fcsF_E_rep"), full.names=TRUE)
to <- paste0(td, "/tap_water_home/fcsF_E_rep")
dir.create(to)
file.copy(from, to, overwrite=TRUE)
#> [1] TRUE TRUE TRUE

Now check for faulty fcs files:

checkRepairFcsFiles(fn="fcsF_E_rep")

#> The following 2 files from the folder 'fcsF_E_rep' 
#> do have non-unique entries (all 2 fold) in their keywords:
#> N_na_GNeg_T6_th1_b2.fcs
#> N_na_GNeg_T6_th1_b3.fcs
#> Error: Consider setting 'fcsRepair' to TRUE.
#> CAVE: Original fcs files will then be overwritten.

We see that two files have multiplied keywords. Repair them by calling

checkRepairFcsFiles(fn="fcsF_E_rep", fcsRepair = TRUE, confirm = FALSE)
#> 
#> The following 2 files from the folder 'fcsF_E_rep' 
#> do have non-unique entries (all 2 fold) in their keywords:
#> N_na_GNeg_T6_th1_b2.fcs
#> N_na_GNeg_T6_th1_b3.fcs
#> 
#> All except the last of each multiplied keyword were removed, and the original fcs file were overwritten.
#> (You can use 'checkRepairFcsFiles' directly and set 'showMultiples' to TRUE to display the multiplied keywords.)
# check again, all should be good now:
checkRepairFcsFiles(fn="fcsF_E_rep")
#> All fcs files in the folder 'fcsF_E_rep' 
#> seem to be ok, i.e. do have only single entries in their keywords.

Please refer to checkRepairFcsFiles() for further information on how to view multiplied keywords and how to select which one of each multiple to keep.

Repair Sample ID and Volume Data

If the volumetric measurement of a single sample failed, or if an erroneous sample-ID string was entered in the sample-ID field of a single sample at the time of data acquisition, two functions can remedy these issues: repairVolumes() and repairSID().

Repair Volumes

First, copy some (manipulated) fcs files to a folder in the experiment-home directory:

from <- list.files(paste0(td, "/flowdex_tutorial/fcsF_E_vol_sid"), full.names=TRUE)
to <- paste0(td, "/tap_water_home/fcsF_E_vol_sid")
dir.create(to)
file.copy(from, to, overwrite=TRUE)
#> [1] TRUE TRUE TRUE TRUE

In order to repair missing volume data, call

# repairVolumes(fn = "fcsF_E_vol_sid", vol=1234567) # press enter to confirm, displays more information. 
# (Not suitable for the vignette.)
#
repairVolumes(fn = "fcsF_E_vol_sid", vol=1234567, confirm=FALSE)
#> Reading in all fcs files in the folder `fcsF_E_vol_sid`... ok.
#> Re-writing volume data of 2 FCS files, using `1234567` to replace missing values.
#> .. ok.
# and check again:
repairVolumes(fn = "fcsF_E_vol_sid", vol=1234567) # all should be good
#> Reading in all fcs files in the folder `fcsF_E_vol_sid`... ok.
#> All volume values are present - no re-writing of fcs files will be performed.

Note that when confirm is left at its default TRUE, more details about the fcs files to be repaired and re-written to disk are displayed.

If you want to force all fcs files in a folder to have the same volume data, you can set the argument includeAll to TRUE:

repairVolumes(fn = "fcsF_E_vol_sid", vol=1010101, confirm=FALSE, includeAll = TRUE)
#> Reading in all fcs files in the folder `fcsF_E_vol_sid`... ok.
#> Re-writing volume data of 4 FCS files, using `1010101` to replace present or missing values.
#> .... ok.
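The difference between the default behaviour and includeAll = TRUE can be sketched in base R. This is only an illustration under assumptions, not repairVolumes() itself: the per-sample volumes are assumed to sit in a named numeric vector with NA marking a failed measurement, and both fillVolumes() and the toy values are hypothetical.

```r
# Hypothetical helper, NOT flowdex code: fill in volume values.
fillVolumes <- function(volumes, vol, includeAll = FALSE) {
  # positions to overwrite: all of them, or only the missing ones (assumed NA)
  idx <- if (includeAll) seq_along(volumes) else which(is.na(volumes))
  volumes[idx] <- vol
  volumes
}

# toy volumes for four samples (made up); two measurements "failed"
v <- c(b1 = 950000, b2 = NA, b3 = 1010000, b4 = NA)

sum(is.na(v))                              # samples that would be rewritten by default
fillVolumes(v, 1234567)                    # only b2 and b4 change
fillVolumes(v, 1010101, includeAll = TRUE) # every sample gets 1010101
```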

Repair Sample ID

Assuming you repaired the volume data as described above, we can use the fcs files in the folder ‘fcsF_E_vol_sid’ to demonstrate how to repair a faulty sample ID.
It is more than likely that occasionally (e.g. during the long nights of data acquisition, when the FCM machine is so heavily booked that you are driven into the hours after dark…) an erroneous sample-ID string is entered in the sample-ID field of an individual sample at the time of data acquisition (see the structured ID string).
repairSID() can be used to repair these faulty sample IDs.

In the best case, a faulty sample ID comes to your attention because the translation of element names via the dictionary fails.
In the worst case, only an element value is wrong. This is noticed only when, e.g., cross-referencing the sample names with the cyTags / sample IDs in the pData slot of the ‘fdmat’ object (as produced e.g. by flowdexit()), and only if the sample names are descriptive.
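A malformed sample ID like the one in the example further below can be flagged by splitting the structured ID string into its elements. The following base-R sketch is not flowdex's dictionary mechanism; parseSID() and the set of expected keys (taken from the example IDs in this section) are assumptions for illustration. It only catches IDs whose structure is broken: a wrong element value, the worst case described above, passes unnoticed.

```r
# Hypothetical helper, NOT flowdex code: split "key: value; key: value; ..."
# into a named character vector, silently skipping parts without exactly one ':'.
parseSID <- function(sid) {
  parts <- trimws(strsplit(sid, ";", fixed = TRUE)[[1]])
  kv <- strsplit(parts, ":", fixed = TRUE)
  ok <- lengths(kv) == 2
  vals <- vapply(kv[ok], function(x) trimws(x[2]), character(1))
  names(vals) <- vapply(kv[ok], function(x) trimws(x[1]), character(1))
  vals
}

# assumed set of element names, as seen in the example IDs of this tutorial
expectedKeys <- c("tr", "Td", "wt", "ap", "th", "ha", "bk")

sids <- c(
  "tr: GPos; Td: 5; wt: nativ; ap: no; th: th1; ha: ha1; bk: b1",
  "blablabla very bad sample ID"
)

# TRUE if all expected elements are present in the parsed ID
isWellFormed <- vapply(sids, function(s) all(expectedKeys %in% names(parseSID(s))),
                       logical(1))
unname(isWellFormed) # c(TRUE, FALSE)
```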

To repair a faulty sample ID in a single fcs file, first read in all fcs files in a folder (or only some of them, using the argument patt) by calling repairSID() with just fn; they are returned as a flowCore::flowSet().
Then pass this ‘flowSet’ back to repairSID(), this time specifying a sample name and its new sample ID:

flowset <- repairSID(fn = "fcsF_E_vol_sid")
#
flowset@phenoData@data # very bad sample ID in the fourth sample
#>                          volume     btim
#> N_na_GNeg_T4_th1_b4.fcs 1010101 14:19:30
#> N_na_GNeg_T5_th3_b3.fcs 1010101 14:34:13
#> N_na_GPos_T5_th1_b1.fcs 1010101 14:43:57
#> N_na_GPos_T5_th1_b3.fcs 1010101 14:47:13
#>                                                                             sampleId
#> N_na_GNeg_T4_th1_b4.fcs tr: GNeg; Td: 4; wt: nativ; ap: no; th: th1; ha: ha1; bk: b4
#> N_na_GNeg_T5_th3_b3.fcs tr: GNeg; Td: 5; wt: nativ; ap: no; th: th3; ha: ha2; bk: b3
#> N_na_GPos_T5_th1_b1.fcs tr: GPos; Td: 5; wt: nativ; ap: no; th: th1; ha: ha1; bk: b1
#> N_na_GPos_T5_th1_b3.fcs                                 blablabla very bad sample ID
#>                                            name
#> N_na_GNeg_T4_th1_b4.fcs N_na_GNeg_T4_th1_b4.fcs
#> N_na_GNeg_T5_th3_b3.fcs N_na_GNeg_T5_th3_b3.fcs
#> N_na_GPos_T5_th1_b1.fcs N_na_GPos_T5_th1_b1.fcs
#> N_na_GPos_T5_th1_b3.fcs N_na_GPos_T5_th1_b3.fcs

#
# view the correct sample IDs of the other samples
# copy one of those correct sample IDs
# paste and modify it - it should be beaker #3:
nsid <- "tr: GPos; Td: 5; wt: nativ; ap: no; th: th1; ha: ha1; bk: b3"

#
# also copy and paste the sample name
sana <- "N_na_GPos_T5_th1_b3.fcs" # the name of the sample having the faulty sample ID

#
# now put all together and write fcs file with correct sample ID back to disk
repairSID(fs=flowset, fn="fcsF_E_vol_sid", name=sana, newSID = nsid, confirm = FALSE)
#> `N_na_GPos_T5_th1_b3.fcs` has been rewritten with the modified sample ID.

#
# and check again:
flowset <- repairSID(fn = "fcsF_E_vol_sid")
flowset@phenoData@data # all is good 
#>                          volume     btim
#> N_na_GNeg_T4_th1_b4.fcs 1010101 14:19:30
#> N_na_GNeg_T5_th3_b3.fcs 1010101 14:34:13
#> N_na_GPos_T5_th1_b1.fcs 1010101 14:43:57
#> N_na_GPos_T5_th1_b3.fcs 1010101 14:47:13
#>                                                                             sampleId
#> N_na_GNeg_T4_th1_b4.fcs tr: GNeg; Td: 4; wt: nativ; ap: no; th: th1; ha: ha1; bk: b4
#> N_na_GNeg_T5_th3_b3.fcs tr: GNeg; Td: 5; wt: nativ; ap: no; th: th3; ha: ha2; bk: b3
#> N_na_GPos_T5_th1_b1.fcs tr: GPos; Td: 5; wt: nativ; ap: no; th: th1; ha: ha1; bk: b1
#> N_na_GPos_T5_th1_b3.fcs tr: GPos; Td: 5; wt: nativ; ap: no; th: th1; ha: ha1; bk: b3
#>                                            name
#> N_na_GNeg_T4_th1_b4.fcs N_na_GNeg_T4_th1_b4.fcs
#> N_na_GNeg_T5_th3_b3.fcs N_na_GNeg_T5_th3_b3.fcs
#> N_na_GPos_T5_th1_b1.fcs N_na_GPos_T5_th1_b1.fcs
#> N_na_GPos_T5_th1_b3.fcs N_na_GPos_T5_th1_b3.fcs

Apply Bandpass Filter

The function applyBandpass() does exactly what its name says: it applies a bandpass-like filter to the fluorescence intensities stemming from a single gate.
If you have not already done so, create an ‘fdmat’ object containing only a small subset of the data:

fdmat_s <- flowdexit(patt = "T4_th1")
#> Reading in fcs files... ok. 
#> Producing gating set... Applying fjbiexp transformation... ok. 
#> Gating: (1 gate)
#> done!
#> DNA+: Extracting binned data on FITC.A (res=220) and recalc. to volume... ok. 
#> Exporting data (1 gate) to xlsx...ok. 
#> fdmat-object saved.
plotFlscDist(fdmat_s, toPdf = FALSE)

There is not much signal below a fluorescence intensity of 1600 or above 2400.
Let's apply a bandpass filter to our ‘fdmat’ object so that only the fluorescence intensities between 1600 and 2400 remain:

fdmat_s_bp <- applyBandpass(fdmat_s, bandpass = c(1600, 2400))
fdmat_s[[1]] # compare
#> An object of class 'fdmat_single'
#> containing data from 12 samples in 219 fluorescence intensities from flsc1256 to flsc3994
#> derived from gate 'DNA+'.
#> original 
#>                               flsc1256 flsc1269 flsc1281 flsc3969 flsc3981
#> N_na_GNeg_T4_th1_b1.fcs|DNA+ 0.0000000        0 0.000000        0        0
#> N_na_GNeg_T4_th1_b2.fcs|DNA+ 0.0000000        0 0.000000        0        0
#> N_na_GPos_T4_th1_b5.fcs|DNA+ 0.9221548        0 1.998002        0        0
#> N_na_GPos_T4_th1_b6.fcs|DNA+ 0.0000000        0 0.000000        0        0
#>                              flsc3994
#> N_na_GNeg_T4_th1_b1.fcs|DNA+        0
#> N_na_GNeg_T4_th1_b2.fcs|DNA+        0
#> N_na_GPos_T4_th1_b5.fcs|DNA+        0
#> N_na_GPos_T4_th1_b6.fcs|DNA+        0
#> (showing only the first and last columns and rows)
#> 
#> Overall data for events per volume unit:
#>                         events_ml mean is_filtered events_ml_orig
#> N_na_GNeg_T4_th1_b1.fcs    297516 1985       FALSE         297516
#> N_na_GNeg_T4_th1_b2.fcs    379780 1913       FALSE         379780
#> N_na_GNeg_T4_th1_b3.fcs    276220 1913       FALSE         276220
#> N_na_GNeg_T4_th1_b4.fcs    172242 1925       FALSE         172242
#> N_na_GNeg_T4_th1_b5.fcs    480066 1954       FALSE         480066
#> N_na_GNeg_T4_th1_b6.fcs    244154 1923       FALSE         244154
#> N_na_GPos_T4_th1_b1.fcs    278657 1893       FALSE         278657
#> N_na_GPos_T4_th1_b2.fcs    373333 1923       FALSE         373333
#> N_na_GPos_T4_th1_b3.fcs    148022 1954       FALSE         148022
#> N_na_GPos_T4_th1_b4.fcs    485582 1889       FALSE         485582
#> N_na_GPos_T4_th1_b5.fcs    424527 1895       FALSE         424527
#> N_na_GPos_T4_th1_b6.fcs    195516 1993       FALSE         195516
fdmat_s_bp[[1]] # bandpass-filtered version
#> An object of class 'fdmat_single'
#> containing data from 12 samples in 64 fluorescence intensities from flsc1608 to flsc2399
#> derived from gate 'DNA+'.
#> bandpass applied 
#>                               flsc1608 flsc1620  flsc1633  flsc2374  flsc2386
#> N_na_GNeg_T4_th1_b1.fcs|DNA+  0.000000  0.00000 72.696534 517.43129 446.22045
#> N_na_GNeg_T4_th1_b2.fcs|DNA+ 12.346628  0.00000  0.000000 174.38971 138.47691
#> N_na_GPos_T4_th1_b5.fcs|DNA+  0.000000  0.00000  0.000000  85.76039  91.54948
#> N_na_GPos_T4_th1_b6.fcs|DNA+  3.073849 15.01063  2.612772 314.96708 238.68439
#>                               flsc2399
#> N_na_GNeg_T4_th1_b1.fcs|DNA+ 385.97300
#> N_na_GNeg_T4_th1_b2.fcs|DNA+ 125.97659
#> N_na_GPos_T4_th1_b5.fcs|DNA+  84.99193
#> N_na_GPos_T4_th1_b6.fcs|DNA+ 228.18207
#> (showing only the first and last columns and rows)
#> 
#> Overall data for events per volume unit:
#>                         events_ml mean is_filtered events_ml_orig
#> N_na_GNeg_T4_th1_b1.fcs  295317.0 1985       FALSE         297516
#> N_na_GNeg_T4_th1_b2.fcs  379464.5 1913       FALSE         379780
#> N_na_GNeg_T4_th1_b3.fcs  275184.0 1913       FALSE         276220
#> N_na_GNeg_T4_th1_b4.fcs  171926.4 1925       FALSE         172242
#> N_na_GNeg_T4_th1_b5.fcs  479831.1 1954       FALSE         480066
#> N_na_GNeg_T4_th1_b6.fcs  243834.9 1923       FALSE         244154
#> N_na_GPos_T4_th1_b1.fcs  278044.5 1893       FALSE         278657
#> N_na_GPos_T4_th1_b2.fcs  372731.4 1923       FALSE         373333
#> N_na_GPos_T4_th1_b3.fcs  147105.3 1954       FALSE         148022
#> N_na_GPos_T4_th1_b4.fcs  484372.7 1889       FALSE         485582
#> N_na_GPos_T4_th1_b5.fcs  423963.1 1895       FALSE         424527
#> N_na_GPos_T4_th1_b6.fcs  193982.0 1993       FALSE         195516
ncol(fdmat_s[[1]])
#> [1] 219
ncol(fdmat_s_bp[[1]])
#> [1] 64
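The column selection performed by applyBandpass() can be pictured with the following base-R sketch. It is an illustration under assumptions, not the package code: the binned data are assumed to be a matrix with columns named flscNNNN, the limits are treated as inclusive, and bandpassCols() and the toy matrix are hypothetical. Recomputing per-sample totals as row sums is likewise an assumption, meant only to show why the events per volume unit decrease once bins are dropped.

```r
# Hypothetical helper, NOT flowdex code: keep only the bins whose intensity,
# parsed from the "flscNNNN" column name, lies within the bandpass limits.
bandpassCols <- function(mat, bandpass) {
  intensity <- as.numeric(sub("^flsc", "", colnames(mat)))
  keep <- intensity >= bandpass[1] & intensity <= bandpass[2]
  mat[, keep, drop = FALSE]
}

# toy binned data (random, made up): 2 samples x 5 intensity bins
set.seed(1)
mat <- matrix(runif(2 * 5), nrow = 2,
              dimnames = list(c("s1", "s2"),
                              c("flsc1256", "flsc1608", "flsc2000", "flsc2399", "flsc3994")))

matBp <- bandpassCols(mat, bandpass = c(1600, 2400))
colnames(matBp) # "flsc1608" "flsc2000" "flsc2399"

# assumption: per-sample totals as row sums; they can only shrink after filtering
cbind(original = rowSums(mat), bandpass = rowSums(matBp))
```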

The overall number of events per volume unit is updated as well (compare the columns events_ml and events_ml_orig above).
Visualize the difference using plotFlscDist(), and compare the numbers in the legend of the resulting plot with those of the previous plot:

plotFlscDist(fdmat_s_bp, toPdf = FALSE)

Finally, the raw data with the bandpass filter applied can be exported via

exportFdmatData(fdmat_s_bp, expo.name = "flscData_d4_th1")
#> Exporting data (1 gate) to xlsx...ok.

Enjoy!