Improved instructions for downloading preview datasets on Windows
Samantha Scovanner
Here is detailed feedback from one of our users, journaling his experience downloading the data:
The instructions for Windows refer to Windows WSL. I know what that is and I have it installed on my machine. But that would not be the case for the typical downloader. I suggest that you expand on that information, and probably include a link to a location that explains how to set up WSL and the Windows Ubuntu application, and also how to set up a Python environment on it.
The WSL Ubuntu on my Windows machine has Python 3.5.2 installed, but it does not have a working "pip". If I issue the command "pip" I get an error message that the command is not available, and that I can install it using the command "sudo apt install python-pip". But if I issue that command it tells me "unable to locate package python-pip". So, even if I have a working WSL with Python, I cannot use pip, and this will prevent the download scripts from working. You would need to add a link to working directions to set up pip under WSL. I can make no additional progress on my testing until this gets resolved. It's not good that we have dependencies on WSL, Windows Ubuntu, Python, and pip.
After a bit of searching, I found that running "sudo apt update" followed by "sudo apt install python-pip" resulted in a working pip environment (however, both commands took a long time to execute).
The second problem I ran into is that you are supposed to use the chmod command on the file you downloaded, but the directions do not say what the file name is. However, I was able to find the name of the script by looking in the Downloads directory.
The next hurdle is that the script gets downloaded to c:/users/user/Downloads, but this directory is not visible in the Ubuntu application (WSL). There must be a way to access it and I will look for it, but the directions should be updated with this information (my home directory in the Ubuntu application does not have a Downloads directory).
To avoid the previous issue I attempted to run the browser from the Ubuntu application. To install Firefox, I ran "sudo apt install firefox". This also took a long time, and the resulting firefox executable did not work. So I will have to find out how to access the script from the Ubuntu application. I will continue this tomorrow.
So it turns out that the download directory, c:/Users/UserName/Downloads, is visible under the Windows Ubuntu app at /mnt/c/Users/UserName/Downloads.
So I moved the script to /home/paolo, ran the chmod command on it, and ran it.
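For anyone documenting this step, the path mapping and the chmod step can be sketched in Python. This is only an illustration: the helper names windows_to_wsl_path and make_executable are hypothetical, and the mapping assumes the WSL default of mounting each Windows drive under /mnt/<drive letter>. The code avoids f-strings so it should also run on the Python 3.5 mentioned above.

```python
import os
import stat
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(windows_path):
    """Map a Windows path (e.g. c:/Users/UserName/Downloads) to its WSL path.

    Assumption: WSL mounts each drive at /mnt/<lowercase drive letter>,
    which is the default but can be changed in the WSL configuration.
    """
    path = PureWindowsPath(windows_path)
    if not path.drive:
        raise ValueError("expected a Windows path with a drive letter")
    drive_letter = path.drive.rstrip(":").lower()
    # parts[0] is the drive anchor (e.g. 'c:\\'); the rest are directory names.
    return str(PurePosixPath("/mnt", drive_letter, *path.parts[1:]))


def make_executable(script_path):
    """Add the owner-execute bit, roughly what 'chmod u+x' does."""
    current_mode = os.stat(script_path).st_mode
    os.chmod(script_path, current_mode | stat.S_IXUSR)
```

For example, windows_to_wsl_path("c:/Users/UserName/Downloads") gives the /mnt/c/Users/UserName/Downloads location found above.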
It ran without problems, and took about 20 minutes to download the smallest dataset (14 GB), which is a reasonable time.
The files are compressed, so they need to be uncompressed before they can be used by applications that take the raw files as input (you may want to make a note of that in the documentation).
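A note in the documentation could include something like the following sketch. It assumes the files are gzip-compressed (a .gz suffix), which this report does not actually confirm, and the function name gunzip_file is made up for illustration.

```python
import gzip
import shutil


def gunzip_file(compressed_path, output_path):
    """Decompress one gzip file to output_path.

    Assumption: the downloaded files are gzip-compressed. The data is
    streamed in chunks, so a multi-gigabyte file is never held in memory.
    """
    with gzip.open(compressed_path, "rb") as source:
        with open(output_path, "wb") as target:
            shutil.copyfileobj(source, target)
```

Streaming matters here because even the smallest dataset is 14 GB.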
One problem I have is that there is no documentation of the meaning of the various files. From the file naming I can guess that the _S_ portion of the name refers to different samples, but this is purely a guess on my part.
Also, it seems that the R1 and R2 portions of the file name refer to the two mated reads, while the I files contain tags. Again, however, this is a guess on my part. It would be good if there were a README file containing this information.
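Until a README exists, the guesses above can at least be written down as a small parser. The sketch below assumes the files follow the Illumina bcl2fastq naming pattern (<sample>_S<number>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz), which this report does not confirm; the function name and the example file names are hypothetical.

```python
import re

# Assumed pattern: <sample>_S<number>[_L<lane>]_<R1|R2|I1|I2>_<chunk>.fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample_name>.+)_S(?P<sample_number>\d+)"
    r"(?:_L(?P<lane>\d+))?"
    r"_(?P<read>[RI]\d)_(?P<chunk>\d+)\.fastq(?:\.gz)?$"
)


def describe_fastq_name(file_name):
    """Split a file name into its parts, or return None if it does not match."""
    match = FASTQ_NAME.match(file_name)
    if match is None:
        return None
    fields = match.groupdict()
    # R files are taken to be the mated sequence reads, I files the index (tag) reads.
    if fields["read"].startswith("I"):
        fields["kind"] = "index read"
    else:
        fields["kind"] = "sequence read"
    return fields
```

Calling describe_fastq_name("example_S1_L001_R1_001.fastq.gz") would report sample number "1", lane "001", and read "R1"; a name that does not fit the assumed pattern returns None rather than a wrong guess.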
There are some json files that might contain that information, but those are not human readable (by my definition of human readable), and it is not obvious how to extract that information.
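As a stopgap, the json files can be re-indented with Python's standard json module to make them easier to read. This is a generic sketch: it does not know which fields these particular files contain, and the function name pretty_json is made up for illustration.

```python
import json


def pretty_json(raw_text):
    """Return the same JSON document, indented and with keys sorted."""
    return json.dumps(json.loads(raw_text), indent=2, sort_keys=True)
```

The same result is available from the command line with "python3 -m json.tool" followed by the file name, though neither approach explains what the fields mean.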
It would help if you added a section on downloading on Windows where you cover some of the above adventures. But I still believe the process is unreasonable and too invasive, because it forced me to install a Python environment with the huge number of dependencies it requires. It's as if you invited me to your home for tea, and I painted all the walls of your home a color of my choice.
So I continue to think that we need a real download button. Otherwise we will create the perception that the HCA is hard to use, and that the CZI people are out of touch with the real needs of the scientific community. As Andrey pointed out, the download button could still use the cli under the covers (on the server side).