From gsutil to gcloud storage: Introducing Simplicity and Improved Performance

Joel Vasallo
Published in TAG Tech Blog
Jun 4, 2023

If you have been using Google Cloud and Google Cloud Storage (GCS), you are probably very familiar with the CLI tool: gsutil. Uploading files to GCS was easy! If you wanted to upload files to a GCS bucket, you would do:

gsutil cp file.txt gs://my-bucket-name

However, once you got to nested directories or large numbers of files, you would end up reaching for command line flags such as -r for recursive copies of nested folders and -m for parallel (multi-threaded/multi-process) transfers:

gsutil -m cp -r my_folder gs://my-bucket-name

Command line flags aside, data transfer performance was lackluster, especially with large file uploads. There are plenty of articles on improving gsutil performance, such as choosing Linux over Windows or tuning your compute resources; however, today I want to share a slightly old yet new CLI: gcloud storage.

What’s new with gcloud storage CLI?

With gcloud storage, Google Cloud aimed to improve usability by reducing the complexity of command line flags while also solving some fundamental performance issues seen in gsutil.

gcloud storage also sits alongside the other gcloud commands, a welcome change for consolidation and simplicity. In addition, it removes the overhead of having to remember arbitrary CLI flags and automatically detects the optimal settings for your transfers. Finally, the command remains familiar and easy to use:

gcloud storage cp -r my_folder gs://my-bucket-name

From a technical standpoint, gcloud storage ships a much faster CRC32C implementation for data integrity checking than the crcmod library gsutil relied on. It also uses a new parallelization strategy that treats task management as a graph problem, which allows more work to be done in parallel with far less overhead. In short: faster hashing and a new way of parallelizing tasks, which translates to better performance for us!
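To make the integrity-check piece concrete, here is a minimal pure-Python sketch of the CRC32C (Castagnoli) checksum that both tools compute over transferred data. This is an illustration only: the real implementations (crcmod's C extension, Google's optimized CRC32C library) are table-driven or hardware-accelerated, which is exactly where the speed difference comes from.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli) sketch, for illustration only."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value:
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

The algorithm is the same either way; production implementations simply trade this per-bit loop for lookup tables or dedicated CPU instructions.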

Migrating to gcloud storage

In an effort to increase adoption of gcloud storage, gsutil was updated with a lightweight shim that translates gsutil commands to gcloud storage behind the scenes. This is helpful if you have a large runner and want every reference to gsutil to use gcloud storage without updating all your scripts/code.

To use the shim globally at the system level, set use_gcloud_storage=True in the .boto config file under the [GSUtil] section:

[GSUtil]
use_gcloud_storage=True

You can also set the flag for individual commands using the top-level -o flag:

gsutil -o "GSUtil:use_gcloud_storage=True" -m cp -p file gs://bucket/obj

Real World Performance

In our testing, we are seeing nearly a 300% improvement in data transfer into our buckets simply by moving from gsutil cp to gcloud storage cp! Talk about an easy win!

Nearly a 3x jump in data transfer!

In addition, this test measured performance when writing many small files rather than large ones. Our object write throughput shot up as well, by 265%! We are very excited to test this further with larger datasets soon!
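Speedup multiples and percentage figures are easy to mix up when reading benchmark claims like these, so here is a tiny sketch, using hypothetical throughput numbers (not our actual measurements), of how each convention is computed from raw before/after rates:

```python
# Hypothetical throughput numbers, for illustration only.
before_mb_s = 50.0   # assumed gsutil cp throughput
after_mb_s = 150.0   # assumed gcloud storage cp throughput

speedup = after_mb_s / before_mb_s    # ratio: "a 3x jump"
pct_increase = (speedup - 1) * 100    # relative gain as a percentage

print(f"{speedup:.1f}x jump, {pct_increase:.0f}% increase")
```

Whichever convention you report, the key is to compare the same workload (file sizes and counts) before and after the switch.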

With improvements in usability, ease of migration, and core performance, using gcloud storage is a no-brainer going forward. gsutil, we thank you for your service over the years!

Check out the full announcement and documentation below!



Director, Platform Engineering @TAG — The Aspen Group. Google Developers Group Chicago (@chicagogdg) Organizer. I automate things sometimes and love Python.