How to Download Images From URLs, Convert the Type, and Save Them in the Cloud With Python

Image by Darkmoon_Art on Pixabay

Serving third-party images directly can be unreliable: requests to the provider add latency, and the images can be changed or removed without prior notice. To make your service more robust, it often helps to download the images and serve a stable copy of them yourself.

In this post, we will show how to download images from URLs, convert or normalize their type, and save them in the Cloud so that your application can use them more reliably.


Download images from URLs

It seems straightforward to download an image from a URL: we just need to get the content and save it to a file. However, many modern image providers don’t include a plain image name in the URL. In this case, we need to assign a name ourselves and also specify the extension. The extensions of most common image types can be guessed from the Content-Type header with the guess_extension function of the mimetypes library. However, some newer image types such as webp may not be recognized, depending on the Python version, in which case guess_extension returns None. We can then convert the image to a standard type without knowing the original image type, which will be covered in the next section of this post.

The following code snippet demonstrates how to download the content of an image from a URL, guess its extension, and save it locally.

import requests
import mimetypes

# Free image from Unsplash: https://unsplash.com/photos/jz4ca36oJ_M
# The URL has no image name and extension.
url = "https://images.unsplash.com/photo-1573804633927-bfcbcd909acd?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2127&q=80"
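# Use a timeout so the request cannot hang forever, and fail early on
# HTTP errors instead of saving an error page as an image.
response = requests.get(url, timeout=10)
response.raise_for_status()

# Get the MIME type from the Content-Type header, dropping any parameters
# such as "; charset=...".
content_type = response.headers.get("Content-Type", "").split(";")[0].strip()
# Can guess common image types, such as jpeg, png, tiff, etc.
# Returns None when the type is unknown.
img_ext = mimetypes.guess_extension(content_type)
if img_ext is None:
    raise ValueError(f"Cannot guess an extension for {content_type!r}")

# Construct the image name.
file_name = "google" + img_ext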

# Write the content, which is binary, to the file which is also opened
# in binary mode.
with open(file_name, "wb") as f_imag:
    f_imag.write(response.content)
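
If a type such as webp is not recognized by your Python version, one option is to register it yourself before guessing. The following is a minimal sketch, not part of the original snippet: the helper name extension_for and its default parameter are hypothetical, and ".webp" is assumed to be the extension you want. It also strips parameters such as "; charset=..." from the header value.

```python
import mimetypes

# Assumption: ".webp" is the extension we want for WebP images. Registering
# it is harmless if this Python version already knows the type.
mimetypes.add_type("image/webp", ".webp")


def extension_for(content_type, default=None):
    """Return a file extension for a Content-Type header value.

    `extension_for` and `default` are hypothetical names used for this sketch.
    """
    # Drop parameters such as "; charset=binary" and normalize the case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    # guess_extension returns None for unknown types, so fall back to default.
    return mimetypes.guess_extension(mime_type) or default


print(extension_for("image/webp"))
print(extension_for("image/png; charset=binary"))
print(extension_for("image/x-no-such-type", default=".bin"))
```

Returning a default instead of raising is a design choice: it lets the caller decide whether an unknown type should stop the download or be handled by converting the image, as described in the next section.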

Convert the type of an image

As mentioned above, sometimes the image type cannot be guessed by the mimetypes library. In this case, we can use the popular and powerful image processing library Pillow to convert the type and save it in a standard format.

It’s recommended to install Pillow in a virtual environment so that it doesn’t affect your system’s libraries. Note that we need to use the BytesIO class to convert the request content into a file-like object so it can be read by Pillow.

We don’t need to convert the type of the image explicitly. We just specify a format when saving it, and Pillow converts the type automatically:

import requests
from io import BytesIO
from PIL import Image

# Free image from Unsplash: https://unsplash.com/photos/jz4ca36oJ_M
# The URL has no image name and extension.
url = "https://images.unsplash.com/photo-1573804633927-bfcbcd909acd?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2127&q=80"
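response = requests.get(url, timeout=10)
# Fail early on HTTP errors instead of passing an error page to Pillow.
response.raise_for_status()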

# First use BytesIO to create a file-like object from the request content.
image = Image.open(BytesIO(response.content))
# The image type is converted when it's saved.
image.save("google.png", format="PNG")
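
Two details are worth knowing when converting. Pillow detects the source format from the bytes themselves, and some target formats cannot store every image mode: JPEG, for example, has no alpha channel. The following is a minimal sketch under those assumptions. The helper name open_for_jpeg is hypothetical, and a small image built in memory stands in for response.content so the example runs without a network request.

```python
from io import BytesIO

from PIL import Image


def open_for_jpeg(data):
    """Open image bytes and return (image, original_format).

    `open_for_jpeg` is a hypothetical helper name used for this sketch.
    """
    image = Image.open(BytesIO(data))
    # Pillow detects the source format from the bytes, e.g. "PNG" or "JPEG".
    # Read it now: an image returned by convert() has format set to None.
    original_format = image.format
    # JPEG has no alpha channel and no palette mode, so convert such images
    # to RGB before saving. The transparency information is discarded.
    if image.mode in ("RGBA", "LA", "P"):
        image = image.convert("RGB")
    return image, original_format


# Stand-in for response.content: a small semi-transparent PNG built in memory.
source = BytesIO()
Image.new("RGBA", (4, 4), (255, 0, 0, 128)).save(source, format="PNG")

image, original_format = open_for_jpeg(source.getvalue())
print(original_format, image.mode)

# The converted image can now be written as JPEG.
target = BytesIO()
image.save(target, format="JPEG")
```

When the target format is PNG, as in the snippet above, this extra step is not needed because PNG can store transparency.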

Save images in the Cloud

Now let’s save the downloaded images in the Cloud so they can be used by our applications. In this post, we will save the images in Google Cloud Storage but the logic and procedures should be applicable to other Cloud providers.

You can authenticate your storage client with a service account JSON file. Alternatively, you can authenticate all your Google client libraries using a Google user account. We will adopt the second one here because it’s simpler.

First, install the google-cloud-storage library in your virtual environment. Then authenticate your Google libraries. Basically, you only need to run these commands:

$ gcloud auth login
$ gcloud auth application-default login

When everything is set up, we can then upload the images to GCP Storage. You need to create a bucket first, which should have a globally unique name. You can do it in the GCP console, or use gsutil or the storage library in Python.
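
If you prefer to create the bucket from Python, a minimal sketch could look like the following. It relies on the lookup_bucket and create_bucket methods of the storage client. The helper name ensure_bucket and the "EU" default location are assumptions made for illustration, and a name that is already taken by another project will still fail.

```python
def ensure_bucket(client, bucket_name, location="EU"):
    """Return the named bucket, creating it when it does not exist yet.

    `client` is expected to be a google.cloud.storage.Client. The helper name
    and the "EU" default location are assumptions made for this sketch.
    """
    # lookup_bucket returns None instead of raising when the bucket is missing.
    bucket = client.lookup_bucket(bucket_name)
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)
    return bucket


# Usage, once `gcloud auth application-default login` has been run:
#   from google.cloud import storage
#   bucket = ensure_bucket(storage.Client(), "superdataminer")
```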

In GCP Storage, a file object is called a blob. A blob is created from the bucket object, carries the metadata for the file, and is used to upload the file to GCP Storage.

Please check the code snippet below for how to upload files to GCP Storage. It is rather self-explanatory. However, note that the blob name can contain a path such as images/google.png. The console then shows the corresponding folders and subfolders, so there is no need to create them beforehand.

import requests
from io import BytesIO
from google.cloud import storage
from PIL import Image

# Free image from Unsplash: https://unsplash.com/photos/jz4ca36oJ_M
# The URL has no image name and extension.
url = "https://images.unsplash.com/photo-1573804633927-bfcbcd909acd?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2127&q=80"
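response = requests.get(url, timeout=10)
# Fail early on HTTP errors instead of passing an error page to Pillow.
response.raise_for_status()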

# First use BytesIO to create a file-like object from the request content.
image = Image.open(BytesIO(response.content))
# The image type is converted when it's saved.
image.save("google.png", format="PNG")

# The client is authenticated with `gcloud auth application-default login`.
# It can also be authenticated with a service account.
storage_client = storage.Client()
# The bucket needs to be created beforehand.
bucket = storage_client.bucket("superdataminer")

# A blob is a file object in GCP Storage, which contains metadata for the file.
blob = bucket.blob("google.png")
# We can specify a path in the blob name. The folders/subfolders will be
# shown automatically in the bucket.
blob_with_path = bucket.blob("images/google.png")

# Upload from filename, which can be a relative or absolute file path.
blob.upload_from_filename("google.png")
blob_with_path.upload_from_filename("google.png")

This code can be further improved: saving the images locally and then uploading them to GCP Storage is inefficient. The improvement will be introduced in the next section, which gives you a final version of the code that you can use directly in your work.

After running the above code, you can check the folders and images of the bucket in the GCP console.
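
The same check can be done from Python by listing the blobs under a prefix. This is a minimal sketch that uses the list_blobs method of the storage client; the helper name list_blob_names is hypothetical.

```python
def list_blob_names(client, bucket_name, prefix=None):
    """Return the names of the blobs in a bucket that start with `prefix`.

    `client` is expected to be a google.cloud.storage.Client.
    `list_blob_names` is a hypothetical helper name used for this sketch.
    """
    # list_blobs yields Blob objects. A prefix such as "images/" restricts
    # the listing to what the console displays as the images folder.
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]


# Usage, once `gcloud auth application-default login` has been run:
#   from google.cloud import storage
#   print(list_blob_names(storage.Client(), "superdataminer", prefix="images/"))
```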


Save an image in the Cloud directly: the final version

In the previous example, the images were first saved locally and then uploaded to GCP Storage. This is inefficient because it involves extra IO operations. Besides, we need to find a proper place to save them locally and clean them up afterwards. Luckily, we can upload images to the Cloud directly without saving them locally, which avoids all these nuisances.

This is achieved by the magical BytesIO class again. We can create a file-like object with BytesIO and then save the image to it with Pillow. The file-like object contains the content of the converted image and can be uploaded to GCP Storage with the upload_from_string method of a blob, which accepts binary strings as the input:

import requests
from io import BytesIO
from google.cloud import storage
from PIL import Image

# Free image from Unsplash: https://unsplash.com/photos/jz4ca36oJ_M
# The URL has no image name and extension.
url = "https://images.unsplash.com/photo-1573804633927-bfcbcd909acd?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2127&q=80"
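# Use a timeout so the request cannot hang forever, and fail early on
# HTTP errors instead of passing an error page to Pillow.
response = requests.get(url, timeout=10)
response.raise_for_status()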

# First use BytesIO to create a file-like object from the request content.
image = Image.open(BytesIO(response.content))

# Here BytesIO is used to create a new file-like object to save the content of
# the image with type converted by Pillow.
with BytesIO() as f_png:
    image.save(f_png, format="PNG")
    # The binary content can be uploaded to GCP Storage directly.
    content = f_png.getvalue()

storage_client = storage.Client()
bucket = storage_client.bucket("superdataminer")

blob_with_path = bucket.blob("images/google-direct.png")
# upload_from_string accepts bytes as the input.
# We need to specify the content type, otherwise it will be text by default.
blob_with_path.upload_from_string(content, content_type="image/png")

When you check GCP Storage, you will find that the image has been uploaded successfully this way as well.
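
As an alternative to upload_from_string, a blob can also read from the file-like object itself with upload_from_file, which avoids copying the content out of the buffer first. The following is a minimal sketch of that variant, not the version used above. The helper name upload_as_png is hypothetical, and it takes the Pillow image and the bucket as arguments.

```python
from io import BytesIO


def upload_as_png(image, bucket, blob_name):
    """Save a Pillow image as PNG in memory and upload it from the buffer.

    `upload_as_png` is a hypothetical helper. `image` is expected to be a
    PIL.Image.Image and `bucket` a google.cloud.storage Bucket.
    """
    blob = bucket.blob(blob_name)
    with BytesIO() as buffer:
        image.save(buffer, format="PNG")
        # After saving, the buffer position is no longer at the start.
        # rewind=True seeks back to the beginning before the content is read.
        blob.upload_from_file(buffer, rewind=True, content_type="image/png")
    return blob


# Usage with the objects from the snippet above:
#   upload_as_png(image, bucket, "images/google-direct.png")
```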


In this post, we introduced how to download images from URLs, convert the type, and save them in the Cloud. The code snippets are easy to follow and cover the most common practical use cases. If you are struggling with unstable images in your application, this post can help you out.

