Upload files to AWS S3 using Presigned URLs in Python

Object storage is the ideal solution for handling large amounts of unstructured data that don’t change often (i.e. are static). It’s relatively fast, cheap, and easy to scale.

In object storage systems, files are called objects. Each object is assigned a unique identifier, which is the object name. There’s no hierarchy like the file system in your operating system has; you just need to know the object name to retrieve it.

It scales easily. If you use a cloud provider, you don’t need to worry about space: just upload files and you’re good to go. This is unlike file storage, where you have to add more disks and configure them when you run out of space.

It’s not the fastest storage solution, but it’s not the slowest either; you get good performance. At the moment, block storage is generally the fastest solution, and also the most expensive.

AWS S3 is the de facto standard among object storage systems. Its API became the standard that other services mimic; these services are called S3-compatible, meaning they support most (if not all) of the AWS S3 API. For instance, if you use boto3 to interact with AWS S3 and later switch from AWS S3 to something like MinIO, you only need to update the credentials, endpoint, and bucket name in your code, without any other modifications.

To interact with AWS S3 we can use its Python SDK, boto3.

Let’s initialize a client instance:

import boto3
from botocore.config import Config

s3_params = {
    "aws_access_key_id": "******",
    "aws_secret_access_key": "******",
    "service_name": "s3",
    "config": Config(signature_version="s3v4"),
    "endpoint_url": "https://s3.ap-south-1.amazonaws.com",
}

client = boto3.client(**s3_params)

The bucket we’re going to use is in the region ap-south-1, so the endpoint URL is https://s3.ap-south-1.amazonaws.com. We need to pass either the region or the endpoint URL when initializing the client. If we pass neither, the request goes to us-east-1, which is the default region of S3.

To find your bucket’s endpoint URL, you can check the AWS documentation. You can also skip it and just pass the region name; boto3 will figure out the endpoint for you.

We have two ways to do the upload.

Using boto3

bucket_name = 'mybucket'
file_name = 'mypic.jpg'
object_name = 'test.jpg'

with open(file_name, "rb") as f:
    client.upload_fileobj(f, bucket_name, object_name)

# or

response = client.upload_file(file_name, bucket_name, object_name)

The object name is the name the uploaded file will have in S3.

Using Presigned URL

Presigned URLs provide temporary access to various operations, like downloading and uploading objects. For the purposes of this article, they are a way to let someone upload a file to your bucket without giving them your credentials. It’s a secure way to upload files to your bucket.

This is useful when you want to upload files from a browser or a mobile app. For instance, you can generate a URL on the backend, valid for a limited time, and pass it to the front-end; the user can then upload the file using that URL.

First, we need to generate a presigned URL:

upload_url = client.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": bucket_name,
        "Key": object_name,
    },
    ExpiresIn=3600,
    HttpMethod="PUT",
)

# https://mybucket.s3.amazonaws.com/test.jpg?
# X-Amz-Algorithm=AWS4-HMAC-SHA256&
# X-Amz-Credential=****%2F20240421%2Fap-south-1%2Fs3%2Faws4_request&
# X-Amz-Date=20240421T083502Z&
# X-Amz-Expires=3600&
# X-Amz-SignedHeaders=host&
# X-Amz-Signature=f0a424f7cb373afa58a9cad2c9282c0230195533e8f3adbdbe2f3939a0c3245f

The signature is set as a query parameter in the URL. As you can see, the region appears in the URL, which means you must pass the correct region or endpoint URL of the bucket when initializing the client; otherwise the generated URL will be invalid. Bucket names are unique across all regions, but the region still appears in the URL: the region is part of the signature in version 4 (the default version), as opposed to version 2, which doesn’t use the region.

In Params, we passed the bucket name and the key, which is the name of the object we want to create. You can also set other fields from the following list:

ACL, Body, Bucket, CacheControl, ContentDisposition, ContentEncoding,
ContentLanguage, ContentLength, ContentMD5, ContentType, ChecksumAlgorithm,
ChecksumCRC32, ChecksumCRC32C, ChecksumSHA1, ChecksumSHA256, Expires,
GrantFullControl, GrantRead, GrantReadACP, GrantWriteACP, Key, Metadata,
ServerSideEncryption, StorageClass, WebsiteRedirectLocation,
SSECustomerAlgorithm, SSECustomerKey, SSECustomerKeyMD5,
SSEKMSKeyId, SSEKMSEncryptionContext, BucketKeyEnabled,
RequestPayer, Tagging, ObjectLockMode, ObjectLockRetainUntilDate,
ObjectLockLegalHoldStatus, ExpectedBucketOwner

For instance, you can set the ACL of the file, or set ContentType, which will be returned in the response headers when someone downloads it.

ExpiresIn is the time in seconds that the URL is valid for. After that time the URL becomes invalid and you need to generate a new one.

After that, we need to send the file to the generated URL:

import requests

with open(file_name, 'rb') as f:
    response = requests.put(upload_url, data=f)

An important note: if you set any fields other than Bucket and Key, you must pass them as headers when interacting with the URL.

For instance

upload_url = client.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": bucket_name,
        "Key": object_name,
        "ACL": "private",
        "ContentType": "image/jpeg",
    },
    ExpiresIn=3600,
    HttpMethod="PUT",
)

# https://mybucket.s3.amazonaws.com/test.jpg?
# X-Amz-Algorithm=AWS4-HMAC-SHA256&
# X-Amz-Credential=*****%2F20240421%2Fap-south-1%2Fs3%2Faws4_request&
# X-Amz-Date=20240421T095547Z&X-Amz-Expires=3600&
# X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-acl&
# X-Amz-Signature=1ba532c04126a3d4a8766ee5b0be50120e5e106e5e8c1bb1a59bb648ba4e561c

Take a look at the X-Amz-SignedHeaders field. It contains the headers you need to pass when you upload the file; in this case, the content-type and x-amz-acl headers. If you don’t pass them, you’ll get a permission error.

with open(file_name, 'rb') as f:
    response = requests.put(
        upload_url,
        data=f,
        headers={
            "x-amz-acl": "private",
            "Content-Type": "image/jpeg"
        }
    )

If you’re uploading from a browser, you need to set the CORS policy of the bucket to allow the origin of the request. You can set it in the CORS configuration of the bucket; take a look at the AWS documentation for more information.

To download, we need to generate a presigned URL as well:

download_url = client.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": bucket_name,
        "Key": object_name,
    },
    ExpiresIn=3600,
)

response = requests.get(download_url)

with open("downloaded.jpg", 'wb') as f:
    f.write(response.content)

There’s another way of uploading using presigned URLs, whose advantage is that you can enforce a POST policy. It also returns the fields you need to pass for the upload, so it’s more straightforward:

result = client.generate_presigned_post(
    Bucket=bucket_name,
    Key=object_name,
    ExpiresIn=3600
)

with open(file_name, 'rb') as f:
    response = requests.post(
        result['url'],
        data=result['fields'],
        files={'file': f}
    )