- RubyCademy's Newsletter
RubyCademy - Serialize and deserialize objects like a PRO
How do GitLab, Homebrew, and Mongoid serialize and deserialize objects in Ruby?
In this issue, we’re going to dive into object marshaling.
I'll explain what it is, look at the Marshal module, and then go through a concrete example.
We’ll then go a step deeper and compare the _dump and self._load methods.
Buckle up and let's get down to business!
What's Object Marshaling?
When writing code, you may need to save or transfer an object so that it can be reused by another program or in a future run of the same program.
For example, Sidekiq serializes a job (which is essentially an object) into JSON and pushes it into Redis.
A Sidekiq process can then deserialize the JSON and recreate the original job.
In programming, this serialization and deserialization process is known as object marshaling.
Let's explore how Ruby handles object marshaling natively.
The Marshal Module
Ruby provides the Marshal module as part of its core library (no require needed) to serialize and store objects.
With Marshal, you can serialize an object into a byte stream, which can then be stored and deserialized by another Ruby process.
Now, let's serialize a string and examine the resulting serialized object.
We call the Marshal.dump module method to serialize our greeting string, and store the return value (the serialized string) in the serialized variable.
This binary string can be written to a file, which another process can later read to reconstitute the original object.
We then call the Marshal.load method to reconstitute the original object from the byte stream.
We can see that this freshly reconstituted string has a different object_id than the greeting string, which means it's a different object, but it contains the same data.
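Here's a minimal sketch of these steps. The greeting content and the temporary file name are illustrative assumptions. The file round trip only shows that the serialized bytes can outlive the code that produced them.

```ruby
require "tmpdir"

greeting = "Hello, RubyCademy!"

# Serialize the string into a binary byte stream
serialized = Marshal.dump(greeting)

# Store the byte stream in a file (hypothetical path inside a temp directory),
# then read it back and reconstitute the object, as another process could
restored = Dir.mktmpdir do |dir|
  path = File.join(dir, "greeting.dump")
  File.binwrite(path, serialized)
  Marshal.load(File.binread(path))
end

puts restored                                 # Hello, RubyCademy!
puts restored == greeting                     # true: same data
puts restored.object_id == greeting.object_id # false: different object
```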
Pretty cool! But how is the Marshal module able to reconstruct the string? And, what if I want to have control over which attributes to serialize and deserialize?
A Concrete Example of Object Marshaling
To address these questions, let's apply a marshaling strategy to a custom struct called User.
The User struct defines 3 attributes: fullname, age, and roles.
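As a starting point, here's a sketch of the User struct with these three attributes. The sample values are made up. Without any custom hook, Marshal already knows how to serialize a Struct by recording its class name and member values, so a plain round trip works out of the box.

```ruby
# The User struct from this example (sample values are hypothetical)
User = Struct.new(:fullname, :age, :roles)

user = User.new("Jane Doe", 42, [:editor, :admin])

# No custom hook yet: Marshal serializes every member as-is
copy = Marshal.load(Marshal.dump(user))

puts copy == user      # true: Struct#== compares member values
puts copy.equal?(user) # false: it's a brand-new object
p copy.roles           # [:editor, :admin]
```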
For this example, we have a business rule: an attribute is only serialized when it matches the following criteria:
The fullname is only serialized if it contains fewer than 64 characters
The roles array is only serialized if it does not contain the :admin role
To do so, we can define a User#marshal_dump method to implement our custom serialization strategy.
This method will be called when we invoke the Marshal.dump method with an instance of User struct as a parameter.
Let’s define this method:
In the above example, we can see that our User#marshal_dump method is called when we invoke Marshal.dump(user).
The user_dump variable contains the string which is the serialization of our User instance.
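Here's a minimal sketch of what User#marshal_dump could look like under this business rule. Returning a Hash, always keeping age, and the sample values are assumptions. marshal_dump may return any object that Marshal can serialize, and Marshal.dump then serializes that object in place of the user. We only dump here, since Marshal.load would also need marshal_load.

```ruby
User = Struct.new(:fullname, :age, :roles) do
  # Called by Marshal.dump(user); the returned Hash is what actually gets serialized
  def marshal_dump
    puts "User#marshal_dump called"

    data = { age: age } # assumption: age is always serialized
    # Business rule: only keep a fullname shorter than 64 characters
    data[:fullname] = fullname if fullname.length < 64
    # Business rule: never serialize roles that include :admin
    data[:roles] = roles unless roles.include?(:admin)
    data
  end
end

user = User.new("Jane Doe", 42, [:editor, :admin])

user_dump = Marshal.dump(user) # prints "User#marshal_dump called"
p user.marshal_dump            # a Hash with :age and :fullname, but no :roles
```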
Now that we have our dump, let’s deserialize it to reconstitute our user. To do so, we define a User#marshal_load method which is in charge of implementing the deserialization strategy of a User dump.
So let’s define this method:
In the above example, we can see that our User#marshal_load method is called when we invoke Marshal.load(user_dump).
The original_user variable contains a struct which is a reconstitution of our user instance.
Note that original_user.roles differs from user.roles: at serialization time, user.roles included the :admin role.
So user.roles wasn’t serialized into user_dump.
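Putting both hooks together, here's a sketch of the full round trip. It includes a marshal_dump that follows the business rule so the snippet runs on its own. Defaulting missing roles to an empty array in marshal_load is an assumption, not something the rule requires.

```ruby
User = Struct.new(:fullname, :age, :roles) do
  # Serialization strategy: drop attributes that break the business rule
  def marshal_dump
    data = { age: age }
    data[:fullname] = fullname if fullname.length < 64
    data[:roles] = roles unless roles.include?(:admin)
    data
  end

  # Deserialization strategy: Marshal.load allocates a blank User,
  # then calls this method with the Hash returned by marshal_dump
  def marshal_load(data)
    puts "User#marshal_load called"
    self.age = data[:age]
    self.fullname = data[:fullname]
    self.roles = data.fetch(:roles, []) # assumption: default to no roles
  end
end

user = User.new("Jane Doe", 42, [:editor, :admin])
user_dump = Marshal.dump(user)

original_user = Marshal.load(user_dump) # prints "User#marshal_load called"
p original_user.fullname # "Jane Doe"
p original_user.age      # 42
p original_user.roles    # [] (the roles were never serialized)
p user.roles             # [:editor, :admin]
```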
The _dump and self._load Methods
When Marshal.dump is invoked, it calls marshal_dump on the object passed as an argument. When Marshal.load is invoked, it calls marshal_load on a freshly allocated instance of the dumped object's class.
But what if I told you that Marshal.dump and Marshal.load can also rely on two other hooks: the _dump instance method and the self._load class method?
The _dump Method
The differences between the marshal_dump and the _dump methods are:
You need to handle the serialization strategy at a lower level when using the _dump method: it must return a String that represents the data to serialize, whereas marshal_dump can return any object that Marshal knows how to serialize
The marshal_dump method takes precedence over _dump if both are defined
Let’s have a look at the following example:
In the User#_dump method, we have to build and return the serialization ourselves: the String that represents our user.
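Here's a minimal sketch of a User#_dump method. Encoding the attributes as JSON is an arbitrary choice for illustration; any String works. _dump receives a depth-limit argument (unused here) and must return a String, otherwise Marshal.dump raises a TypeError. We only dump here, since loading requires the self._load hook.

```ruby
require "json"

User = Struct.new(:fullname, :age, :roles) do
  # Called by Marshal.dump(user) with the depth limit; must return a String
  def _dump(_level)
    puts "User#_dump called"
    # We build the serialized representation ourselves (JSON here, by choice)
    JSON.generate("fullname" => fullname, "age" => age, "roles" => roles.map(&:to_s))
  end
end

user = User.new("Jane Doe", 42, [:editor])

user_dump = Marshal.dump(user) # prints "User#_dump called"
puts user._dump(-1)            # {"fullname":"Jane Doe","age":42,"roles":["editor"]}
```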
In the following example, we define both User#marshal_dump and User#_dump, each returning a string, to see which method is called.
We can see that only User#marshal_dump is called, even though both methods are defined.
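A quick way to check this precedence: the hypothetical User below defines both hooks and records which one Marshal.dump actually calls.

```ruby
CALLS = [] # records which hook Marshal.dump invokes

User = Struct.new(:fullname, :age, :roles) do
  def marshal_dump
    CALLS << :marshal_dump
    "from marshal_dump"
  end

  def _dump(_level)
    CALLS << :_dump
    "from _dump"
  end
end

Marshal.dump(User.new("Jane Doe", 42, [:editor]))
p CALLS # [:marshal_dump]
```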
The self._load Method
Now, let's look at the marshal_load and _load methods.
The differences between the marshal_load and the _load methods are:
You need to handle the deserialization strategy at a lower level when using the _load method: you are in charge of instantiating the original object.
The marshal_load method takes the deserialized object (the one returned by marshal_dump) as an argument, whereas the self._load method takes the serialized string (the one returned by _dump) as an argument.
The marshal_load method is an instance method, whereas self._load is a class method.
Let’s take a look at the following example:
In the User._load method:
we deserialize the string returned by the User#_dump method
we instantiate a new User by passing the deserialized information
We can see that we are in charge of allocating and instantiating the object used to reconstitute our original_user.
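Here's a sketch of the _dump/self._load pair working together. As before, the JSON encoding and sample values are illustrative assumptions. Note that self._load both parses the string and calls new itself.

```ruby
require "json"

User = Struct.new(:fullname, :age, :roles) do
  # Lower-level serialization: we return the String ourselves
  def _dump(_level)
    JSON.generate("fullname" => fullname, "age" => age, "roles" => roles.map(&:to_s))
  end

  # Class method called by Marshal.load with the String returned by _dump.
  # We deserialize it AND instantiate the User ourselves.
  def self._load(string)
    puts "User._load called"
    data = JSON.parse(string)
    new(data["fullname"], data["age"], data["roles"].map(&:to_sym))
  end
end

user = User.new("Jane Doe", 42, [:editor])
user_dump = Marshal.dump(user)

original_user = Marshal.load(user_dump) # prints "User._load called"
p original_user == user                 # true
p original_user.equal?(user)            # false
```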
When Marshal.load is coupled to marshal_load, it takes care of allocating the reconstituted object itself (without calling initialize).
It then calls marshal_load on this freshly allocated object, passing the deserialized data as the argument.
By contrast, when Marshal.load is coupled to _load, the self._load class method is in charge of:
deserializing the data returned by the _dump method
instantiating the reconstituted original object
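One way to observe who instantiates the object is to count initialize calls. In this sketch, the class names and the INIT_CALLS counter are hypothetical. Marshal.load allocates the marshal_load object without running initialize, whereas our self._load calls new, which does.

```ruby
INIT_CALLS = Hash.new(0) # initialize calls per class

class WithMarshalLoad
  attr_reader :value

  def initialize(value)
    INIT_CALLS[self.class] += 1
    @value = value
  end

  def marshal_dump
    @value
  end

  # Called on an object Marshal.load has already allocated (no initialize)
  def marshal_load(value)
    @value = value
  end
end

class WithLoad
  attr_reader :value

  def initialize(value)
    INIT_CALLS[self.class] += 1
    @value = value
  end

  def _dump(_level)
    @value.to_s
  end

  # We build the object ourselves, so initialize runs
  def self._load(string)
    new(string.to_i)
  end
end

a = Marshal.load(Marshal.dump(WithMarshalLoad.new(1)))
b = Marshal.load(Marshal.dump(WithLoad.new(2)))

p INIT_CALLS[WithMarshalLoad] # 1: only the original WithMarshalLoad.new
p INIT_CALLS[WithLoad]        # 2: the original new plus the one in self._load
```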
Conclusion
Depending on your needs, you can implement a higher-level or lower-level serialization/deserialization strategy by coupling the Marshal module with the appropriate hook methods: marshal_dump/marshal_load or _dump/self._load.
Hope you enjoyed this issue!
Feel free to ask any questions OR to give me your feedback in the comments section
I'll be more than happy to reply to your messages!
Also, if you like this kind of content, you can join RubyCademy today.
Let us guide you through your journey to Ruby mastery: https://www.rubycademy.com
Mehdi 💻