Wednesday, 15 August 2012

Deduplication


Deduplication Overview -- NetApp Storage Efficiency:



Deduplication is a technique that reduces space consumption by discarding duplicate blocks.

When SIS (Single Instance Storage) runs on a volume, it scans every block, assigns each one a digital signature, and then compares all of the signatures. Blocks with the same signature are assumed to contain the same data, so only one copy is kept and the remaining blocks are discarded.

The inodes are then adjusted so that all of the logical blocks point to the single retained block.
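Conceptually, the process looks something like the sketch below. This is purely illustrative (the real implementation keeps its fingerprints in metafiles on the volume rather than in an in-memory table, as the sis output later in this post hints), but it shows the core idea of keeping one physical copy per signature:

import hashlib

BLOCK_SIZE = 4096  # the natural WAFL block size

def deduplicate(path):
    physical = {}   # signature -> index of the one physical copy we keep
    block_map = []  # logical block number -> physical block number (the "inode pointers")
    store = []      # the unique blocks actually stored on disk
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sig = hashlib.sha256(block).digest()  # the digital signature
            if sig not in physical:               # first occurrence: keep it
                physical[sig] = len(store)
                store.append(block)
            block_map.append(physical[sig])       # duplicates point at the same copy
    return block_map, store

Note that every logical block still has its own pointer in block_map; only the physical copies are collapsed. That distinction matters later when we look at what happens in cache.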


Deduplication – The NetApp Approach


The goal of the testing here is to compare storage performance of a data set before and after deduplication. Sometimes capacity is the only factor, but sometimes performance matters. The test is random 4KB reads against a 100GB file. The 100GB file represents significantly more data than the test system can fit into its 16GB read cache. I am using 4KB because that is the natural block size for NetApp.
To maximize the observability of the results in this deduplication test, the 100GB file is completely full of duplicate data. For those who are interested, the data was created by doing a dd from /dev/zero. It does not get any more redundant than that. I am not suggesting this is representative of a real world deduplication scenario. It is simply the easiest way to observe the effect deduplication has on other aspects of the system.
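For reference, the read workload can be approximated with a short script like this one. The mount point and file name are assumptions for illustration, not details from the original test; in the real test the client's cache was disabled, so every read went to the filer:

import os
import random

BLOCK_SIZE = 4096                        # match WAFL's 4KB block size
PATH = "/mnt/test_vol/testfile"          # hypothetical NFS mount of the 100GB file

def random_reads(path, count=1_000_000):
    # Issue random, aligned 4KB reads across the whole file.
    blocks = os.path.getsize(path) // BLOCK_SIZE
    with open(path, "rb") as f:
        for _ in range(count):
            f.seek(random.randrange(blocks) * BLOCK_SIZE)
            f.read(BLOCK_SIZE)

random_reads(PATH)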
This is the output from sysstat -x during the first test. The data is being transferred over NFS and the client system has caching disabled, so all reads are going to the storage device. (The command output below is truncated to the right, but the important data is all there.)

Random 4KB reads from a 100GB file – pre-deduplication:
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 19%  6572     0     0    6579  1423 27901  23104     11     0     0     7   16%   0%  - 100%      0     7     0     0     0     0
 19%  6542     0     0    6549  1367 27812  23265    726     0     0     7   17%   5%  T 100%      0     7     0     0     0     0
 19%  6550     0     0    6559  1305 27839  23146     11     0     0     7   15%   0%  - 100%      0     9     0     0     0     0
 19%  6569     0     0    6576  1362 27856  23247    442     0     0     7   16%   4%  T 100%      0     7     0     0     0     0
 19%  6484     0     0    6491  1357 27527  22870      6     0     0     7   16%   0%  - 100%      0     7     0     0     0     0
 19%  6500     0     0    6509  1300 27635  23102    442     0     0     7   17%   9%  T 100%      0     9     0     0     0     0

The system is delivering an average of 6536 NFS operations per second. The cache hit rate hovers around 16-17%. As you can see, the working set does not fit in primary cache. This makes sense: the 3170 has 16GB of primary cache and we are randomly reading from a 100GB file. Ideally, we would expect a 16% cache hit rate (16GB cache / 100GB working set), and we are very close. The disks are running at 100% utilization and are clearly the bottleneck in this scenario; the spindles are delivering as many operations as they are capable of. So what happens if we deduplicate this data?
First, we need to activate deduplication (a_sis in NetApp vocabulary) on the test volume and deduplicate the test data. (Before deduplication became the official buzzword, NetApp referred to their technology as Advanced Single Instance Storage.)
fas3170-a> sis on /vol/test_vol
SIS for "/vol/test_vol" is enabled.
Already existing data could be processed by running "sis start -s /vol/test_vol".
fas3170-a> sis start -s /vol/test_vol
The file system will be scanned to process existing data in /vol/test_vol.
This operation may initialize related existing metafiles.
Are you sure you want to proceed (y/n)? y
The SIS operation for "/vol/test_vol" is started.
fas3170-a> sis status
Path                           State      Status     Progress
/vol/test_vol                  Enabled    Initializing Initializing for 00:00:04
fas3170-a> df -s
Filesystem                used      saved       %saved
/vol/test_vol/         2277560  279778352          99%
fas3170-a>

There are a few other files on the test volume that contain random data, but the space used on the physical volume has been reduced by over 99%. This means our 100GB file now consumes less than 1GB on disk. So, let's do some reads from the same file and see what has changed.
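As a side note, the %saved column appears to be the saved space expressed as a fraction of what the volume would consume without deduplication; plugging in the df -s numbers above reproduces the reported figure:

used_kb, saved_kb = 2277560, 279778352           # from df -s above
pct_saved = saved_kb / (used_kb + saved_kb) * 100
print(f"{pct_saved:.1f}% saved")                 # ~99.2%, shown as 99%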
Random 4KB reads from a 100GB file – post-deduplication:
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 93% 96766     0     0   96773 17674 409570    466     11     0     0    35s  53%   0% -    6%      0     7     0     0     0     0
 93% 97949     0     0   97958 17821 413990    578    764     0     0    35s  53%   8% T    7%      0     9     0     0     0     0
 93% 99199     0     0   99206 18071 419544    280      6     0     0    34s  53%   0% -    4%      0     7     0     0     0     0
 93% 98587     0     0   98594 17941 416948    565    445     0     0    36s  53%   6% T    6%      0     7     0     0     0     0
 93% 98063     0     0   98072 17924 414712    398     11     0     0    35s  53%   0% -    5%      0     9     0     0     0     0
 93% 96568     0     0   96575 17590 408539    755    502     0     0    35s  53%   8% T    7%      0     7     0     0     0     0

There has been a noticeable increase in NFS operations. The system has gone from delivering 6536 NFS ops to delivering 96,850 NFS ops. That is nearly a fifteen-fold increase in delivered operations. The CPU utilization has gone up roughly 4.9x. The disk reads have dropped to almost 0 and the system is serving out over 400MB/s. This is a clear indication that the operations are being serviced from cache instead of from disk. It is also worth noting that the average latency, as measured from the host, has dropped by over 80%. The improvement in latency is not surprising given that the requests are no longer being serviced from disk.
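The arithmetic is easy to check against the sysstat output:

pre_ops, post_ops = 6536, 96850      # average NFS ops/s, before and after
print(post_ops / pre_ops)            # ~14.8, nearly fifteen-fold
print(93 / 19)                       # ~4.9x the CPU utilization
print(post_ops * 4 / 1024)           # ~378 MB/s of 4KB payload, consistent
                                     # with the ~400MB/s of net out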
The cache age has dropped down to 35 seconds. Cache age is the average age of the blocks that are being evicted from cache to make space for new blocks. The test had been running for over an hour when this data was captured, so this is not due to the load ramping. This suggests that even though we are accessing a small number of disk blocks, the system is evicting blocks from cache. I suspect this is because the system is not truly deduplicating cache. Instead, it appears that each logical file block is taking up space in cache even though they refer to the same physical disk block. One potential explanation for this is that NetApp is eliminating the disk read by reading the duplicate block from cache instead of disk. I am not sure how to validate this through the available system stats, but I believe it explains the behavior. It explains why the NFS ops have gone up, the disk ops have gone down, and the cache age has gone down to 35 seconds. While it would be preferable to store only a single copy of the logical block in cache, this is better than reading all of the blocks from disk.
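A back-of-the-envelope calculation supports this theory. If each logical block read occupies its own 4KB of cache, the 16GB cache should turn over on roughly the timescale sysstat is reporting:

cache_kb = 16 * 1024 * 1024          # 16GB of primary cache, in KB
ops, block_kb = 96850, 4             # observed NFS ops/s, 4KB per read
print(cache_kb / (ops * block_kb))   # ~43 seconds to fill the cache once,
                                     # the same order as the observed 35s age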
The cache hit percentage is a bit of a puzzle here. It is stable at 53% and I am not sure how to explain that. The system is delivering more than 53% of the read operations from cache. The very small number of disk reads shows that. Maybe someone from NetApp will chime in and give us some details on how that number is derived.
This testing was done on Data ONTAP 7.3.1 (or more specifically 7.3.1.1L1P1). I tried to replicate the results on versions of Data ONTAP prior to 7.3.1 without success. In older versions, the performance of the deduplicated volume is very similar to the original volume. It appears that reads for logically different blocks that point to the same physical block go to disk prior to 7.3.1.


