author Ben Avison <bavison@riscosopen.org> 2013-01-24 00:27:46 (GMT)
committer Ben Avison <bavison@riscosopen.org> 2013-01-24 15:58:47 (GMT)
commit ce2409fa451567fd8ea146fe84e2658a5b41df9c
tree c0bd3005ba2a99cd33edc9e69203234862bce647
parent 3c73d4e116ac9892270eb7badf7894151b15ae97
Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write (arm-simd)
In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which are usually configured this way. For other CPUs, this should only add a constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs.

The problems were caused by cachelines becoming permanently evicted from the cache, because the code that was intended to pull them back in again on each iteration assumed too long a cache line (for the L1 test) or failed to read memory beyond the first pixel row (for the L2 test). Also, the reloading of the source buffer was unnecessary.

These issues were identified by Siarhei in this post: http://lists.freedesktop.org/archives/pixman/2013-January/002543.html
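The L1 problem can be checked with a little arithmetic. The old loop advanced 16 elements per step; assuming the buffers hold 4-byte pixels (an assumption here, consistent with BUFSIZE using 4 bytes per pixel), that is a 64-byte stride, which reads only every other line of a cache with 32-byte lines. The sketch below is not part of the patch: `lines_touched` and `lines_spanned` are hypothetical helpers, and the buffer is assumed to start on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of distinct cache lines read when one
 * element is touched every stride_elems elements across n_elems
 * elements.  Assumes the buffer starts on a cache line boundary. */
size_t lines_touched (size_t n_elems, size_t elem_size,
                      size_t stride_elems, size_t line_bytes)
{
    size_t touched = 0;
    size_t last_line = (size_t) -1;   /* sentinel: no line seen yet */
    size_t i;

    assert (elem_size > 0 && stride_elems > 0 && line_bytes > 0);
    for (i = 0; i < n_elems; i += stride_elems)
    {
        size_t line = (i * elem_size) / line_bytes;
        if (line != last_line)
        {
            touched++;
            last_line = line;
        }
    }
    return touched;
}

/* Hypothetical helper: number of cache lines the buffer occupies. */
size_t lines_spanned (size_t n_elems, size_t elem_size, size_t line_bytes)
{
    assert (line_bytes > 0);
    return (n_elems * elem_size + line_bytes - 1) / line_bytes;
}
```

For 128 four-byte pixels and 32-byte lines, the buffer spans 16 lines; a 16-element stride touches 8 of them, while an 8-element stride touches all 16.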
-rw-r--r-- test/lowlevel-blt-bench.c 31
1 files changed, 25 insertions, 6 deletions
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 7336fa0..8e80b42 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
@@ -33,6 +33,14 @@
#define L1CACHE_SIZE (8 * 1024)
#define L2CACHE_SIZE (128 * 1024)
+/* This is applied to both L1 and L2 tests - alternatively, you could
+ * parameterise bench_L or split it into two functions. It could be
+ * read at runtime on some architectures, but it only really matters
+ * that it's a number that's an integer divisor of both cacheline
+ * lengths, and further, it only really matters for caches that don't
+ * do allocate-on-write. */
+#define CACHELINE_LENGTH (32) /* bytes */
+
#define WIDTH 1920
#define HEIGHT 1080
#define BUFSIZE (WIDTH * HEIGHT * 4)
@@ -168,18 +176,29 @@ bench_L (pixman_op_t op,
int width,
int lines_count)
{
- int64_t i, j;
+ int64_t i, j, k;
int x = 0;
int q = 0;
volatile int qx;
for (i = 0; i < n; i++)
{
- /* touch destination buffer to fetch it into L1 cache */
- for (j = 0; j < width + 64; j += 16) {
- q += dst[j];
- q += src[j];
- }
+ /* For caches without allocate-on-write, we need to force the
+ * destination buffer back into the cache on each iteration,
+ * otherwise if they are evicted during the test, they remain
+ * uncached. This doesn't matter for tests which read the
+ * destination buffer, or for caches that do allocate-on-write,
+ * but in those cases this loop just adds constant time, which
+ * should be successfully cancelled out.
+ */
+ for (j = 0; j < lines_count; j++)
+ {
+ for (k = 0; k < width + 62; k += CACHELINE_LENGTH / sizeof *dst)
+ {
+ q += dst[j * WIDTH + k];
+ }
+ q += dst[j * WIDTH + width + 62];
+ }
if (++x >= 64)
x = 0;
call_func (func, op, src_img, mask_img, dst_img, x, 0, x, 0, 63 - x, 0, width, lines_count);
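The new touch loop can also be read in isolation. The following is a minimal standalone sketch of the same pattern, not the code from the patch: `touch_rows` and `SKETCH_CACHELINE_LENGTH` are hypothetical names, the buffer is assumed to hold `uint32_t` pixels, and `span` stands for the number of pixels per row that the benchmark may write. In bench_L the destination x offset is `63 - x` with `x` between 0 and 63, so the last pixel written in a row is at index `width + 62`, which corresponds to a `span` of `width + 63` here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line length in bytes; as in the patch, it only needs to be
 * an integer divisor of the real cache line lengths. */
#define SKETCH_CACHELINE_LENGTH 32

/* Hypothetical helper: read one element per assumed cache line, plus
 * the final element, in each of `rows` rows that are `row_stride`
 * elements apart.  The sum is returned so the caller can consume it. */
uint32_t touch_rows (const uint32_t *buf, size_t row_stride,
                     size_t span, size_t rows)
{
    const size_t step = SKETCH_CACHELINE_LENGTH / sizeof *buf;
    uint32_t sum = 0;
    size_t j, k;

    assert (buf != NULL && span > 0 && span <= row_stride);
    for (j = 0; j < rows; j++)
    {
        const uint32_t *row = buf + j * row_stride;

        for (k = 0; k < span - 1; k += step)
            sum += row[k];
        /* The last element may sit in a line the strided loop missed. */
        sum += row[span - 1];
    }
    return sum;
}
```

The caller should store the returned sum somewhere the compiler cannot discard, for example a volatile variable as bench_L does with `qx`; otherwise the loads may be optimised away.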