Chapter 3 - Data and Memory

Data and Memory

With what we know so far, storing seven daily temperatures looks like this:

double tempMon = 72.3;
double tempTue = 74.1;
double tempWed = 68.9;
double tempThu = 71.5;
double tempFri = 75.0;
double tempSat = 69.8;
double tempSun = 73.2;

That’s seven variables for seven days, each with a different name, each managed individually! There is no way to loop through them, no way to say “do something to each temperature in order.” If we needed a month of temperatures, we would need 30 variables, and worse, a year would need 365!

To solve this problem, we need a way to store multiple values of the same type under one name, with a way to access each one by position.

Arrays

An array is an ordered, fixed-size collection of elements that share a type. Each element is accessed by its position, called an index.

An element is a single value stored at a specific index in an array.

An index is the position of an element within an array. Indexing starts at 0.

Here is how we create an array to hold seven temperatures:

double[] temperatures = new double[7];

This one line replaces seven variable declarations. The array holds 7 elements, and we access each one by its position.

What Happens in That Line

The statement double[] temperatures = new double[7]; does more work than any single line we have written so far. Let’s break it apart.

On the right side of the =, the expression new double[7] creates an array object in memory. This object has space for 7 doubles, and each one is initialized to 0.0 (the default value for doubles).

On the left side, double[] temperatures declares a variable. But this variable does not contain the 7 doubles. It holds something new: a reference.

A reference is a value that stores the memory location of an object.

A reference variable holds the address, not the data itself.

The = stores the reference to the newly created array object in the variable temperatures.

The full translation: “Create a double array of size 7 and store a reference to it in the variable temperatures.”

Here are the discrete actions:

Allocate memory for an array of 7 doubles
Initialize all 7 elements to 0.0
Create a reference variable named temperatures
Store a reference to the array object in temperatures

This is different from what happens with int x = 5;, where the variable directly holds the value 5. Here, temperatures holds an address, and the actual data lives somewhere else in memory.

  temperatures
  ┌──────────┐
  │ ref ──────────▶ [ 0.0 ][ 0.0 ][ 0.0 ][ 0.0 ][ 0.0 ][ 0.0 ][ 0.0 ]
  └──────────┘       [0]     [1]    [2]    [3]    [4]    [5]    [6]
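
To make the two sides easier to see, the same work can be split into two statements. This is only a sketch for illustration; taken together, the two statements should behave like the original single line.

```csharp
// Left side only: declare a reference variable. No array exists yet.
double[] temperatures;

// Right side, then the =: create the array object (7 doubles, each 0.0)
// and store a reference to it in temperatures.
temperatures = new double[7];
```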

Reference Types vs Value Types

A value type is a type whose variables store the data directly. Integers, booleans, doubles, and structs are value types.

A reference type is a type whose variables store references to objects rather than the objects themselves. Arrays are reference types.

With a value type, the variable is the storage:

  x
  ┌─────┐
  │  5  │
  └─────┘

With a reference type, the variable points to the storage:

  temperatures
  ┌──────────┐
  │ ref ──────────▶ [ 0.0 ][ 0.0 ][ 0.0 ][ ... ]
  └──────────┘

This difference changes how assignment works, and we will see exactly how later in this section.

Accessing Elements: The Shift Model

To reach a specific element, we start at the location the reference points to and shift forward by the index:

temperatures[0] = 72.3;

Translation: “Go to the location of temperatures, shift by 0 doubles, and store the value 72.3 there.”

An index of 0 means no shift at all. We stay at the very start of the array, where the first element lives.

temperatures[3] = 71.5;

Translation: “Go to the location of temperatures, shift by 3 doubles, and store the value 71.5 there.”

Shifting by 3 means we move past three elements to reach the fourth position.

double today = temperatures[3];

Translation: “Go to the location of temperatures, shift by 3 doubles, read the value there, and store it in today.”

The pattern for every array access is the same: go to where the array lives, shift to the position you want, then read or write.
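
Putting the three accesses above into one small program gives a sketch like the following. The Console.WriteLine calls are added here only so the results can be inspected.

```csharp
using System;

double[] temperatures = new double[7];

temperatures[0] = 72.3;          // shift by 0 doubles, store 72.3
temperatures[3] = 71.5;          // shift by 3 doubles, store 71.5
double today = temperatures[3];  // shift by 3 doubles, read the value there

Console.WriteLine(today);           // the value stored at index 3
Console.WriteLine(temperatures[1]); // never assigned, so still the default
```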

Two Operations, One Notation

In Chapter 1, we saw that a variable name means different things depending on which side of = it appears. On the left, it is a write target. On the right, it is evaluated to produce its current value.

The same is true for array access. Consider two lines using our temperatures array:

temperatures[3] = 10;
double x = temperatures[3];

Line 1: temperatures[3] is on the left. This means we write. “Go to the location of temperatures, shift by 3 doubles, and store 10 there.”

Line 2: temperatures[3] is on the right. This is a read. “Go to the location of temperatures, shift by 3 doubles, and read the value there.”

The notation and the shift are identical in both cases. The difference is what happens when we arrive: we either store a value or retrieve one. The translations distinguish these with “store” for writes and “read” for reads.
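
Both operations can appear in a single statement. The sketch below is an illustration we are adding, not one of the chapter's own examples: the same expression is read on the right and written on the left.

```csharp
double[] temperatures = new double[7];
temperatures[3] = 10;

// Right side first: shift by 3 doubles and read the value (10), then add 1.5.
// Left side second: shift by 3 doubles and store the result.
temperatures[3] = temperatures[3] + 1.5;

// temperatures[3] is now 11.5
```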

Try it yourself.

Translate each line to English:

int[] scores = new int[10];
scores[0] = 85;

Write your answers before revealing ours.

Reveal answer

Line 1: “Create an integer array of size 10 and store a reference to it in scores.”

Line 2: “Go to the location of scores, shift by 0 integers, and store the value 85 there.”

If your answers differed, note what you missed before continuing.

Why Indices Start at 0

The index is the offset from the start of the array. The first element sits right at the start, 0 positions away. The second element is 1 position away. The third is 2 positions away.

This is why we say “shift.” The index tells us how far to move from the beginning. At the hardware level, the computer uses the index to calculate an address directly. If each integer takes 4 bytes and the array starts at memory address 1000:

Address:  1000    1004    1008    1012    1016
Index:    [0]     [1]     [2]     [3]     [4]

The address of any element is start + (index × element_size). For index 3, that is 1000 + (3 × 4) = 1012. One multiplication, one addition, and the computer jumps straight to the right location. This is what “shift by 3” means at the hardware level.
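
The same arithmetic can be written out as code. This is only a sketch of the calculation: the variable names are ours, and the numbers are the illustrative ones from the text, not real memory addresses.

```csharp
using System;

// Illustrative numbers from the text, not real addresses.
int start = 1000;      // where the array begins
int elementSize = 4;   // bytes per integer
int index = 3;         // the element we want

int address = start + (index * elementSize);
Console.WriteLine(address); // 1012
```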

Fixed Size

Arrays cannot grow or shrink after creation. When we write new int[5], the computer allocates a single contiguous block of memory with room for exactly 5 integers, all sitting next to each other with no gaps.

┌─────┬─────┬─────┬─────┬─────┐
│ [0] │ [1] │ [2] │ [3] │ [4] │
└─────┴─────┴─────┴─────┴─────┘
  one continuous block of memory

This contiguous layout is what makes the shift calculation work. If elements could be scattered across memory, we could not compute an element’s location from its index. We choose the size at creation, and it cannot change. If we need more space later, we must create a new, larger array and copy the data over. (We will build exactly this operation when we cover functions.)
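
Until then, here is a minimal sketch of the idea done by hand for a tiny array. The names small and bigger are hypothetical, chosen only for this illustration.

```csharp
int[] small = new int[2];
small[0] = 5;
small[1] = 10;

// small cannot grow, so create a second, larger array...
int[] bigger = new int[4];

// ...and copy each element across by index.
bigger[0] = small[0];
bigger[1] = small[1];

// bigger now holds 5, 10, 0, 0. small is unchanged and still has 2 elements.
```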

Default Values

We created a 7-element temperatures array, but we have only stored values in a few positions so far. What is in the rest?

When an array is created, every element holds the default value for its type:

Type	Default Value
`int`	0
`double`	0.0
`bool`	false
`string`	null

int[] numbers = new int[3];
// numbers[0] is 0, numbers[1] is 0, numbers[2] is 0

Every slot holds something from the moment the array is created. Until we store our own values, each element holds its type’s default.
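
One way to check the table is to create an array of each type and compare its first element to the expected default. The array names below are hypothetical.

```csharp
using System;

int[] counts = new int[2];
double[] prices = new double[2];
bool[] flags = new bool[2];
string[] names = new string[2];

Console.WriteLine(counts[0] == 0);     // True
Console.WriteLine(prices[0] == 0.0);   // True
Console.WriteLine(flags[0] == false);  // True
Console.WriteLine(names[0] == null);   // True
```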

The .Length Property

Every array knows its own size:

int[] data = new int[5];
Console.WriteLine(data.Length); // Output: 5

data.Length evaluates to the number of elements in the array that data refers to. Length is set at creation and does not change for the life of the array.
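
As a small sketch, storing values does not change Length, and because indexing starts at 0, the last element sits at index Length - 1. The variable lastIndex is a name we introduce for illustration.

```csharp
using System;

int[] data = new int[5];
data[0] = 42;
data[4] = 7;

Console.WriteLine(data.Length);     // 5: storing values does not change the size
int lastIndex = data.Length - 1;    // 4: the index of the last element
Console.WriteLine(data[lastIndex]); // 7
```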

A First Glimpse: Loops and Arrays

With .Length and indexing, we can solve the problem we started with. Instead of seven separate variables, we have one array, and because elements are accessed by index, a loop can visit every element in order:

double[] temperatures = new double[7];
 
for (int i = 0; i < temperatures.Length; i++)
{
    temperatures[i] = i * 10.0;
}

The loop variable i starts at 0 and counts up to (but not including) temperatures.Length. On each iteration, it becomes an index: “go to the location of temperatures, shift by i doubles, and store the value of i × 10.0 there.”

When i is 0, we shift by 0 and store 0.0. When i is 3, we shift by 3 and store 30.0. The loop generates every valid index, and we use each one to access a different element. This is the payoff of arrays: a loop can do the same work that would otherwise require dozens of individual statements. We will explore traversal thoroughly in the next section.
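
The same loop shape works for reading. As a sketch, assuming we store the seven temperatures from the start of this section, a loop can add them up and compute the average. The variable sum is a name we introduce here.

```csharp
using System;

double[] temperatures = new double[7];
temperatures[0] = 72.3;
temperatures[1] = 74.1;
temperatures[2] = 68.9;
temperatures[3] = 71.5;
temperatures[4] = 75.0;
temperatures[5] = 69.8;
temperatures[6] = 73.2;

double sum = 0.0;
for (int i = 0; i < temperatures.Length; i++)
{
    // Shift by i doubles, read the value there, and add it to sum.
    sum = sum + temperatures[i];
}

Console.WriteLine(sum / temperatures.Length); // the average temperature
```

With seven separate variables, the same calculation would need every name written out by hand.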

Valid Index Range

For an array of size n, valid indices are 0 through n - 1.

int[] data = new int[5];
// Valid:   data[0], data[1], data[2], data[3], data[4]
// Invalid: data[5], data[-1]

The shift model makes this concrete. Shifting by 5 in a 5-element array moves past the end of the allocated block, into memory that does not belong to this array. Shifting by -1 moves before the start, which is equally invalid.

C# checks every array access at run time. Using an invalid index throws an IndexOutOfRangeException, and unless the program handles that exception, it crashes. This is one of the most common errors when working with arrays, and we will practice identifying it throughout this chapter.
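
One way to avoid the error is to test an index before using it. The following is a minimal sketch, not the only approach; the variable index and the printed message are ours.

```csharp
using System;

int[] data = new int[5];
int index = 5;   // one past the last valid index

// Valid indices run from 0 through data.Length - 1.
if (index >= 0 && index < data.Length)
{
    Console.WriteLine(data[index]);
}
else
{
    Console.WriteLine("Index is out of range");
}
```

With index set to 5, the condition is false, so the array is never accessed and the message is printed instead.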

Try it yourself.

Translate this code to English:

int[] counts = new int[4];
counts[2] = 7;
int x = counts[2];

Write your answers, then check them against the reveal.

Reveal answer

Line 1: “Create an integer array of size 4 and store a reference to it in counts.”

Line 2: “Go to the location of counts, shift by 2 integers, and store the value 7 there.”

Line 3: “Create an integer variable called x and bind to it the result of going to the location of counts, shifting by 2 integers, and reading the value stored there.”

If anything differed, note what you missed before continuing.

The Mechanism: Token-by-Token

In Chapter 1, we broke down bool flag = true; one token at a time. Let’s do the same for element access, since that is the new operation arrays introduce.

scores[3] = 42;

scores — follow the reference to locate the array in memory
[3] — shift by 3 elements from the start
= — prepare to write
42 — the value to store
; — statement complete

The bracket notation [3] is the shift operation. It says “starting from where this array lives, move forward 3 positions.”

And for reading, int x = scores[3];:

int x — create an integer variable named x
= — prepare to store a value
scores — follow the reference to locate the array in memory
[3] — shift by 3 elements from the start, read the value there
; — statement complete

Notice the new keyword does not appear in either line. That keyword signals object creation. It appeared when we created the array, because that is when memory was allocated. Accessing an element uses an existing object; it does not create a new one.

How References Change Assignment

Now we reach the concept that makes reference types fundamentally different from value types.

With value types, assignment copies the value:

int x = 5;
int y = x;    // y gets its own copy of 5
x = 10;       // y is still 5

after line	x	y
1	5	—
2	5	5
3	10	5

Changing x does not affect y. Each variable has its own independent storage.

With reference types, assignment copies the reference, not the object:

int[] ar = new int[3];
ar[0] = 10;
int[] other = ar;
other[0] = 99;

Line 3 translation: “Copy the reference stored in ar and store it in other.”

After line 3, both ar and other hold references to the same array object. There is still only one array in memory, and two variables point to it.

  ar                                   
  ┌──────────┐                         
  │ ref ──────────▶ [ 99 ][ 0 ][ 0 ]
  └──────────┘    |▶ [0]   [1]  [2]  
  other           |
  ┌──────────┐    |
  │ ref ──────────|
  └──────────┘

Line 4 translation: “Go to the location of other, shift by 0 integers, and store 99 there.”

Because ar and other point to the same array, ar[0] is now also 99. The modification was made through other, but both references lead to the same place.

after line	ar	ar[0]	other	other[0]	other[1]	other[2]
1	ref→array	0	—	—	—	—
2	ref→array	10	—	—	—	—
3	ref→array	10	ref→same array	10	0	0
4	ref→array	99	ref→same array	99	0	0

Compare this to value types, where int y = x; creates an independent copy. With arrays, int[] other = ar; creates a second name for the same object.
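
The sharing works in both directions. In this sketch, one write goes through each name, and each write is then read back through the other name.

```csharp
using System;

int[] ar = new int[3];
int[] other = ar;     // copy the reference; there is still one array

other[0] = 99;        // write through other
ar[1] = 7;            // write through ar

Console.WriteLine(ar[0]);     // 99
Console.WriteLine(other[1]);  // 7
```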

Try it yourself.

Predict the state after each line:

int[] a = new int[2];
a[0] = 5;
a[1] = 10;
int[] b = a;
b[1] = 20;

What is a[1] after line 5?

Write your answer before revealing ours.

Reveal answer

a[1] is 20.

Line 4 copies the reference, not the array. Both a and b refer to the same array object. When line 5 stores 20 at index 1 through b, the change is visible through a as well.

after line	a[0]	a[1]	b[0]	b[1]	Same array?
1	0	0	—	—	—
2	5	0	—	—	—
3	5	10	—	—	—
4	5	10	5	10	Yes
5	5	20	5	20	Yes

The key is line 4: it copies the reference, not the data. If your prediction differed, trace through each line again with the diagram in mind.

Comparing the Two Models

Let’s put value types and reference types side by side once more, because this distinction matters throughout the rest of this chapter and beyond.

Value types: The variable holds the data.

int x = 5;
int y = x;   // independent copy
y = 99;      // x is still 5

  x           y
  ┌─────┐    ┌─────┐
  │  5  │    │ 99  │
  └─────┘    └─────┘
  (separate storage)

Reference types: The variable holds an address. The data lives elsewhere.

int[] ar = new int[3];
int[] other = ar;    // same address
other[0] = 99;       // ar[0] is also 99

  ar                                    other
  ┌──────────┐                          ┌──────────┐
  │ ref ──────────▶ [ 99 ][ 0 ][ 0 ] ◀──────── ref │
  └──────────┘                          └──────────┘
  (shared storage: other points to the start of the same array as ar, so other[0] is also 99; its arrow enters from the right only because of the limits of text drawing)

With value types, each variable owns its data. With reference types, multiple variables can share the same object. This is not a bug; it is how references work, and it becomes useful when we start writing functions that operate on arrays.

Same values, different objects: Two arrays can hold identical values and still be separate objects.

int[] ar = new int[3];
ar[0] = 5;
int[] other = new int[3];
other[0] = 5;

Both ar[0] and other[0] are 5, but ar and other refer to different objects. The new keyword appeared twice, so two objects were created.

  ar                                                      other
  ┌──────────┐                                            ┌──────────┐
  │ ref ──────────▶ [ 5 ][ 0 ][ 0 ]    [ 5 ][ 0 ][ 0 ] ◀──────── ref │
  └──────────┘                                            └──────────┘
  (separate storage, same values)

Changing other[0] does not affect ar[0], because each reference leads to a different object. If we had written int[] other = ar; instead, no second object would exist.
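
We can ask C# whether two array variables refer to the same object. For arrays, the == operator compares the references, not the element values. The sketch below adds a third variable, alias, which is a hypothetical name used only for this illustration.

```csharp
using System;

int[] ar = new int[3];
ar[0] = 5;
int[] other = new int[3];   // second new, so a second object
other[0] = 5;
int[] alias = ar;           // no new, so no additional object

Console.WriteLine(ar == alias);  // True: same object
Console.WriteLine(ar == other);  // False: equal values, different objects

other[0] = 42;
Console.WriteLine(ar[0]);        // 5: the array ar refers to was not touched
```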

Rebinding References

A program sometimes needs to switch from one dataset to another. Suppose we finish processing one batch of results and want to start fresh with a new array. Reference variables can be rebound, just like value type variables:

int[] ar = new int[3];
ar[0] = 10;
ar = new int[2];

Line 3 translation: “Create an integer array of size 2 and store a reference to it in ar.”

After line 3, ar refers to a different array object. The original 3-element array, with the value 10 at index 0, is no longer reachable through ar.

  ar (before line 3)                    ar (after line 3)
  ┌──────────┐                          ┌──────────┐
  │ ref ──────────▶ [ 10 ][ 0 ][ 0 ]    │ ref ──────────▶ [ 0 ][ 0 ]
  └──────────┘     (no longer           └──────────┘
                    reachable)

The old array still exists in memory, but nothing refers to it. From our program’s perspective, it is gone.
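
If the old array is still needed, one option is to copy its reference into another variable before rebinding. This is a sketch of that idea; the name old is hypothetical.

```csharp
using System;

int[] ar = new int[3];
ar[0] = 10;

int[] old = ar;       // a second reference to the 3-element array
ar = new int[2];      // rebind ar to a new 2-element array

Console.WriteLine(ar.Length);   // 2
Console.WriteLine(old.Length);  // 3
Console.WriteLine(old[0]);      // 10: the original array is still reachable
```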

Try it yourself.

Write C# code for each description:

“Create an integer array of size 4 and store a reference to it in counts.”
“Go to the location of counts, shift by 2 ints, and store the integer 7 there.”
“Create an integer array called backup and bind the reference stored in counts to it.”
“Create an integer called first and bind the result of going to backup, shifting by 0, and returning the value there.”

Write your answers, then check them below.

Reveal answer

int[] counts = new int[4];
counts[2] = 7;
int[] backup = counts;
int first = backup[0];

If your code differed, compare it to the answer and note what was different.

Review

Before continuing, test yourself on what you have learned. Use the protocol from Chapter 0: attempt each exercise from memory, then search the chapter to check your answers, then note what you missed.

Part 1: Definitions

Write the definitions from memory, then find them in the chapter to check.

What is an array?
What is an index?
What is an element?
What is a reference?
What is a reference type?
What is a value type?

If any of your answers differed from the definitions in this section, note what you missed and write the corrected version.

Part 2: Translations

Translate each line of code to English, then check against the translation patterns shown earlier in this section.

double[] prices = new double[6];
prices[0] = 9.99;
double sale = prices[3];
double[] copy = prices;
copy[0] = 0.0;

For line 5, also answer: what is prices[0] after this line runs? Why?

If any translations differed, note what you missed and write the corrected version.

Part 3: Mechanism Breakdown

Break down each line token-by-token, as shown in the mechanism section earlier in this chapter.

flags[2] = true;
bool x = flags[0];

Check your breakdown against the mechanism section.

Part 4: Writing Code

Write C# code for each description.

Create a boolean array of size 3 and store a reference to it in flags.
Go to the location of flags, shift by 1, and store the value true there.
Go to the location of flags, shift by 0, read the value there, and store it in a boolean variable named first.
Copy the reference stored in flags and store it in a variable named alias.
Go to the location of alias, shift by 2, and store the value true there.

After writing the code, draw a memory diagram showing flags and alias after all five lines have run.

Check your code against the examples in this section.

Part 5: State Tables

Complete the state table for this code:

int[] ar = new int[3];
ar[0] = 5;
ar[1] = 10;
ar[2] = 15;
int[] other = ar;
other[1] = 99;
ar[2] = other[0];

after line	ar[0]	ar[1]	ar[2]	other[0]	other[1]	other[2]
1
2
3
4
5
6
7

Trace through each line. For lines 5 through 7, both ar and other refer to the same array object. A change through one name is visible through the other.

Part 6: Diagrams

Draw a memory diagram for this code:

int[] x = new int[2];
x[0] = 3;
x[1] = 7;
int[] y = x;
int[] z = new int[2];
z[0] = 3;
z[1] = 7;

Your diagram should show three variables. Two of them refer to the same array object. One refers to a different array object that happens to hold the same values.

After drawing, answer: if we run y[0] = 100;, which variables see the change? Why?

