(you are here)

Encyclopædia

Let's solve Encyclopædia.

So far, we've tried to keep the tutorials for Python and C++ programmers the same. Many important concepts are shared between programming languages and we've been lucky so far that the syntax has been similar enough that we could get away with accommodating both languages at the same time.

Unfortunately, at this point the tutorial diverges somewhat. You can select your language in the list below to view the tutorial for your language.

Storing lists of data (Python)

In this problem, we are given a list of word counts on the pages of an encyclopædia. Our program must answer multiple questions of the form: "How many pages are there on page x?"

So far, whenever we've encountered problems involving large sets of data, I've tried to describe solutions that only 'remember' one or two variables at a time. With this problem such an approach is no longer possible.

You may be familiar with lists - these are data structures that let you store large quantities of data. In this walkthrough I'll be using Python lists.

In Python, a list is declared like this:

# Declares an empty list called a
a = []

We can then add elements to the list using append:

a = []
a.append(3)
a.append(1)
a.append(4)
a.append(1)
a.append(5)
a.append(2)
a.append(6)
a.append(5)

We can access the eight values of the list as a[0], a[1], ..., a[6], a[7] (note that the numbering system is zero-based: counting begins at zero, not one. This may seem unintuitive, but turns out to be quite useful when we try manipulating arrays in advanced ways). For example, if you were to print(a[0]), you would get 3.

The elements of a list can be manipulated like normal variables. For example, we could write a correct, albiet strange, solution to Addition like so:

# Read values into a[0] and a[1]
a = []
a.append(x)
a.append(y)

outputFile.write(str(a[0] + a[1]) + "\n")

(Add code to correctly handle file input and/or output.)

As we read input data using a loop, we can store that data

wordcounts = []
# range(nPages) counts 0, 1, ..., nPages-2, nPages-1. It's shorthand for range(0, nPages).
for i in range(nPages):
wordcounts.append(x)
# The integer we read in is now stored in <b>wordcounts[i]</b>

However, the real power of lists lies in our ability to access their contents using mathematical expressions. If the variable foo is set to 5, then a[foo] refers to a[5] and we can manipulate a[foo] and store values inside it like any other variable. If the variable i counts from 0 to N-1 over the course of a loop, then in each iteration a[i] will refer to a different element of the list.

Accessing arbitrary elements of wordcounts is also straightforward, so long as we remember the numbering system we are using. The following code reads in some number x and prints out the xth integer in the list:

# Read an integer into x
# Write the value of wordcounts[x-1]
outputFile.write(str(wordcounts[x-1]) + '\n')

This is quite close to what our current problem asks us to do. The above code would have to be placed inside a loop so that it dealt with the correct number of questions, but otherwise it is already very close to a full solution.

Confidence with using lists to store and manipulate large amounts of data is a very helpful skill, both in most real-world programming scenarios and in competitions like the Australian Informatics Olympiad.

Storing lists of data (C++)

In this problem, we are given a list of word counts on the pages of an encyclopædia. Our program must answer multiple questions of the form: "How many pages are there on page x?"

So far, whenever we've encountered problems involving large sets of data, I've tried to describe solutions that only 'remember' one or two variables at a time. With this problem such an approach is no longer possible.

You may be familiar with arrays - these are data structures that let you store large quantities of data. In this walkthrough I'll be using C-style arrays as an example.

In C/C++, an array is declared much like a normal variable. Say we needed room for 8 int-type variables. Then we would write:

int a[8];

The eight variables we now have access to are a[0], a[1], ..., a[6], a[7]. (Note that the numbering system is zero-based: counting begins at zero, not one. This may seem unintuitive, but turns out to be quite useful when we try manipulating arrays in advanced ways.)

The elements of an array can be manipulated like normal variables. For example, we could write a correct, albiet strange, solution to Addition like so:

// Read values into a[0] and a[1]
fscanf(inputFile, "%d %d", &a[0], &a[1]);
a[2] = a[0] + a[1];
// Print out the value of a[2]
fprintf(outputFile, "%d", a[2]);

(Add code to correctly handle file input and/or output.)

However, the real power of arrays lies in our ability to access their contents using mathematical expressions. If the variable foo is set to 5, then a[foo] refers to a[5] and we can manipulate a[foo] and store values inside it like any other variable. If the variable i counts from 0 to N-1 over the course of a loop, then in each iteration a[i] will refer to a different element of the array.

This is best seen with an example. The following code reads in a list of nPages integers into an array named wordcounts.

int wordcounts[10000];
for (int i = 0; i < nPages; i++) {
// Read an integer into <b>wordcounts[i]</b>
}

At the first iteration of the loop, i is set to zero and so the first integer is stored in wordcounts[0]. The second integer is stored in wordcounts[1], and so on. So if we are looking for 'the xth integer', then it is stored in wordcounts[x-1].

Accessing arbitrary elements of wordcounts is also straightforward, so long as we remember the numbering system we are using. The following code reads in some number x and prints out the xth integer in the list:

int x;
// Read an integer into x
fscanf(inputFile, "%d", &x);
// Write the value of wordcounts[x-1]
fprintf(outputFile, "%d\n", wordcounts[x-1]);

You might be wondering why we declared an array of 10000 integers (int wordcounts[10000];), even though we might not use all 10000 slots. Isn't that rather wasteful? The answer is that for real-world code, it would indeed be wasteful. We should be using a dynamic array. However, for convenience and simplicity in informatics, we often just declare an array with a fixed size that is large enough to encompass the largest possible input we could receive (in this problem, we are told that there will be at most 10000 pages in the encyclopædia).

This is quite close to what our current problem asks us to do. The above code would have to be placed inside a loop so that it dealt with the correct number of questions, but otherwise it is already very close to a full solution.

Confidence with using arrays to store and manipulate large amounts of data is a very helpful skill, both in most real-world programming scenarios and in competitions like the Australian Informatics Olympiad.