For some weird reason I visited HackerRank and tried the Malaysian Computing Olympiad Qualifying Round 3 again, and Question 4 captured my attention:

*Given a tree, define the score of a vertex v as the sum of pairwise products of weights over all pairs of vertices in the subtree rooted at v. Find the score of each vertex.*

Basically, my way of solving this goes as follows:

- For each edge, identify which endpoint is the parent and which is the child (vertex 1 is the root of the tree).
- For each vertex v, compute by depth-first search the sum of the weights of the vertices in the subtree rooted at v, and the sum of the *squares* of those weights. This suffices because 2·(sum of xy over all pairs) = (sum of x)^2 − sum of x^2.

Now it’s the implementation that is worth a mention. The fact that the number of children of each vertex is unknown beforehand suggests a flexible container, so I chose a 2D vector (a vector of vectors). I tried hard to make the running time proportional to the number of vertices by making sure that each weight computation takes constant time. As I submitted…

“Status: terminated due to timeout for subtask 3”

*Incredulous.*

Mindblown. Confused. I went back to investigate which part took the most time to run, and noticed the following: the running time dropped significantly once I removed the part that updates the adjacency vector via the `push_back` function (of course my program doesn’t function properly in this case, so a “wrong answer” flag is raised, but at least it shed some light on the running time).

I tried the problem again after changing my code from an adjacency vector to an adjacency linked list. Now it said:

“Status: Accepted. Time: 1.8s” (time limit is 2s)

*Seriously?* I guess the problem stems from the fact that the `push_back` function has amortized (not worst-case) constant runtime (I think), which implies a (probably) high constant factor. Alex once explained (in a lunch-together session) that the STL vector implementation dynamically allocates a chunk of memory, then reallocates a bigger chunk as necessary. Reallocation, though it happens only occasionally, takes time proportional to the number of elements (hence increasing the amortized constant factor).

Is linked list better? Yes, in this case. But I can’t tell much about the general case.

*Fine. I have been trolled.*

Here’s my pseudocode in any case:

```
Function dfs(integer i, vector weight, vector sum, vector square, vector desc):
    sum[i] <- weight[i]
    square[i] <- (weight[i])^2
    For each v in desc[i] do:
        dfs(v, weight, sum, square, desc)
        sum[i] <- sum[i] + sum[v]
        square[i] <- square[i] + square[v]

main:
    Given: edgelist, weight

    For each edge <u,v> do:
        prepend v to adj[u]
        prepend u to adj[v]
    (adj[i] is an adjacency list containing the neighbours of i)

    queue q; push 1 into q; mark 1 as visited
    While q is not empty do:
        pop the front element w
        For each x in adj[w] do:
            if x has not been visited:
                mark x as visited
                prepend x to desc[w]
                push x into q
    (desc[i] is the list containing the children of i)

    Perform dfs(1, weight, sum, square, desc)

    For each vertex v do:
        print ((sum[v])^2 - square[v]) / 2
```