What do you know about Warangal?

Hastily written, please excuse typos.

First of all, I would like to thank everyone who took this poll on Facebook. You guys are fantastic people for taking the time to do such a frivolous poll, and I hope it was fun.

Polling has been a topic of interest for a while, and that interest has only intensified given how wrong polls have been throughout 2016.

Recently, Suhas Mathur flagged this excellent paper to me on using metaknowledge to improve polls. It is totally worth reading, and if this is your cup of tea you should go read it (rather than the rest of this post).

Here is an example from the paper itself:

As an elementary example, consider the (false) proposition that Chicago is the capital of the state of Illinois. Respondents might form different opinions about the truth of the proposition, depending on whether they knew: (a) that Chicago is a large city, (b) that it is located in the state of Illinois, (c) that Springfield is the actual capital of Illinois, and so on. If the typical person is aware of (a) and (b) but not of (c), then the majority of those queried might vote for the incorrect answer, that the proposition is True. A democratic poll ignores the asymmetry in metaknowledge between respondents who know the right answer and those who do not. Those who know that Chicago is not the capital of Illinois can imagine that many others will be misled. A comparable insight into the opinions of others is not available to those who falsely believe the answer is Yes (22). Our scoring method in effect reweights the votes so as to reflect different levels of metaknowledge associated with each possible answer. If the method works as claimed, the true answer should emerge as the winner, regardless of how many respondents endorse it.

The most immediate comparable I could think of in the Indian context was the recent creation of Telangana out of Andhra Pradesh. I picked Warangal as a test case: I expected my peer group to be pretty evenly split in guessing which state Warangal was now part of, and I was not disappointed. For good measure I added 9 other cities of what I thought would be varying degrees of difficulty; on average, participants answered each of these correctly 60% of the time or more.

Warangal, on the other hand, was correctly identified as belonging to Telangana by only 21 of my first 41 participants. Could Prelec & Seung's method do better? Was it just guesswork, or did the metaknowledge of those who answered correctly have some value?

Prelec & Seung suggest computing a Bayesian Truth Serum (BTS) score for each respondent in order to identify expert subsets within each cohort. Higher BTS scores should indicate greater expertise, and this should in turn lead to more accurate results.
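For readers who want to try the scoring themselves, here is a minimal Python sketch of the BTS score in Prelec's original formulation. It assumes each participant gave one answer plus a prediction of what share of others would pick each option; the function name, the `alpha` weight on the prediction score and the `eps` clipping are illustrative choices, not necessarily what was used for this poll.

```python
import math
from typing import List, Sequence


def bts_scores(
    answers: Sequence[int],
    predictions: Sequence[Sequence[float]],
    alpha: float = 1.0,
    eps: float = 1e-6,
) -> List[float]:
    """Bayesian Truth Serum score for each respondent.

    answers[r]     -- index of the option respondent r endorsed
    predictions[r] -- r's predicted share of respondents choosing each option
    alpha          -- weight on the prediction score (illustrative default)
    eps            -- floor for predictions so log() never sees zero
    """
    n = len(answers)
    k = len(predictions[0])
    # x_bar[j]: actual share of respondents who endorsed option j.
    x_bar = [sum(1 for a in answers if a == j) / n for j in range(k)]
    # log_y_bar[j]: log of the geometric mean of everyone's predictions for j.
    log_y_bar = [
        sum(math.log(max(p[j], eps)) for p in predictions) / n for j in range(k)
    ]

    scores = []
    for a, p in zip(answers, predictions):
        # Information score: rewards answers more common than collectively predicted.
        info = math.log(x_bar[a]) - log_y_bar[a]
        # Prediction score: penalises predictions far from the actual shares.
        pred = sum(
            x_bar[j] * (math.log(max(p[j], eps)) - math.log(x_bar[j]))
            for j in range(k)
            if x_bar[j] > 0
        )
        scores.append(info + alpha * pred)
    return scores
```

With `alpha=1.0` the scores sum to zero across respondents, so a positive score means better than average. On made-up toy data (0 = Andhra Pradesh, 1 = Telangana), `bts_scores([1, 1, 0, 0, 0], [[0.6, 0.4]] * 2 + [[0.8, 0.2]] * 3)` gives roughly 0.42 for each Telangana answerer, who expected many others to be misled, and roughly -0.28 for each Andhra Pradesh answerer, who expected agreement.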

I find that the chart below provides significant validation of the concept.

The orange line tracks the cumulative average performance of participants in the order they responded (effectively a random ordering). For instance, my first 5 participants (perhaps expert quizzers who got cracking without much prodding) all got the answer correct, but those who followed had mixed performance, taking the average down to just above 50% towards the end.

Sorting participants by BTS score lets us capture 13 correct answers in a row at the top of the ranking, followed by a fairly monotonic convergence towards the overall average, which is commendable.

[Chart: Warangal, cumulative accuracy in response order vs sorted by BTS score]
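The two lines in the chart can be reproduced with a short sketch like the one below, assuming you have each participant's correctness flag and BTS score listed in response order (the function names are hypothetical). Re-ordering never changes who was right, so both curves must end at the same overall average; BTS sorting only changes how early the correct answers show up.

```python
from itertools import accumulate
from typing import List, Sequence


def running_accuracy(correct: Sequence[bool]) -> List[float]:
    """Cumulative share of correct answers after each participant."""
    hits = accumulate(int(c) for c in correct)
    return [h / (i + 1) for i, h in enumerate(hits)]


def bts_sorted_accuracy(correct: Sequence[bool], scores: Sequence[float]) -> List[float]:
    """Running accuracy after ranking participants by BTS score, highest first."""
    order = sorted(range(len(correct)), key=lambda i: scores[i], reverse=True)
    return running_accuracy([correct[i] for i in order])


if __name__ == "__main__":
    # Hypothetical toy data, not the poll's results.
    correct = [True, False, True, False]
    scores = [0.1, -0.5, 0.9, -0.2]
    print(running_accuracy(correct))             # response order
    print(bts_sorted_accuracy(correct, scores))  # BTS order
```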

Here is the same method of analysis from Prelec & Seung:

[Chart: the equivalent analysis from Prelec & Seung]

And finally, the real MVP with the highest average BTS score is Mansur Ahamed, who incidentally got all answers correct except Warangal (in all fairness, he was tied with someone who chose to remain anonymous by not signing in via FB).


10 years since Scanner Darkly – CNN vs Rotoscoping


A Scanner Darkly (the movie), directed by the brilliant Richard Linklater, came out in July 2006. There is so much to talk about in it, including the war on drugs (and why it cannot be won), PKD's own creativity being fuelled by his substance abuse, how scramble suits could be the way to end racism, etc.

Instead, I've decided to pick form over substance and talk about the animation, because of two stats:

  • it took 18 months to animate the movie in 2006
  • with Convolutional Neural Networks it might take just a day to do that by the end of 2016 (I exaggerate wildly, but read on)

Back in 2006, I was amazed by the animation technique, so I tried to read up on interpolated rotoscoping, the technique used to create the movie. I didn't get very far, but here is a nice 4-minute video about it and some quotes from it:

we shot the actual film and we locked … and then there was a lengthy post-production process in this case was 18 months

the animation process which is so cumulative and so slow – hundreds of hours to do 1 minute

we thought it would take 350 man hours per minute, we were pretty off on that it took a lot longer

 

Fast forward to 2015-16, and we have:

1. CNN and this paper

2. This Torch implementation on github

3. Ostagram becoming a big thing overnight

4. Prisma

They also have plans for video, with Moiseenkov saying their processing technique can still work quickly enough for a mobile video scenario.

“Photos is only the start. We plan to add something like the Boomerang app from Instagram. Like short cycles. We plan to add them in the near future — I think in July. And some sort of very clever filters where the quality will be superb,” he adds.

So potentially by the end of 2016 that 18 months of work and ~100,000 man-hours of effort could effectively become a day or less of work for a powerful CNN – I’ll stop there and leave you to think about that.

2112 – why the album is fresh 40 years on

“We don’t want to change what people think about rock & roll, we just want to show them what we think about it.” – Alex Lifeson, 1976

Today is the 40th anniversary of the release of Rush’s iconic album 2112. It was Rush’s fourth album, came out in 1976 after the modest success of Fly By Night and Caress of Steel the previous year, and changed everything for the band.

 

2112 itself tells the tale of individuality being quelled by the establishment in a dystopian future, the year 2112, nearly a century from now. It is very representative of the Cold War era and was inspired by Ayn Rand.

A lot has been said about this, but today I don’t want to change what people think about 2112, I just want to talk about what it means to me.

In 2011, Sucker Punch came out with a marvelous remixed soundtrack, including a reinterpretation of Where Is My Mind, the Pixies original of which capped off the glorious ending to Fight Club. When asked about his choice of soundtrack, the now much-detested Zack Snyder said:

“If you go with the original song, you just get the moment. But if you go with covers you also get all of the baggage you bring to it. I like the baggage. It kind of resonates and rings across time, it’s not just of the moment.”

I loved Sucker Punch and its soundtrack and I loved it even more when Snyder told me why he made his choice.

Around the same time, Ernest Cline came out with Ready Player One. RP1 has everything to keep you interested: the Metaverse, MOOCs, a young underdog protagonist, numerous video game and pop-culture references and, finally, a beautiful homage to Rush that got my heart racing. The baggage was special.

The prescient Spielberg has bought the rights to RP1, and production is underway with a target release date of summer 2018. In all likelihood this will end up being some kind of 3D IMAX movie targeting young adults, with strategic product placements and tie-ups with Nintendo++ for all the gaming references.

But could it be more?

The Metaverse gets more real by the day and almost everything else that RP1 describes exists here and now. Forget the theaters and forget a tie-in video game (not everyone wants to play).

I am hoping instead for a world in 2018 where, wearing a VR headset, I will get a chance to emulate Parzival extracting that 1974 Gibson Les Paul-in-the-stone and playing Discovery. Wouldn’t that be something?!

Data Mining Hong Kong Property Prices

With property prices in HK finally starting to ease, it looks like it's time to think about buying.

Fortunately for a data junkie like me, a lot of websites, including banks like HSBC, provide detailed historic data on price trends and current bank valuations, right down to individual floors and units for each apartment out there. However, there don't seem to be easily downloadable CSV files anywhere, and getting hold of the data seems to involve a tonne of clicking and form filling.

After examining a few websites, I found Home Price to have detailed and reliable data, as well as a simple and elegant structure that is well suited to scraping.

Finally found some time this weekend to build a quick and dirty scraper (deploy at your own risk!).

Here is also some preliminary raw data for just one building.
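The scraper itself isn't reproduced here. As a rough illustration of the general approach, the sketch below uses only Python's standard library to pull the cells out of HTML tables and write them as CSV. It assumes the listing page renders its data in plain `<table>`/`<tr>`/`<td>` markup; Home Price's actual page structure and URLs aren't described in this post, so treat the parsing as a starting point and respect the site's terms.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Collapse internal whitespace so " 12 " becomes "12".
            self._row.append(" ".join("".join(self._cell).split()))
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def table_rows(html: str) -> List[List[str]]:
    """Return every table row on the page as a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url: str) -> str:
    """Download a page (the URL is whichever listing page you are scraping)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialise rows as CSV text, quoting fields that contain commas."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Chaining the pieces, `rows_to_csv(table_rows(fetch(url)))` turns one page into CSV text for a `url` you are allowed to scrape.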

Happy House Hunting!

IMPORTANT INFORMATION – I do not claim ownership of this data and I am not looking to share or profit from it – it is property of Home Price.

Why you should read John Brunner if you care about the future

I came across John Brunner a few years ago when I read Stand on Zanzibar. I love Brunner because his writing style adheres to some of my core beliefs about what sci-fi writing should be. Here are three excerpts from The Shockwave Rider (1975), which is set in the early 21st century, that illustrate why you should be reading his works.

FENCED BUT NOT FOILED

Inter alia the Handbook of the National Association of Players at the Game of Fencing states:
– The game may be played manually or electronically.
– The field shall consist of 101 parallel equidistant lines coded AA, AB, AC … BA, BB, BC … to EA (omitting the letter I), crossed at 90° by 71 parallel equidistant lines 01 to 71.
– The object is to enclose with triangles a greater number of coordinate points than the opponent…

Worldbuilding over Narrative – I strongly believe that in sci-fi the context is more important than the storyline itself. A lot of excellent sci-fi has been created by taking existing stories and placing them in imaginary worlds, a good example being The Stars My Destination, a futuristic retelling of The Count of Monte Cristo. The stock approach is to let context be implied through dialogues and monologues (diary entries being the most pedestrian of the lot), and it is pretty hard to build a lot of context quickly without boring the audience.

In Shockwave, Brunner introduces the fictional game of Fencing with brutal efficiency, simply inserting a Fencing rulebook between chapters, abruptly and with no additional context. Contrast this with how the late Iain Banks labours through each contest in The Player of Games (loved the substance, hated the style).


“I’m a poor player myself; it would be a mismatch. But why did fencing appeal to you rather than, say, Go, or even chess?”
“Chess has been automated,” was the prompt reply. “How long is it since a world champion has done without computer assistance?”
“I see. Yes, I understand nobody has yet written a competent fencing program. Did you try it? You had adequate capacity.”
“Oh, using a program to play chess is work. Games are for fun. I guess I could have spoiled fencing, if I’d spent a year or two on the job. I didn’t want to.”

 

 

Technology, but also how social values are shaped by technological change – Brunner cares deeply about the social problems of the future and how technology hasn't done a good job of solving them.

Having spent a few pages writing up the rules of Fencing, Brunner avoids the temptation of staging a match-up between the protagonist and his interlocutor. Instead, they talk about the value of creating an AI to “solve” the game, an important topic few people have addressed. Likewise, that Brunner came up with Shockwave’s larger theme (I won’t tell you exactly what it is) 40 years ago gives me goosebumps in light of Apple vs the FBI!

Prescience – Brunner retains a healthy realism in all his writing: he doesn't make up overly fancy tech or aliens, and he makes some pretty accurate predictions about the future. I find it fascinating that Shockwave, set in the early 21st century, correctly predicts AIs cracking chess and implies that Go has been “solved” as well. Timely, given how Lee Sedol has been faring vs AlphaGo.

 

By the time Reverend Lazarus fought his way through the maze of interlinked credit-appraisal computers and nailed the tapeworm that had just been hatched, he could well be ragged and starving.

 

Neologisms – Brunner does an excellent job of coming up with new words for things that don't exist yet (almost none of them portmanteaus, which would be too contrived and easy). The bit above isn't representative, except that it happens to be perhaps the first time someone thought about self-replicating malware and decided on a name for it: the tapeworm.

I’ll end with this chapter-within-a-chapter from Stand on Zanzibar (written in 1968) that illustrates all of this again – worldbuilding over narrative, social impact of tech, prescience and neologisms. Doesn’t “acceleratube” remind you of something?

[Image: chapter-within-a-chapter excerpt from Stand on Zanzibar]

 

Why everyone should download Overdrive and switch to Audiobooks

If you’ve spent any time at all talking to me about books over the past year I’ve likely hijacked a few minutes of that conversation to expound the virtues of Overdrive (of which I cannot get enough).

Here is that sermon in print:

Audio over E: If you don't check your phone while sitting in front of the TV, or you stay away from WhatsApp and Facebook while reading a multi-page article on your phone, well, congratulations! You can stop reading here and go back to what you were doing. Otherwise, you've probably seen your reading habit suffer as a result of a diminished attention span. I find that audiobooks provide the sensory insulation needed to absorb content without distraction. Further, if you're spending a material portion of your day in front of a screen anyway, audio gives your eyes and neck a rest once in a while, letting your ears do the heavy lifting.

Getting over the hump: I found getting used to audiobooks very challenging at first. I recommend easing yourself in, either by listening to something you've already read or by sticking to humour; I opted to do both and dug into Bill Bryson. Persevere through at least 5-6 hours of material before you decide to give up on audio. I promise you'll get a second wind.

The voice matters: I tend to stay away from authors who read their own books, as they don't seem to realize that reading aloud is as much an art as writing. The exceptions to this rule are stage and radio comedians who read their books (David Sedaris is an excellent example). Hachette and Random House do an excellent job of picking pleasant and appropriate voices for their publications.

Fiction vs Non-fiction: Non-fiction is significantly easier than fiction, as you can afford to drift away for a bit and still not feel lost. Fiction, the kind I read at least, requires close attention to detail and is harder to follow on audio, though I continue trying. Whodunnits are the worst, and I stick to my Kindle for them.

Add-ons: Pacing, Snoozing etc.: Most audiobook apps let you change the playback speed without affecting pitch; this is a lifesaver, especially once you adjust to the format and want to kick the speed up a couple of notches. Almost all players also come with a snooze function that plays audio for a chosen period of time before switching off, which keeps me from staring into my phone in bed into the wee hours of the morning.

And finally...

Overdrive vs Audible: About a year back, Lifehacker did a poll on audiobook services and Audible won hands down. I've tried Audible and found their service to be excellent. My beef with the ecosystem, though, is that audiobooks are notoriously expensive compared with e-books, and perhaps rightly so: it takes a lot more effort to create a good audiobook, and the audience for audiobooks is markedly smaller for now. Even with the subscription model, which gets me a free book each month plus discounts, it doesn't make sense for me to buy books on Audible as freely as I do on Kindle. This is where Overdrive comes in.

Overdrive is connected to over 30,000 libraries & schools. If you have a first-world library membership (such as with the NLB), you get access to the library's audiobook collection (in addition to its e-book collection). The app seamlessly interfaces with your library's digital collection, letting you sign in with your library card credentials and search for and download any book you like. Getting on to Overdrive has brought me back to reading a material amount of long-form content each week after desperately trying to make it work for several years. For this reason I cannot recommend it enough.

Bonus: Finally, my one complaint with the app is that you can't view your library's entire audiobook collection in an endless scroll without having to hit next page each time. I wrote a simple crawler for NLB here, which should spit the info out into a text file that you can parse into something like this.
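The crawler linked above isn't shown here, but its core idea, walking page after page until the listing runs out, can be sketched in a few lines of Python. `fetch_page` is a hypothetical callable you would implement against your library's catalogue pages (for instance with an HTML parser); nothing here is an Overdrive or NLB API.

```python
from typing import Callable, Iterable, Iterator, List


def crawl_all(
    fetch_page: Callable[[int], List[str]], max_pages: int = 10_000
) -> Iterator[str]:
    """Yield every item from a paginated listing, stopping at the first empty page.

    fetch_page(n) is a hypothetical helper returning the titles on page n
    (1-based); max_pages is a safety cap in case the site never returns empty.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            return
        yield from items


def save_lines(items: Iterable[str], path: str) -> int:
    """Write one item per line to a text file and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(item + "\n")
            count += 1
    return count
```

Because `crawl_all` is a generator, `save_lines(crawl_all(fetch_page), "collection.txt")` streams titles to disk without holding the whole collection in memory.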

Happy listening!

 

the anti-recommendation engine – 2

This post originally appeared on Facebook

As a first step to exposing myself to books outside my comfort zone I’ve built this basic tool – https://goo.gl/6m4h4r
This Google Sheet contains the nearly 10,000 audiobooks available at the National Library of Singapore and randomly recommends 5 of them, with helpful Overdrive download links.
The plan is to open this sheet whenever I’m on the lookout for a new book and force myself to download at least one of the 5, preferably something outside my comfort zone.
Let’s see how it goes!
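The sheet's formulas aren't shown in the post; here is the same idea as a small Python sketch. The `Book` record, its `genre` field and the optional `avoid` set (for explicitly excluding your usual comfort-zone genres) are assumptions added for illustration.

```python
import random
from typing import Collection, List, NamedTuple, Optional, Sequence


class Book(NamedTuple):
    title: str
    genre: str
    link: str  # e.g. an Overdrive download link


def anti_recommend(
    catalog: Sequence[Book],
    k: int = 5,
    avoid: Collection[str] = (),
    seed: Optional[int] = None,
) -> List[Book]:
    """Pick up to k distinct books uniformly at random, skipping avoided genres.

    Unlike a recommendation engine, no taste profile is consulted.
    """
    rng = random.Random(seed)
    pool = [b for b in catalog if b.genre not in avoid]
    return rng.sample(pool, min(k, len(pool)))
```

Sampling without replacement guarantees the 5 picks are distinct; passing a `seed` makes a given draw reproducible.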

the anti-recommendation engine

This post originally appeared on Facebook

Having gone through the pain of lugging many heavy boxes and setting up shelves across multiple home relocations over 2015, I've pretty much sworn off physical books and have become a vocal advocate for digital books (e-books and, more recently, audio).
Further, as a serial buyer/downloader of books on Kindle & Overdrive, I have, of course, succumbed to the Recommendation Engine.
A clever feature, keenly absorbing my unique and, dare I say, eclectic tastes and customizing the virtual book-shelf to suit my interests. What better means to sate the ego? Why can’t Goodreads come up with a recommendation engine? And why can’t Wikipedia tell me what others reading this page also read? What about Reddit?
Then in late December, I happened to step into a quaint little book shop (did these things still exist?). It was meant to be a quick sojourn to take in the smell of fresh paper, feel a bit nostalgic about the Borders that once stood at Wheelock Place, and smirk at “who-are-these-types-who-still-visit-book-shops” and, more importantly, “who-are-these-types-who-still-run-book-shops”.
I picked up half a dozen books under the “you-false-prophet” glare of my wife and continued to wander around long after bills were paid until the impatient Mrs. had to nearly drag me out of there.
To understand what had happened, I only needed to look at the books I had purchased.
They were, as you might have guessed, well outside the comfy Recommendation Engine echo-chamber/cubby-hole that the Amazons of the world had created for me. Somewhere a Hidden Markov Model had slotted me into a “Sci-Fi/Pop-Science/Self Improvement/Tech Start-ups” bucket and helpfully filtered out the rest of literature.
I think there is real value in stepping out of the Recommendation Engine zone. The answer is not in stepping back into book shops. The answer is in finding an Anti-Recommendation Engine or building one if it doesn’t already exist.
Not just one for books. Perhaps one for Facebook as well.

Runners vs Walkers

This was originally posted on Facebook on 10th February 2013

The organizers of the Green Power 50km hike were kind enough to publish a full rank list of finishers here:

http://goo.gl/4fbUw.

Two interesting observations:

a. One would expect marathon finish times to follow something like a normal distribution with a long right tail. However, GP participants fall largely into 2 distinct categories: those who walk the ups but run the flats and downs, and those who walk the whole way. This led me to believe that the distribution should show some bimodality, and I was not disappointed.

Histogram of finish times in hours (y-axis = number of people who made it)

[Figure: histogram of actual finish times]

 

Compare the above with a sample from a bimodal distribution, a 50/50 mix of {N(6.75, 1.25); N(10.25, 1.5)}:

[Figure: histogram of the simulated bimodal sample]
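A sample like the one in the comparison histogram can be drawn with a short Python sketch. It reads the second number in each N(·, ·) as the standard deviation (the post doesn't say whether it is the standard deviation or the variance), picks each component with probability 0.5, and bins the draws into half-hour buckets; the function names and bin width are illustrative.

```python
import random
from typing import Dict, List, Optional

# (mean, standard deviation) in hours; the second parameter is assumed to be the SD.
RUNNERS = (6.75, 1.25)
WALKERS = (10.25, 1.5)


def sample_finish_times(n: int, seed: Optional[int] = None) -> List[float]:
    """Draw n finish times from a 50/50 mixture of the two normal components."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        mu, sigma = RUNNERS if rng.random() < 0.5 else WALKERS
        times.append(rng.gauss(mu, sigma))
    return times


def histogram(times: List[float], bin_width: float = 0.5) -> Dict[float, int]:
    """Count times per bin, keyed by each bin's lower edge in hours."""
    counts: Dict[float, int] = {}
    for t in times:
        edge = (t // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for edge, count in histogram(sample_finish_times(500, seed=1)).items():
        print(f"{edge:5.1f}h {'#' * count}")
```

The mixture's mean is 0.5 × 6.75 + 0.5 × 10.25 = 8.5 hours, which falls between the two humps rather than on either of them.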

b. There is an additional distortion. I was personally looking to finish under 10 hours, and I found myself adjusting my pace towards the end to meet this target. Does this apply to everyone? If it did, we should see more people finish closer to the half-hour and hour marks as opposed to the 15- and 45-minute marks. Here is a breakdown of finishers split into 15-minute buckets within each hour, revealing that this is indeed the case.

[Figure: finishers by 15-minute bucket within the hour]
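The 15-minute breakdown is simple to reproduce. The sketch below assumes finish times have been converted to total minutes (so 9h52m becomes 592) and counts how many fall in each quarter of the hour; the bucket labels are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical labels for the four quarter-hour buckets.
BUCKETS = (":00-:14", ":15-:29", ":30-:44", ":45-:59")


def intra_hour_buckets(finish_minutes: Iterable[float]) -> Dict[str, int]:
    """Count finishers by the quarter of the hour in which they finished.

    finish_minutes are total finish times in minutes, e.g. 9h52m -> 592.
    """
    counts = Counter(BUCKETS[int(m % 60) // 15] for m in finish_minutes)
    return {label: counts[label] for label in BUCKETS}
```

Taking the remainder modulo 60 pools every hour together, so a 7h50m finish and a 9h52m finish both land in the last quarter-hour bucket.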