Writings – Yixin Lin

FBI/IRS scam

TLDR/PSA: there’s a well-executed scam involving phone number fraud. Caller IDs are not a safe method of verifying someone’s identity (even if you find their number on the official website!).

Just a quick recap of a scam attempted on me today, which I thought was interesting and well done. I didn’t realize the lack of security around phone numbers and how easily they’re spoofed.

Today around noon, I received a phone call with the Caller ID “Federal Bureau of Investigation” (number: 302-658-4391, the correct number for the FBI office in Wilmington, Delaware). An “agent” claimed that I (they specifically said Yixin Lin from Duke University) had an ongoing case against me for failing to turn in Form 8863 (an educational credit IRS form), and that the tax plus penalties totaled $3950; I had a choice between resolving this and being arrested and having the case pressed against me. They were very insistent that I stay on the line (“hanging up will be seen as an attempt to evade the FBI”), and I was supposed to acquire some official forms from a nearby store that would be shipped to the IRS in order to resolve the matter.

I checked the caller ID, which was certainly linked to the FBI (it appeared on a Department of Justice website, for instance, and I later found out it was the actual number for the FBI office). Furthermore, another person also called me demanding the same thing but claiming to be local police– and they called from an emergency line that my phone said was unblockable. Both of these pointed to legitimacy, since I didn’t realize either could be spoofed (especially the emergency number, which overrides blocking). On the other hand, this seemed like a very strange way to pay back the IRS. Also, every “agent” I was forwarded to had a slight Middle Eastern or South Asian accent (in the moment, I tried to be liberal-minded about the possibility of FBI hiring practices, but I should have just taken the hint).

Interestingly enough, they directed me to Target to obtain these forms; once there, they told me to buy $3950 in Target gift cards. At that point I realized the whole thing was too ridiculous even if they did have legitimate phone numbers, and demanded some proof, which clearly wasn’t forthcoming (they kept referring back to the caller ID information, which is apparently their competitive advantage as far as scams go), and they hung up. I was gullible enough to Uber to Target (kind of hilarious in hindsight!), but I learned something new about spoofing phone numbers, and now hopefully you won’t fall for it.

Machine learning and humans

I’ve been thinking about what we can learn from deep learning about general intelligence, and specifically its connections with human intelligence, which has especially been on my mind after watching some of the lectures by Yoshua Bengio (1) and Yann LeCun (1, 2) on the foundations of deep learning. Because fundamentally “deep learning”, for all it’s been hyped by the media, is a pretty simple concept: get more layers. Why does it work well? Is it just another trendy topic in this field, like SVMs were in the early 2000s? Or is it something that actually leads us to a better understanding of general intelligence?

A popular answer to this question, often touted in popular media, is the universal approximation theorem1. It certainly sounds fundamental: with some loose caveats, you can approximate any function, which seems impressive. But this doesn’t actually say anything about why deep learning works: it’s a theorem about the shallowest possible network. Loosely, the theorem says that a single-hidden-layer neural network with a finite number of neurons can approximate arbitrary continuous functions. Sure, but that doesn’t say anything about algorithmic learnability: if, in order to approximate f(x), you need a neuron for every real number x, you’re no better than a lookup table. And it doesn’t say anything about why depth helps at all.
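The constructive intuition behind the theorem can be sketched in a few lines of NumPy. The idea: pair up steep sigmoids into narrow “bumps” and weight each bump by the target function’s value there. The sketch below (the width, bump count, and steepness are arbitrary illustrative choices, not anything canonical) approximates sin on [0, π] with a single hidden layer– and notice that the neuron count scales with the resolution you want, which is exactly the lookup-table problem described above.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in exp; sigmoid saturates anyway.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def shallow_approx(f, a, b, n_bumps=200, steepness=500.0):
    """One hidden layer of sigmoids approximating f on [a, b].

    Each "bump" is the difference of two steep sigmoids, so the whole
    model is a linear combination of 2 * n_bumps hidden units -- the
    exact form covered by the universal approximation theorem.
    """
    edges = np.linspace(a, b, n_bumps + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    heights = f(centers)  # output weights: the target value on each bump

    def g(x):
        x = np.asarray(x, dtype=float)[..., None]
        bumps = (sigmoid(steepness * (x - edges[:-1]))
                 - sigmoid(steepness * (x - edges[1:])))
        return bumps @ heights

    return g

g = shallow_approx(np.sin, 0.0, np.pi)
xs = np.linspace(0.1, np.pi - 0.1, 500)
max_err = float(np.max(np.abs(g(xs) - np.sin(xs))))
```

Doubling the resolution roughly halves the error but also doubles the hidden-layer width: the shallow construction pays for every region of the input separately, which is the sense in which the theorem says nothing about efficient learning.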

So does deep learning actually get us closer to general intelligence?

The famous no free lunch theorem says that “for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method” (emphasis mine). So in order for any machine learning algorithm to perform better than another, it needs to exploit some sort of regular structure that occurs in natural problems.

For example, we can tell that a typical image of the world somehow doesn’t look like an image of random noise; in the same way, the set of natural problems we face is large but has some structure that makes it much narrower than the set of all possible problems. If deep learning is doing so much better empirically than other algorithms, it must exploit this natural structure; this means, in a sense, that it’s biased towards these problems, and would do worse on problems without this structure.

There are two things that actually make deep architectures surprisingly better than more “classical” machine learning models, and that (in my opinion) are a step on the path to general intelligence: they do representation learning, which means that you don’t handcraft the features but instead let the algorithm discover its own representation from the raw data, and they’re hierarchical models, which means that they form higher-level abstractions from the details.

It’s pretty crucial for humans that internal representations of things are learned and not handcrafted, because this gives us the generality part of “general intelligence”. There was no concept of a keyboard in the ancestral environment, but that didn’t stop people from learning to type when the typewriter was invented; there was no touchscreen keyboard twenty years ago, but that didn’t stop millennials from learning to text when the iPhone was invented.

There is definitely something fundamentally important about learning features from raw data for general intelligence. After all, the same raw data means entirely different things depending on the context; no fixed, handcrafted mapping from raw data to representation can be part of a truly general learning algorithm. Dependency on a human-constructed mapping from representation to reality is one of the reasons for the brittleness of purely symbolic AI. Representation learning is a necessary (if not sufficient) piece of generality.

Hierarchical modeling also seems to capture an important aspect of general intelligence. It has an obvious analogue in the study of human expertise, which is the concept of chunking in psychology. Human brains can only manipulate “seven plus-or-minus two” objects in their working memory, so the thing that separates experts (chessmasters, memory competitors, tennis players, pianists, etc.) is not some superhuman ability to notice everything and act upon that information, but a concise chunking of the relevant information and an ability to manipulate these chunks in meaningful ways. You can see this pretty obviously when you learn to drive a car: at first, you’re consciously thinking about how much to turn, which windows to notice, and which pedals to press, but you gradually become familiar with these lower-level operations until eventually you operate at the level of changing lanes and choosing routes. The fact that human brains work intelligently at all, despite this tiny “amount” of working memory, is a reflection that the highest level of abstraction encodes the most important information from the lower levels really, really well.

Yoshua Bengio cites the no free lunch theorem as evidence that we need to have a strong prior belief about what the world looks like– to encode some of the structure in the world– and points to hierarchical models (what he calls “compositionality” of the world) as a reason why deep architectures work well. But even though they work well, the best machine learning algorithms, compared to human brains, generalize extremely slowly. For example, humans require just a few examples of objects to generalize well, as opposed to the gigabytes required by state-of-the-art systems that still only approach human performance. In other words, humans are extremely efficient with data2.

Connection to humans

If we take this (very vague and hand-wavy) hierarchical, feature-learning metaphor for human intelligence seriously, then there’s some things we can say about what it means to think differently from other people, and what it means to communicate different thoughts.

Somewhat uncontroversially, an expert understands her subject in a different way than a novice, but it’s not just knowing more: it’s knowing differently. Given the same raw data (sensory inputs like vision, touch, sound), an expert extracts different features– features that hierarchically build into one of the very few chunks manipulated at a conscious level. This is why chess grandmasters often consider no more moves than amateurs (they just subconsciously choose better which moves to consider), and where the whole concept of intuition (unconscious, nearly indescribable knowledge) can surface. In The Art of Learning, Josh Waitzkin describes the experience of learning numbers to leave numbers, or form to leave form: low-level information first becomes conscious knowledge, which is repeated enough to become unconscious knowledge, which builds a strong foundation for higher-level knowledge to be learned in the same way. This is oddly reminiscent of the hierarchical feature-learning in deep learning.

This means that there’s no one-to-one mapping between concepts in two different people’s brains. As an aside, Matrix-style knowledge transfer (“downloading kung fu”) is likely not so straightforward even if you had brain emulations, because it’s unlikely that knowledge is additive and compactly described the way data on a hard drive is. On the other hand, I think there’s likely a significant amount of interesting research to be done in transfer learning.

But what does this mean for arguments and communication? What does it mean for two people to communicate?

For a long time, it seemed very strange to me that two people cannot eventually agree. After all, there’s some sense that two people who persistently discuss will eventually agree, right? There’s even a formal statement of this: Aumann’s agreement theorem.

Aumann’s agreement theorem: Two people acting rationally (in a certain precise sense) and with common knowledge of each other’s beliefs cannot agree to disagree. More specifically, if two people are genuine Bayesian rationalists with common priors, and if they each have common knowledge of their individual posterior probabilities, then their posteriors must be equal.
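A toy simulation makes the theorem feel less abstract. Below is a minimal sketch, under entirely hypothetical assumptions (a four-state world, a uniform common prior, and the standard back-and-forth protocol in which each agent announces her posterior for an event and both refine their information by ruling out states inconsistent with what they heard). The agents start out disagreeing, and exchanging posteriors alone– no raw evidence changes hands– forces them into agreement:

```python
from fractions import Fraction

# Hypothetical toy world: 4 equally likely states.
states = {1, 2, 3, 4}
prior = {w: Fraction(1, 4) for w in states}
event = {1, 4}                 # the proposition both agents reason about
p1 = [{1, 2}, {3, 4}]          # agent 1's information partition
p2 = [{1, 2, 3}, {4}]          # agent 2's information partition
true_state = 1

def cell_of(partition, w):
    return next(c for c in partition if w in c)

def posterior(partition, w):
    # P(event | the agent's information cell at state w)
    cell = cell_of(partition, w)
    return sum(prior[v] for v in cell & event) / sum(prior[v] for v in cell)

def refine(partition, other_partition):
    # Hearing the other agent's announced posterior rules out every state
    # in which she would have announced something different, so split each
    # cell by the announced value.
    announced = {w: posterior(other_partition, w) for w in states}
    new = []
    for cell in partition:
        groups = {}
        for w in sorted(cell):
            groups.setdefault(announced[w], set()).add(w)
        new.extend(groups.values())
    return new

initial = (posterior(p1, true_state), posterior(p2, true_state))
while True:
    new_p1, new_p2 = refine(p1, p2), refine(p2, p1)
    if new_p1 == p1 and new_p2 == p2:   # announcements carry no new info
        break
    p1, p2 = new_p1, new_p2

final = (posterior(p1, true_state), posterior(p2, true_state))
```

In this toy run the agents begin at posteriors 1/2 and 1/3 and end at a common posterior, despite never sharing their underlying observations– which is what makes real-world persistent disagreement interesting.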

And yet it is often the case that not only do two humans find it impossible to agree, but they find each other’s positions completely incomprehensible. The easy solution, of course, is that the priors aren’t common: they disagree on some important facts, which leads them to different conclusions even as they agree on the rules of logic. But I think it goes deeper than that: not only are the “facts” different, but their internal representations of the world are different. Not only do they disagree on which facts are right or wrong, but they don’t even share the same feature-extraction mechanism: the way they see the world, the high-level representation that they extract from the raw data, tuned over their entire lives, creates “chunks” that operate with completely different dynamics. So when I say the word “liberal”, or “God”, or “capitalism”, entirely different constructs are conjured up in different people’s minds.

On the other hand, when we discuss very specific things, we can communicate incredibly quickly– almost instantaneously. How is it that we can communicate enormously complex emotions to each other if our internal representations are so unsynchronized? Read a sappy poem or watch a horror film, and somehow incredibly large amounts of information are transmitted in an instant. The nuances of emotion must be enormously complex if we measure them by something like Kolmogorov complexity (i.e., the length of the smallest possible computer program that reproduces the concept). How is it possible that it takes months to communicate a relatively low-complexity concept like calculus, yet we communicate emotion almost instantaneously?

It’s because we don’t really need to communicate the concept at all: basically, it’s compression, relying on the fact that these specific representations are highly synchronized. Humans share an enormous amount of prior between them, especially in the machinery of cognition, if not the resulting representations that the machinery produces. Evolution placed a highly complex black-box emotion machine in our brains, and if someone else wants to communicate one complex internal state of that machine (like happiness or love), they just tell you which lever to pull, and that state is automatically generated within yourself. Empathy works because we share so much prior information on these matters– and it’s why we find these emotions “instinctive”, almost subconscious: we can literally feel other people’s emotions by leveraging our shared black-box machine. In some sense, poetry is just an extremely lossy compression format for complex, specific emotions.

The field of AI is useful for many reasons, both practical and philosophical, but it’s also fascinating from a humanities point of view. It shears away our anthropomorphic tendencies and exposes what, in some sense, is really “hard” or “interesting” about humans. I believe that as we get closer to a theoretic understanding of the phenomenon of intelligence, we’ll gather more insights on human thought and human nature.


  1. Here is a great intuitive introduction to this theorem. 

  2. Human data efficiency is probably due to better algorithms used by the brain, but it’s probably also caused by a strong prior evolved into us. I think it’s therefore really interesting to explore what the human brain’s priors are. An obvious example is the capability for language and vision. I think it’s especially interesting that our “basest” or most instinctive desires– things like hunger, thirst, and sexual desire– actually operate on a very high level of abstraction.


What does insight look like?

When people think of controversial, groundbreaking ideas, when people shout “new” and “fresh”, they think of opposites. “We believed this way, but the truth is the opposite.”

But correct is only exactly the opposite of incorrect when you’re in one dimension. If you can reduce the situation so that there is a single number that you’re trying to maximize, like Elo in competitions, then more is better than less. However, situations are rarely that simple (or more precisely, reducible): in any dimension greater than one (in which compression to a single dimension is too lossy), the opposite of the incorrect is almost always… also incorrect.

The way to think well or insightfully is not (just) to be contrarian; by doing so, you are still defining yourself by incorrectness, albeit by negation. Insight doesn’t feel like a grand crusade against the antithesis of truth; it feels like being less confused.

Reading After Credentials made me realize how different actual insight feels.

The essay didn’t feel like an attack on the old institutions, or like a fierce individualist defying tradition by proclaiming “you don’t need college!” It was a simple explanation of the circumstances in which these institutions were created– an evolutionary niche that had to be filled due to inefficiencies in predicting performance (and one that itself had the flavor of fierce defiance of the previous norm, which was widespread nepotism). Then it simply explained why those forces are fading away (efficiency gains), again in the matter-of-fact tone that made it obvious that progress was being made.

Things simply made more sense this way. Unconnected dots now had lines (even arrows) between them: for example, the prevalence of hagwons, the rise and fall of enormous corporations, and the high school experience. To put it more bluntly, enormously complex packages (human motivations and organizations) were reduced to simpler forces.

This is not abstract gratification, either. In this case, if you understand that credentials only exist as a proxy for performance, you’ll understand the actual causal structure in the situation. Credentials have meaning to the extent that they signify performance– that’s why prizes at competitions mean something, why prestige even has a definition. That they exist at all is a compromise, an easily compared but imperfect hashcode or summary.

Insight is not any of the bullshit that seems to be correlated to it, like novelty, or being anti-establishment, or freshness. Insight is just getting closer to the truth.

Norwegian Wood

I always forget how enjoyable reading is.

Wiping away the structure of your own life, forgetting everything that seemed important or relevant, hot-swapping in a whole mind… I’ve read once before that language is a form of mind control, since your brain simply can’t help but process words. (Try staring at these words without reading them, for instance).

A book is the perfect virtual reality setup, a contraption that replaces more than the senses; it injects thoughts into your brain through the sort of unadulterated, frictionless channel that film and other media can only jealously fantasize about.

In other news, Murakami is one sex-crazed dude.


If you’re not growing, you’re dying.

I always thought this was incorrect. Besides being a generalization that could only approximate truth, I also didn’t know where to apply it. Knowing this pithy wisdom didn’t actually make me do anything differently, which is what you would want pithy wisdom to do.

But I’ve consistently tried to construct optimal scenarios– situations you could milk for all they’re worth. Hill-climbing with the hilltop in view, searching for positions of strength, for unassailable high ground. It’s a pretty good low-level tactic, and it also feels good in a specific way. I never rooted for the underdog, because I wanted to deify some Hercules. Worship came easy.

Whenever I got there, though, the ground felt shaky. Maintaining optimums is like obsessively looking for cracks in your armor. The game was to avoid snatching defeat from the jaws of victory– and it’s impossible to win, just lose less. Wasn’t that what being humble meant: to know where to improve, even when you’re winning?

And yet winning didn’t feel that good, and losing felt like taking steps back. More importantly, the climb itself always felt ridiculous. Either you’re reaching for heights you’ve already been to, or the higher altitudes seem as arbitrary as the current one. Just another mark on the number line. (Isn’t that humility, to know that greater heights are possible?)

I think the funnest times are when you’re growing. When you’re a rising star, you’re asking how far you can go. When you’re king, you ask: how can this last?

When you look forward to something, you know that there’s something better, and you think it’s going to get here. You don’t even notice the grime because you know you’re getting out of this shithole.

When you think you’ve arrived there, you’re a little nervous it’ll go away. You’re scared there’s something better you don’t have. And you’re absolutely frightened that there’s nothing better.

Economies embody this. The hedonic treadmill perverts this. Young people may sometimes sense it, but old people know it.

Look beyond optimums. You’ll probably have more fun.


I wrote this a while ago.

Life is easy. Life is simple.

It is YOU who complicates.

In your effort to be deep and worldly, to have your intellectual fingers in every pie, to prop up your self-importance and look for a sense of progress, you lose any steam whatsoever. Brownian motion. You think your ideas clever, your methods sophisticated, your execution close-to-perfect.

Intellectual masturbation.

Nearly nobody gets the fundamentals right. Because they’re seen as fundamentals, they’re repeated until they pass into that realm of things you “know.” People approach the fundamentals as obvious things, which becomes the things they can’t learn. And then they marvel at the superhumans around them, wondering about a magic spark or a hidden secret.

Stop. Just stop.

It (i.e. life) is easy. The fundamentals are easy. Which makes you think you can do it without trying, which means you don’t do it well, which means you wallow about… building ever-towering precipices of “knowledge” and “experience.”

The dangerous thing is not that you fail. You won’t fail. You’ll achieve a modicum of success. You’ll think your ideas are true.


Eat well. Sleep well. Exercise.

Pick goals. Do everything to achieve them. When you fail, figure out your mistakes and never make them ever again.

Don’t lie to yourself. Know what you know, and know what you don’t know.

Focus is paramount.

Don’t stop.

Everybody knows this, understands this, forgets this.

The way to actually USE this is to drop literally everything until you’re absolutely sure you’re doing it.

Sit down every single night and record any deviation from perfectly executing these simple ideas. Don’t lie to yourself or justify. Be machinelike. Then wake up and try to do it perfectly this time.

That is the difference between knowing and knowing. This is why little kids can play hard music and Yudkowsky is fat. This is why there are people better than you at the things you care about.

Stop thinking so hard. Life is simple.