
How we got into deep learning

AI—particularly in its LLM form—is what everyone is talking about these days (early 2025). As one of the founders of MarianaIQ, a company that pioneered deep learning applications in B2B marketing over a decade ago, I can’t help but feel a strange mix of emotions—pride, nostalgia, and maybe a little regret. It’s reassuring to see ideas we explored back then gain traction, but also a bit disheartening to realize how quickly innovation moves, leaving past efforts buried under the weight of newer breakthroughs. When I mention our early work, people nod politely, but a friend once put it bluntly: “You’re not a well-known researcher.” Fair enough. Still, there’s a part of me that wants to reclaim a small piece of the narrative—to remind myself (if no one else) that we were there, that we saw something before it became obvious. Maybe this blog is just a wistful attempt at asserting some kind of significance. Or maybe, it’s simply an attempt to share what we learned along the way.

Problem We Were Trying to Solve

Soumyadeb Mitra, Abhishek Kashyap, and I started MarianaIQ in 2013 with the goal of building the next-generation B2B marketing platform. We wanted to help B2B marketers identify:

  • Which companies were showing buying intent and were the right fit.
  • Who within those companies were key decision-makers and influencers.
  • How to reach them effectively across various platforms.

Since companies such as 6sense and Lattice Engines were already focused on the company side of things, we decided to focus on identifying the people within the companies to target.

The typical solution to identifying personas to target within a company is to use titles and look up people in databases like ZoomInfo. The problem is that a title often doesn't say much. A director of IT might be responsible for infrastructure, applications, or something else entirely. Additionally, the seniority of titles doesn't always translate across companies and industries (a VP at a bank, for instance, sits much lower in the hierarchy than a VP at a tech company). So, in addition to looking at titles, we wanted to do what a good SDR or AE would do: read resumes and online profiles and look for the specific interests that could help with targeting.

Our approach was simple in concept: analyze a customer’s past successful sales, determine the types of people involved, and build a lookalike audience.

Initial Approach

To create a look-alike audience, we needed to be able to take in profiles of people our customers had sold to and identify similar people in other target companies. While we all have an intuitive sense of what similarity means, it is a tough concept to nail down. For instance, let's consider books: Anna Karenina is similar to Madame Bovary in terms of subject matter, psychological depth, and social constraints. One could also argue that Anna Karenina is similar to The Catcher in the Rye, as both are typically assigned reading in high schools. (This is not an idle example: I remember reading in the late 90s or early 2000s about how Amazon replaced the editors who used to curate such lists with data-driven recommendations, and this was one of the examples cited, though I might be misremembering the specifics.)

So, what we needed was a way to characterize a person's profile in terms relevant to what we were looking for, i.e. the function and role they performed within the company. To do so, we wanted to characterize profiles based on the keywords in them. A typical approach at that time (2013) was to use methods such as Latent Semantic Analysis or Latent Dirichlet Allocation (LDA). We would take the profiles from the training set and use LDA to discover a set of topics, where a topic is a set of keywords that tend to occur together across profiles. Two profiles can be said to be similar if they are similar in terms of their topics.
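To make this concrete, here is a minimal sketch of topic-based similarity using Gensim's LDA implementation, with toy keyword lists standing in for real profiles and an arbitrary topic count:

```python
from gensim import corpora
from gensim.matutils import cossim
from gensim.models import LdaModel

# Toy corpus: each profile is a bag of keywords.
profiles = [
    ["python", "infrastructure", "linux", "networking"],
    ["java", "applications", "crm", "integration"],
    ["networking", "security", "linux", "firewall"],
]

dictionary = corpora.Dictionary(profiles)
bow_corpus = [dictionary.doc2bow(p) for p in profiles]

# Discover topics: each topic is a weighted set of keywords that co-occur.
lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)

# Two profiles are "similar" if their topic distributions are similar.
topics_a = lda.get_document_topics(bow_corpus[0])
topics_c = lda.get_document_topics(bow_corpus[2])
print(cossim(topics_a, topics_c))
```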

Problems with the Initial Approach

We quickly ran into some issues.

  • For any given customer, the set of keywords that characterized a topic didn't seem to make intuitive sense. Often, a random keyword like swimming would show up in a topic that was otherwise about IT, for instance. This required fine-tuning the topic set with some domain knowledge. (I suspect this would have been less of an issue had we had enough data for any given customer, say in the tens of thousands of profiles.)
  • LDA did not address the issue of the same word having multiple meanings. A word's meaning is typically obvious in context, but as a keyword in a topic it was much less so.
  • As the topic set was unique per customer, we needed to evaluate all the relevant profiles for each customer of ours (we needed to look at some 10-40M profiles each time). This was computationally intensive.
  • We needed to build a model for each customer. As we wanted our product to be a SaaS product with minimal setup, this was a deal breaker. We needed a method that could characterize profiles without much customer-specific knowledge, much like a human would.

Deep Learning

This was soon after the successes deep learning had seen in the ImageNet challenge, and representations learned on ImageNet were being reused for other tasks with great success. We were also reading about the huge strides deep learning was making in face recognition. Before deep learning, you had to engineer features specifically for each use case: for face recognition, you might measure the space between the eyes, the length of the nose, the breadth of the forehead, and so on. But that set of features might not work for another use case, say recognizing a face in profile rather than straight on. With deep learning, you didn't need to do that. This felt similar to the situation we were facing.

At that time, Geoff Hinton had a deep learning course on Coursera. In one of the lectures, he introduced a toy problem (determining familial relationships, if my memory is correct) and used it to show the power of the representations the network learned in the process. This brought learned representations, or embeddings, to our attention.

Enter Word2Vec

Setting aside the question of whether word2vec is truly deep learning, it would be fair to say that it changed the world for us. Word2vec (from Google) is a method that produces word embeddings. It slides a context window over text and either predicts the current word from the surrounding words (CBOW) or uses the current word to predict the surrounding words (skip-gram). The order of words within the window doesn't matter, as it treats the context as a bag of words rather than a sequence.

The embeddings thus derived were shown to have interesting properties. Each embedding is a vector that captures semantic and syntactic properties of the word and can be manipulated like a vector. There is the famous example of taking the vector for king, subtracting the vector for man, and adding the vector for woman, which results in a vector close to that for queen. It is as if the various dimensions of the vector captured properties about the word and its real-world context.
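This property is easy to see with any off-the-shelf embedding set. The sketch below uses a small pretrained GloVe model available through Gensim's downloader purely for illustration; it is not the model we used.

```python
import gensim.downloader as api

# Load a small set of pretrained word embeddings (illustrative only).
model = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```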

With this, we could characterize a profile by simply adding up and normalizing the word embeddings for all the keywords in it. We used tools like Gensim to generate our own embeddings from the corpus of profiles we had. And since we could compute the profile vectors ahead of time, determining similar profiles was as simple as finding other profiles that were close in the vector space (by cosine similarity).
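A minimal sketch of that pipeline, with toy profiles and illustrative parameters rather than our real corpus or settings:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus: each profile is a bag of keywords.
profiles = [
    ["python", "infrastructure", "linux", "networking"],
    ["java", "applications", "crm", "integration"],
    ["networking", "security", "linux", "firewall"],
]

# Train word embeddings on the profile corpus.
w2v = Word2Vec(sentences=profiles, vector_size=100, window=5, min_count=1)

def profile_vector(keywords):
    # Sum the keyword vectors and normalize to unit length.
    vec = np.sum([w2v.wv[k] for k in keywords if k in w2v.wv], axis=0)
    return vec / np.linalg.norm(vec)

vecs = [profile_vector(p) for p in profiles]

# Unit vectors, so the dot product is the cosine similarity.
print(float(np.dot(vecs[0], vecs[2])))
```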

Seq2BOW

While the above characterization of profiles worked, we felt we were not using some of the information in the profiles, namely job titles. So we developed seq2bow. In this method, we used another deep learning technique, an LSTM, to process the sequence of words in the title and predict the keywords in the profile. This gave us a way to embed titles in the same semantic space as the keywords. It captured various characteristics that were important to us: seniority, what the person actually did, robustness to spelling mistakes (there were 22K profiles where the word engineer was spelled enginer), and more that I can no longer remember.
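I no longer have the original code, but the sketch below captures the general shape of the idea: an LSTM reads the title as a sequence, is trained to predict which keywords appear in the profile, and its output is kept as the title embedding. The layer sizes, the sigmoid prediction head, and the training details are all illustrative assumptions, not our actual model.

```python
import numpy as np
from tensorflow.keras import layers, models

title_vocab_size = 5000     # tokens that appear in job titles
keyword_vocab_size = 20000  # keywords that appear in profiles
max_title_len = 10
title_vec_dim = 128         # dimension of the title embedding we keep

# Encoder: embed the title tokens and run them through an LSTM.
title_input = layers.Input(shape=(max_title_len,), dtype="int32")
x = layers.Embedding(title_vocab_size, 64, mask_zero=True)(title_input)
title_vec = layers.LSTM(title_vec_dim)(x)

# Head: predict the bag of profile keywords (multi-label).
keyword_probs = layers.Dense(keyword_vocab_size, activation="sigmoid")(title_vec)

model = models.Model(title_input, keyword_probs)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy training data: padded title token ids -> multi-hot keyword targets.
titles = np.random.randint(1, title_vocab_size, size=(32, max_title_len))
keywords = (np.random.rand(32, keyword_vocab_size) > 0.999).astype("float32")
model.fit(titles, keywords, epochs=1, verbose=0)

# After training, reuse the LSTM output as the title vector.
title_encoder = models.Model(title_input, title_vec)
print(title_encoder.predict(titles[:1]).shape)  # (1, 128)
```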

So, we characterized each profile using the title vector from the seq2bow model together with the sum of the vectors for the keywords in the profile. We stored all the profiles with those two vectors concatenated, and then used ANNOY to retrieve other profiles similar to the ones we had for training.
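Retrieval itself looked roughly like the following, assuming each profile has already been reduced to a single concatenated vector (the dimensions here are illustrative):

```python
import numpy as np
from annoy import AnnoyIndex

dim = 228  # e.g. a 128-dim title vector concatenated with a 100-dim keyword vector
index = AnnoyIndex(dim, "angular")  # angular distance ≈ cosine similarity

# Stand-in for the precomputed profile vectors.
profile_vectors = np.random.rand(1000, dim).astype("float32")
for i, vec in enumerate(profile_vectors):
    index.add_item(i, vec)
index.build(10)  # number of trees: more trees, better recall, bigger index

# Retrieve the 5 profiles closest to a training profile's vector.
print(index.get_nns_by_vector(profile_vectors[0], 5))
```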

This method of storing objects as vectors and then retrieving them through a similarity search is what underlies all vector databases. Towards the end of the MarianaIQ journey, during the sale of the company (in early 2018), we considered pivoting into exactly this. But I could not come up with many use cases for such a vector store and retrieval product (the one we did come up with was searching for similar dresses in e-commerce), and I didn't foresee the developments that have come since, nor the rise of vector databases.

Current LLMs, of course, obviate the need for this level of domain-specific modeling. Soumyadeb fed a bunch of job titles into one of the LLMs out there, and it classified them into various categories without any training of any kind. This shows the power of training on larger datasets and how that scale can overcome many domain-specific challenges.

Account Model

The success we saw with the persona models led us to wonder whether we could use this technique for our account model as well. Recall that at the beginning of our journey, we had focused on persona models (i.e. identifying the individuals within a company). We did create account models to predict which companies were likely to buy, but those used more typical classification models such as boosted trees, and, as you might suspect, we needed to create them per customer. So, similar to our persona models, we asked ourselves if we could somehow characterize a company (or account) and then use similarity to identify look-alike audiences.

We had been building account models based on hand-selected features. Typical features included size, location(s), industry or sub-industry, and revenue. These were not hard to use in our models.

However, one feature that we knew was important was much harder to use in the model: the other software a company used. We were mostly serving high-tech customers, and much as with consumer products, a company using a given piece of software is likely to use certain others, either because they are complementary or because it signals a certain posture toward tooling. Our company corpus tracked a few tens of thousands of software products, so picking which ones to use in a model predicting who is likely to buy required deep domain knowledge.

Taking a lesson from our own and others' learning in this space, we looked to create company embeddings. The way we created them was with autoencoders. An autoencoder is a type of neural network that learns to compress data into a lower-dimensional representation (encoding) and then reconstruct it back to the original form. Since we were doing this across all the companies in our database, we had enough data to train on. This let us take a very high-dimensional space of company features and reduce it to a dense, lower-dimensional space. We could then automatically create prediction models with little or no human input.
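A minimal sketch of such an autoencoder, assuming each company is represented as a high-dimensional sparse indicator vector (e.g. which tracked software products it uses); the layer sizes and dimensions are illustrative, not what we actually used:

```python
import numpy as np
from tensorflow.keras import layers, models

n_features = 20000  # e.g. one indicator per tracked software product
embedding_dim = 64  # the dense company representation we keep

# Encoder compresses to the bottleneck; decoder reconstructs the input.
inputs = layers.Input(shape=(n_features,))
encoded = layers.Dense(512, activation="relu")(inputs)
encoded = layers.Dense(embedding_dim, activation="relu")(encoded)
decoded = layers.Dense(512, activation="relu")(encoded)
decoded = layers.Dense(n_features, activation="sigmoid")(decoded)

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Toy stand-in for the company corpus; the reconstruction target is the input itself.
companies = (np.random.rand(256, n_features) > 0.995).astype("float32")
autoencoder.fit(companies, companies, epochs=1, verbose=0)

# Keep only the encoder: it maps a company to its dense embedding.
encoder = models.Model(inputs, encoded)
print(encoder.predict(companies[:1]).shape)  # (1, 64)
```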

These representations (augmented with the vectors of the keywords of all the employees in the company) produced amazing results in terms of their predictive value. We didn't get to prove their value as much as we would have liked, as we ended up selling the company.

The big lesson for me with the success of this model is this: if you have enough data (don’t ask me what size qualifies as enough) then deep learning gives you a more robust and less model-dependent approach. I think this could generalize to things like customer journey on a website etc.

Closing Thoughts: Beyond LLMs

Looking back, it's fascinating how our early deep learning experiments foreshadowed today's AI landscape. Reusing embeddings and representations learned on one problem for another, and indexing and retrieving objects via vectors, were central to our approach; those concepts now power everything from vector databases to retrieval-augmented generation (RAG) for LLMs.

While LLMs dominate today’s AI discussions, deep learning’s real power lies in its ability to structure unstructured data and uncover hidden patterns. The techniques we used for persona and account modeling could be applied to customer journeys, financial risk assessments, and countless other domains.

As AI continues to evolve, I wonder: What crucial insights are we overlooking today that will seem obvious a decade from now?

Thanks to Soumyadeb Mitra, Abhishek Kashyap, Justin Driemyer, Solomon Fung, and Arunim Samat.