
Author: Ryan

DevRel @ Databricks: Year 1

I joined Databricks in October 2019. My first day was also the first day of the Spark + AI Summit in Amsterdam – a heck of an exciting introduction to a new team, new company and new community.

Why Databricks? I was very happy at Neo4j and certainly loved working with the Neo4j community. Databricks brought with it an exciting opportunity though – to build a talented team at one of the fastest-growing cloud data startups in history. I also have the opportunity to work directly with the founders and senior leadership at the company, who understand the value of developer relations and the importance of building a great community of data scientists, data engineers and data analysts.

Our Team

Databricks DevRel Team

The team is now six people, located in San Francisco CA, Seattle WA, Boulder CO, Santa Fe NM, and Blacksburg VA. We have Developer Advocates and Program Managers all working together to grow awareness and adoption of Databricks and the open source projects which we support.

First-Year Team Accomplishments

Here are some of our accomplishments from the first year, working alongside amazing collaborators across the company and community.

Keep in mind: a lot of work our team has done in 2020 hasn’t yet been released – stay tuned! And, of course, this is all within the context of the broader accomplishments of the company and community in 2020.

Looking Forward

We have a really exciting 2021 planned as the team continues many of the initiatives above and takes on new challenges. We’ll be focused on making it easier to learn data science, data engineering and data analytics, as well as making it simple to apply these learnings using Databricks. An important part of this mission will be growing and strengthening the community so we can all learn from each other.

Are you a data geek and want to join the adventure? We have data engineering/analytics advocate roles and developer (online) experience advocate roles open in the US as well as a regional advocate role in Europe. Reach out to me at (firstname).(lastname)@databricks.com if you want to learn more!

Moving RDBMS data into a Graph Database

One of the most common questions we get at Neo4j is how to move from a SQL database to a Graph Database like Neo4j. The previous solution for accomplishing this was to export the SQL tables into CSV files and then import those files with neo4j-import or LOAD CSV. There’s a much better way: JDBC!

Neo4j JDBC Support

There are two distinct ways you can use JDBC within Neo4j:

  1. Access Neo4j Data via JDBC. Do you have existing code that accesses your SQL database using JDBC, and you want to move that code to access Neo4j instead? Neo4j has a JDBC Driver. Just update your code to use the awesome power of the Cypher query language instead of SQL, and switch over the JDBC driver you’re using, and you’re off to the races!
  2. Import SQL Databases into Neo4j. Do you have data in your SQL database that you want to move into a Graph? The APOC library for Neo4j has a set of procedures in apoc.load.jdbc to make this simple. This blog post will cover this use case.

Loading Sample Northwind SQL tables into MySQL

In order to run the code snippets in the following sections, you’ll need to have the Northwind SQL tables in a MySQL database accessible from your Neo4j server. I’ve published a GitHub Gist of the SQL script which you can execute in MySQL Workbench or using the command-line client.

In order to run this, I created a blank MySQL database in Docker:

Loading data from RDBMS into Neo4j using JDBC

With the APOC JDBC support, you can load data from any type of database which supports JDBC. In this post, we’ll talk about moving data from a MySQL database to Neo4j, but you can apply this concept to any other type of database: PostgreSQL, Oracle, Hive, etc. You can use it for NoSQL databases too, although APOC also has direct support for MongoDB, Couchbase and more.

1. Install APOC and JDBC Driver into Neo4j plugins directory

Note: This step is not necessary if you’re using the Neo4j Sandbox and MySQL or PostgreSQL. Each Sandbox comes with APOC and the JDBC drivers for these database systems.

All JAR files placed in the Neo4j plugins directory are made available for use by Neo4j. We need to copy the APOC library and JDBC drivers into this directory.

First, download APOC. Be sure to grab the download that is for your version of Neo4j.

Next, download the JDBC driver. Then, copy the file into your plugins directory:

Finally, restart Neo4j on your system.

2. Register the JDBC Driver with APOC

Open up the Neo4j Browser web interface:
[Screenshot: the Neo4j Browser web interface]

In the Neo4j Browser, enter a Cypher statement to load the required JDBC driver:

3. Start pulling Northwind SQL tables into Neo4j with JDBC and Cypher

Run the following Cypher queries, courtesy of William Lyon, separately in the Neo4j Browser:
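As a rough illustration of the pattern (not the original queries), the Python sketch below composes the two kinds of Cypher statements used in steps 2 and 3: one that registers the driver with apoc.load.driver and one that pulls a table with apoc.load.jdbc. The driver class name, JDBC URL, table name, label and key column are all assumptions made for the example, and `SET n += row` assumes every column value can be stored as a node property.

```python
import re

# Assumed driver class for MySQL Connector/J 5.x; newer Connector/J
# releases use "com.mysql.cj.jdbc.Driver" instead.
MYSQL_DRIVER = "com.mysql.jdbc.Driver"

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def driver_statement(driver_class=MYSQL_DRIVER):
    """Return (cypher, params) asking APOC to load a JDBC driver class."""
    return "CALL apoc.load.driver($driver)", {"driver": driver_class}


def import_statement(jdbc_url, table, label, key):
    """Return (cypher, params) copying each row of a SQL table into a node.

    Labels and property keys cannot be passed as Cypher parameters, so they
    are validated and spliced into the query text. The JDBC URL and table
    name stay ordinary parameters ($url, $table).
    """
    for name in (label, key):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    cypher = (
        "CALL apoc.load.jdbc($url, $table) YIELD row\n"
        f"MERGE (n:{label} {{{key}: row.{key}}})\n"
        "SET n += row"
    )
    return cypher, {"url": jdbc_url, "table": table}


if __name__ == "__main__":
    # Hypothetical connection string and column name, for illustration only.
    url = "jdbc:mysql://localhost:3306/northwind?user=root"
    for cypher, params in (
        driver_statement(),
        import_statement(url, "products", "Product", "ProductID"),
    ):
        print(cypher)
        print(params)
```

The printed statements can be pasted into the Neo4j Browser with the parameter values filled in, or sent through any Neo4j client library together with the parameter map.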

Running Cypher Queries on Imported Data

Here’s a simple Cypher query for collaborative filtering product recommendations:

Results:
[Screenshot: query results in the Neo4j Browser]
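For readers new to the idea, here is a minimal plain-Python sketch of what a collaborative filtering recommendation computes: products that show up in other orders alongside something the customer already bought, ranked by how often that happens. It is an illustration of the technique rather than the Cypher query from this post, and the customer IDs and orders are made-up sample data.

```python
from collections import Counter


def recommend(orders, customer, limit=5):
    """Rank products the customer has not bought by co-purchase count.

    orders: iterable of (customer_id, products) pairs, one pair per order.
    Returns a list of (product, occurrences), most frequent first,
    with ties broken alphabetically.
    """
    orders = [(cust, set(products)) for cust, products in orders]

    # Everything the target customer has already purchased.
    owned = set()
    for cust, products in orders:
        if cust == customer:
            owned |= products

    # Count each unowned product once per order that shares an owned product.
    scores = Counter()
    for _, products in orders:
        if products & owned:
            for product in products - owned:
                scores[product] += 1

    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    sample_orders = [
        ("ANTON", {"Chai", "Tofu"}),
        ("BERGS", {"Chai", "Ikura"}),
        ("CHOPS", {"Tofu", "Ikura", "Konbu"}),
        ("DRACD", {"Pavlova"}),
    ]
    print(recommend(sample_orders, "ANTON"))
```

In a graph database the same counting falls out of a single pattern match over customers, orders and products, which is why this kind of query is a popular first example.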

Next Steps

If this was your first experience with Neo4j, you probably want to learn more about Neo4j’s Cypher query language. Neo4j has some great (free) online training you can take to learn more. You can also use the Cypher Refcard to power your journey to becoming a Graphista.

Graphing Hillary Clinton’s E-mails in Neo4j

Technologies: Neo4j, OpenRefine, Prismatic Topics API, Python, Py2neo


Bernie is sick and tired of hearing about Hillary’s e-mails and so am I.  So, why am I writing about them?  Well, they can possibly provide an interesting insight into how our government works (or doesn’t work) — if only they were in a better format than PDFs!!  They represent a perfect graph!

I started off by downloading the CSV files created by Ben Hammer. Some of the information about who messages were from/to isn’t very normalized in that dataset, so I used the OpenRefine faceting feature and created emails-refined.csv.

I imported these into Neo4j:

With the data in Neo4j, I got to explore the Person nodes Hillary sent the most Email nodes to.

[Figure: the Person nodes Hillary sent the most e-mails to]

Knowing the e-mails and senders+receivers is interesting, but I wanted to see what the e-mails are about!  While the subject lines are included with the e-mails, they’re often opaque, like the meaningful subject “HEY” used in an e-mail from Jake Sullivan to Hillary Clinton.  Natural language processing to the rescue!

I built a small Python script and used Py2neo to query all e-mails without attached topics. The script goes through each e-mail and sends the raw body text and subject to the Prismatic Topics API. The API returns a set of topics, which the script then uses to create REFERENCES relationships between the e-mails and topics. This code is based on the excellent post on the topic by Mark Needham.
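The shape of that loop can be sketched as below. This is not the original script: the database call and the topics API call are left out, `keyword_topics` is a toy stand-in for the topics service, and the `Topic` label, the `id` and `name` properties and the sample e-mail are assumptions for the example. The sketch only builds the parameterized Cypher statements that would create the REFERENCES relationships.

```python
LINK_TOPIC = (
    "MATCH (e:Email {id: $id}) "
    "MERGE (t:Topic {name: $topic}) "
    "MERGE (e)-[:REFERENCES]->(t)"
)


def plan_topic_links(emails, extract_topics):
    """Return (cypher, params) pairs linking each e-mail to its topics.

    emails: iterable of dicts with hypothetical keys "id", "subject", "body".
    extract_topics: callable that takes text and returns topic names; it
    stands in for the call to a topics API.
    """
    plan = []
    for email in emails:
        # Subject and raw body are sent together, as described above.
        text = f"{email['subject']}\n{email['body']}"
        for topic in sorted(set(extract_topics(text))):
            plan.append((LINK_TOPIC, {"id": email["id"], "topic": topic}))
    return plan


def keyword_topics(text):
    """Toy stand-in for a topics service: match a fixed keyword list."""
    known = ("Ireland", "Peace")
    lowered = text.lower()
    return [name for name in known if name.lower() in lowered]


if __name__ == "__main__":
    # Made-up sample e-mail, for illustration only.
    sample = [{"id": 1, "subject": "GUARDIAN",
               "body": "Talks on peace in Northern Ireland"}]
    for cypher, params in plan_topic_links(sample, keyword_topics):
        print(cypher, params)
```

Using MERGE for both the topic node and the relationship means the script can be re-run safely: an e-mail that was already tagged does not get duplicate topics or relationships.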

Now I can explore e-mails by topic, like the graph below where I see e-mails related to David Cameron. When I double-click on the e-mail with subject ‘GUARDIAN’ in the Neo4j Browser, I can see all the other topics that e-mail references, including Sinn Féin, Northern Ireland, Ireland, and Peace.

[Figure: e-mails related to David Cameron]

With this additional topic information, I can start to understand more context around Hillary’s e-mails.

What fun things can you find in her e-mails?

I’ve opened up the Neo4j instance with this data for the world to explore.  Check it out at http://ec2-54-209-65-47.compute-1.amazonaws.com:7474/browser/.  The dataset is open to the public, but I’ve marked it as read-only.  Mention me on Twitter with @ryguyrg if you discover any interesting nuggets of knowledge in Hillary’s e-mails!

 

Developer Evangelism → Developer Relations

  • I do not preach. I educate.
  • I do not pretend to have all the answers. Only some, and pointers to places for others.
  • You teach me as much as I teach you.  Both are critical.

When I first talked with the team at Neo4j about career opportunities, it was over a year ago. The role was VP of marketing. When I discussed this with friends and former colleagues, they were confused and thinking “but you’re an engineer.” I wasn’t confused. I’m passionate that Developer Relations needs to wear many hats – engineer, product manager, marketer, community organizer, spokesperson, press relations and e-mail answerer. Importantly, a hat it shouldn’t wear is that of a preacher.

I renewed discussions with the Neo4j team back in November without a specific role in mind. I enjoy making companies and technologies successful, and will do whatever is necessary to make that happen.  We ended up agreeing on the role of Head of Developer Evangelism, North America.  Well, sorta.

The industry has used the term “evangelism” for many years – I believe starting with Apple and Microsoft. At Google, we decided that the organization was about 2-way feedback and discussion, not about preaching. It was just as important to get feedback from developers about our products as it was to make developers aware of the products and their capabilities. The religious connotation of the term also offended and confused some. We needed a term which more closely aligned with the goals of the organization. We found it with Developer Relations as an organization, and Developer Advocate as a title.

I’ve now shown the light* to the folks at Neo4j and our team has decided to re-brand as Developer Relations. I’m delighted to be updating my social media profiles.  Luckily I didn’t yet buy business cards!

* pun intended

Blogging: not my first rodeo

My first “blog” was created on March 28th, 2001. It was a private and incredibly emo LiveJournal blog. My last post there was in 2006. In 2007, I joined Twitter and stuck to short-form content until Google+ launched in 2011.

Most of the long-form content I’ve written for the web was for company-owned blogs when I was in Developer Relations at Google. There was the Google Data APIs Blog, the Google Apps Developers Blog, the Google Code Blog, the Google Developers Blog, and the YouTube API Blog. I even had my byline on one post for the Official Google Blog.

And then, given my career in Developer Relations, there were countless posts I drafted or edited which had the names of others in the byline.

Now it’s time to write for myself. In a way, this really does make this blog my first rodeo, but I trust it’ll be a pleasant one.

I will write my thoughts on Developer Relations, technology+engineering, marketing, management and general business.  As always, I encourage your thoughts and conversation.