Playing with Instagram Data

Downloading and visualizing Instagram direct conversations

Sharvil
Published in DataDrivenInvestor
4 min read · May 30, 2019

Photo by Erik Lucatero on Unsplash

Instagram does not provide a search mechanism for its chats, so looking for an older conversation is a really grueling task: the only way to reach it is to scroll through the messages all the way to the top, and the loading pauses every few texts make it even worse!

So, while hunting for a less arduous alternative to all that tiresome scrolling, I learned that Instagram has finally rolled out a feature that lets users request and download their Instagram data. The steps for requesting a download are:

  • Log in to Instagram.com and visit the Privacy and Security section.
  • Hit “Request Download” under the Data Download section.
  • Enter your Instagram credentials to confirm the request.

They’ll email all your Instagram data to your registered email address within 48 hours, but chances are it will arrive much sooner, in less than an hour. After extracting the compressed email attachment, you’ll find a “messages.json” file that holds all the Instagram chats and direct messages you exchanged with other users.

Once you have the data at your disposal, you can finally start playing with it. The rest of the article focuses on visualizing and analyzing the direct messages shared between two users (not group chats); let’s call them John and Jane. All the code and images used in this article can be found on GitHub.

Loading and understanding the structure of data

The data is a JSON array. Each element of the array has two keys: participants, which holds the users involved in that conversation (a pair, for a direct message between two people), and conversation, which stores all the details of that conversation, such as message text, sender, timestamp and media.
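To make that structure concrete, here is a minimal sketch of loading the array and picking out the one-to-one thread between two users. The inline sample stands in for the real messages.json; the usernames, the exact timestamp format and the helper name find_direct_thread are assumptions for illustration, not something taken from the export itself.

```python
import json

# Tiny stand-in for messages.json; usernames and values are made up.
raw = """
[
  {
    "participants": ["john", "jane"],
    "conversation": [
      {"sender": "john", "created_at": "2019-05-01T18:30:00+00:00", "text": "Hi!"},
      {"sender": "jane", "created_at": "2019-05-01T18:31:10+00:00", "text": "Hello"}
    ]
  },
  {
    "participants": ["john", "jane", "alex"],
    "conversation": []
  }
]
"""

# With the real export you would read the file instead, e.g.
# with open("messages.json", encoding="utf-8") as f: threads = json.load(f)
threads = json.loads(raw)


def find_direct_thread(threads, user_a, user_b):
    """Return the messages of the thread whose participants are exactly these two users."""
    for thread in threads:
        if set(thread["participants"]) == {user_a, user_b}:
            return thread["conversation"]
    return []


messages = find_direct_thread(threads, "john", "jane")
print(len(messages), "messages between john and jane")
```

Matching on the exact set of participants is what keeps group chats out of the result.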

The timezone of ‘created_at’ is converted to the local timezone so that the time-based analysis later on reads clearly. The message timestamp is then set as the index of the dataframe, which makes time-based operations such as re-sampling straightforward. Some of the columns in the resulting dataframe are:

  • sender: username of the person who sent the message
  • text: the text of the message sent
  • created_at: timestamp of the message
  • media_share_caption: caption of shared media posts
  • likes: shared post likes, message likes, story likes
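The preparation described above could look roughly like the sketch below, assuming pandas is available. The sample messages, the assumption that the export's timestamps are ISO strings in UTC, and the choice of "Asia/Kolkata" as the local timezone are all placeholders; swap in your own.

```python
import pandas as pd

# Made-up sample messages shaped like the entries of one conversation.
messages = [
    {"sender": "john", "created_at": "2019-05-01T18:30:00+00:00", "text": "Hi!"},
    {"sender": "jane", "created_at": "2019-05-01T20:45:00+00:00", "text": "Hello"},
]

df = pd.DataFrame(messages)

# Parse the timestamps as UTC, then convert to the (assumed) local timezone.
df["created_at"] = pd.to_datetime(df["created_at"], utc=True).dt.tz_convert("Asia/Kolkata")

# Use the timestamp as the index so time-based operations work directly.
df = df.set_index("created_at").sort_index()

print(df)
```

After the conversion, a message sent at 18:30 UTC shows up at midnight local time, which is exactly the kind of shift that matters for an hour-of-day analysis.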

Exploratory Data Analysis

Now that our data is ready to be queried, we’ll start by visualizing the number of messages sent by each user. Then we’ll capture how the exchange of messages has grown over time by re-sampling the data on a monthly basis.
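As a rough sketch of both steps, assuming a dataframe indexed by message timestamp with a sender column, the counts can be computed like this. The sample rows are invented, and the plotting itself is left out so the snippet runs without a plotting library.

```python
import pandas as pd

# Invented sample: one row per message, indexed by timestamp.
df = pd.DataFrame(
    {"sender": ["john", "jane", "john", "jane", "john"]},
    index=pd.to_datetime(
        [
            "2019-01-05 10:00",
            "2019-01-20 11:30",
            "2019-02-02 09:15",
            "2019-04-11 22:40",
            "2019-04-12 08:05",
        ]
    ),
)

# Number of messages sent by each user.
per_sender = df["sender"].value_counts()

# Number of messages per calendar month ("MS" = bins labelled by month start).
# Months without any message appear with a count of 0.
per_month = df.resample("MS").size()

print(per_sender)
print(per_month)
```

If matplotlib is installed, per_sender.plot.bar() and per_month.plot() would turn these two series into the corresponding charts.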

Media Share Statistics

First, let’s do a simple task: count the posts, stories and animated GIFs shared by each participant of the conversation and show how their tallies compare.
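One way to get those tallies is to count the non-empty values of each media column per sender. In the sketch below only media_share_caption is a column named in this article; story_share and animated_media_images are assumed column names, so check them against your own export.

```python
import pandas as pd

# Invented sample; None means the message carried no media of that kind.
df = pd.DataFrame(
    {
        "sender": ["john", "jane", "john", "jane"],
        "media_share_caption": ["Sunset #travel", None, "Goal! #football", None],
        "story_share": [None, "story by alex", None, None],  # assumed column name
        "animated_media_images": [None, None, None, "gif-data"],  # assumed column name
    }
)

media_columns = ["media_share_caption", "story_share", "animated_media_images"]

# count() ignores missing values, so this yields shares per sender and media type.
media_counts = df.groupby("sender")[media_columns].count()

print(media_counts)
```

The resulting table has one row per sender and one column per media type, which maps directly onto a grouped or stacked bar chart.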

After that, since the media_share_caption attribute holds the captions of all the Instagram posts shared, let’s dig a little deeper and extract the hashtags from those captions to gain insight into the type of posts shared most frequently.
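A simple way to pull the hashtags out is a regular expression plus a counter. This is only a sketch: the captions are made up, count_hashtags is a hypothetical helper, and the pattern treats a hashtag as "#" followed by letters, digits or underscores.

```python
import re
from collections import Counter

# Made-up captions; None stands for messages without a shared post.
captions = [
    "Golden hour in Goa #travel #sunset",
    None,
    "Last-minute winner! #football #Travel",
]

HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(captions):
    """Count hashtags across captions, ignoring case and skipping missing captions."""
    counts = Counter()
    for caption in captions:
        if not isinstance(caption, str):
            continue  # skips None as well as NaN values coming from pandas
        counts.update(tag.lower() for tag in HASHTAG.findall(caption))
    return counts


top = count_hashtags(captions).most_common(3)
print(top)
```

With a real dataframe, passing df["media_share_caption"] to the helper should work the same way, since missing captions are skipped.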

Hourly Distribution of chats

Lastly, let’s dive into how the chat messages are spread throughout the day by plotting a pie chart that shows the share of messages exchanged in each of the twenty-four hours.
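The numbers behind such a chart can be sketched as follows, assuming the dataframe is indexed by local message timestamps. The sample timestamps are invented, and the chart itself is left out so the snippet has no plotting dependency.

```python
import pandas as pd

# Invented sample: one row per message, indexed by local timestamp.
df = pd.DataFrame(
    {"sender": ["john", "jane", "john", "jane", "john", "jane"]},
    index=pd.to_datetime(
        [
            "2019-05-01 09:05",
            "2019-05-01 09:40",
            "2019-05-02 21:15",
            "2019-05-03 21:50",
            "2019-05-04 21:59",
            "2019-05-05 23:10",
        ]
    ),
)

# Messages per hour of day; reindex so that quiet hours show up as 0.
by_hour = df.groupby(df.index.hour).size().reindex(range(24), fill_value=0)

# Fraction of all messages falling in each hour (the slices of the pie).
share = by_hour / by_hour.sum()

print(by_hour[by_hour > 0])
```

If matplotlib is installed, share[share > 0].plot.pie() would draw the pie with one slice per active hour.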

In this article, we explored only the direct messages data. There’s a lot more scope for visualizing and uncovering hidden patterns in the other data offered by Instagram, which you can definitely try out!

Let me know your views if you enjoyed the article!

Until then, happy visualizing!
