
Explore the Instagram API based on the cast of "Severance"

A preliminary note

This post largely relies on a package for the R programming language that I’ve been working on and that is still under development. Its name is cornucopia. Why call it cornucopia? It’s a tongue-in-cheek reference to marketers always thinking about conversion funnels: ultimately, a cornucopia is like a funnel that keeps on giving. Also known as the “horn of plenty”, it’s basically a marketer’s wildest dream: a funnel that endlessly overflows with abundance.

More seriously: the package is available on GitHub and I’ve put some effort into documenting it. Many functions can be used and are effectively being used but, given the breadth of the Meta API, the package remains very much a work in progress: only some use cases are covered; some are covered, but not yet adequately documented; only some functions have effectively integrated efficient caching. Still, if you’re approaching the Meta/Facebook/Instagram Graph API for the first time, overcoming the Byzantine system required to get an app up and running and then retrieving appropriately-scoped tokens will probably be more of a challenge than the incomplete documentation in cornucopia. You have been warned. In due time, documentation will get better, I may get to write a few tutorials and, who knows, perhaps even make the app public. Until then… just enjoy this light-hearted post showcasing the business discovery Instagram API.

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# if not available, install `cornucopia` with pak::pak("giocomai/cornucopia")
# the package is still experimental and not fully-featured or documented, yet
library("cornucopia")

start_date <- "2025-01-01"
end_date <- "2025-01-18"

# this is how I protect tokens for this session, adapt this as needed
keyring::keyring_unlock(password = readr::read_lines("key.txt"), keyring = "codebase")

cc_set(
  start_date = start_date,
  end_date = end_date,
  ig_user_id = keyring::key_get(service = "ig_user_id", username = "cornucopiar", keyring = "codebase"),
  fb_user_token = keyring::key_get(service = "fb_long_user_token", username = "giocomai", keyring = "codebase"),
  fb_user_id = keyring::key_get(service = "fb_user_id", username = "giocomai", keyring = "codebase")
)

library("dplyr")
library("ggplot2")
# pak::pak("ivelasq/severance")
library("severance")

theme_set(theme_minimal(base_family = "Roboto Condensed"))

Step 1: Retrieve the Instagram handles of Severance actors

First, we need to retrieve the Instagram handles of the Severance actors. Rather than add them manually, we’ll query Wikidata via Wikipedia. Why? Because if you want to repeat this exercise with a different TV series or film, the only thing you’ll need to do is change the URL of the Wikipedia page in the following code chunk (or in the version of this post that includes code chunks), and all the rest will follow automagically.

library("tidywikidatar")
tw_create_cache_folder(ask = FALSE)
tw_enable_cache()

actors_ig_df <- tw_get_wikipedia_page_qid(url = "https://en.wikipedia.org/wiki/Severance_(TV_series)") |> 
  dplyr::pull(qid) |> 
  tw_get_property(p = "P161") |> 
  dplyr::transmute(actor_qid = value) |> 
  dplyr::mutate(actor_name = tw_get_label(actor_qid)) |> 
  dplyr::mutate(ig_username = tw_get_p1(actor_qid, p = "P2003"))
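The chunk above can be wrapped in a small helper, so that re-running the lookup for another series becomes a one-liner. A sketch follows; `get_cast_ig()` is a name I’m introducing here for illustration, not a function from cornucopia or tidywikidatar:

```r
library("tidywikidatar")

# hypothetical helper, not part of any package: given the URL of a Wikipedia
# page about a film or TV series, return cast members and Instagram handles
get_cast_ig <- function(wikipedia_url) {
  tw_get_wikipedia_page_qid(url = wikipedia_url) |>
    dplyr::pull(qid) |>
    tw_get_property(p = "P161") |> # P161: "cast member"
    dplyr::transmute(actor_qid = value) |>
    dplyr::mutate(actor_name = tw_get_label(actor_qid)) |>
    dplyr::mutate(ig_username = tw_get_p1(actor_qid, p = "P2003")) # P2003: "Instagram username"
}

# e.g. get_cast_ig("https://en.wikipedia.org/wiki/The_White_Lotus")
```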

knitr::kable(
  actors_ig_df |> 
    dplyr::select(-actor_qid) |> 
    dplyr::mutate(ig_username = purrr::map_chr(
      .x = ig_username, .f = \(x) {
        if (is.na(x)) {
          "/"
        } else {
          htmltools::a(x, href = stringr::str_c("https://www.instagram.com/", x, "/")) |> as.character()
        }
      })), escape = FALSE)
| actor_name | ig_username |
|:---|:---|
| Adam Scott | mradamscott |
| Britt Lower | brittle |
| John Turturro | john_turturro |
| Christopher Walken | / |
| Patricia Arquette | patriciaarquette |
| Jen Tullock | surefineokay |
| Zach Cherry | / |
| Yul Vazquez | yuluminati |
| Dichen Lachman | dichenlachman |
| Michael Chernus | mchernus |
| Tramell Tillman | tramell.tillman |

Alright, not all of the cast has an Instagram account, but most do. Let’s proceed and check some info about their Instagram accounts.

actors_ig_no_na_df <- actors_ig_df |> 
  tidyr::drop_na(ig_username)

Step 2: Check out how many followers they have

In order to proceed, we’ll rely on the “business discovery” feature of the official Instagram API. To do so, you’ll need a business Instagram account associated with a Facebook page, and an appropriately scoped token: getting this right is a bit of a pain at first, but follow this documentation and you’ll eventually get there. Notice that for this endeavour you’ll only need a fb_user_token (not the page token) and an ig_user_id (you’ll need to retrieve it through the API; it is not the old Instagram id).
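For the curious, business discovery ultimately boils down to a single Graph API call. Below is a minimal sketch of the kind of request `cc_get_instagram_bd_user()` makes under the hood, using `httr2`; the API version, the exact field list, and the placeholders are my own illustration, so check the official business discovery documentation before relying on them:

```r
library("httr2")

# minimal business discovery request; <IG_USER_ID> and <TOKEN> are
# placeholders for your own ig_user_id and appropriately scoped user token
resp <- request("https://graph.facebook.com/v19.0") |>
  req_url_path_append("<IG_USER_ID>") |>
  req_url_query(
    fields = "business_discovery.username(mradamscott){username,name,followers_count,follows_count,media_count}",
    access_token = "<TOKEN>"
  ) |>
  req_perform() |>
  resp_body_json()
```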

if (fs::file_exists("cast_ig_df.csv")==FALSE) {
  # cornucopia will eventually take care of the caching itself, but for the time being, we'll handle this manually
  # also: we want to cache the data at the moment this post is written
  cast_ig_df <- cc_get_instagram_bd_user(ig_username = actors_ig_no_na_df$ig_username) |> 
    dplyr::select(-id)
  readr::write_csv(x = cast_ig_df, file = "cast_ig_df.csv")
}

cast_ig_df <- readr::read_csv("cast_ig_df.csv", show_col_types = FALSE, progress = FALSE)

Let’s see some basic stats about their Instagram accounts:

cast_ig_df |> 
  dplyr::select(username, name, followers_count, follows_count, media_count) |> 
  dplyr::mutate(username = purrr::map_chr(
    .x = username, .f = \(x) {
      htmltools::a(x, href = stringr::str_c("https://www.instagram.com/", x, "/")) |> as.character()
    })) |> 
  dplyr::arrange(dplyr::desc(followers_count)) |> 
  knitr::kable(escape = FALSE, format.args = list(big.mark = " "))
| username | name | followers_count | follows_count | media_count |
|:---|:---|---:|---:|---:|
| mradamscott | Adam Scott | 1 024 036 | 1 302 | 506 |
| patriciaarquette | Patricia Arquette | 214 228 | 7 499 | 50 |
| dichenlachman | Dichen Lachman | 159 656 | 673 | 194 |
| brittle | britt lower | 89 714 | 742 | 180 |
| john_turturro | John Turturro | 20 241 | 92 | 86 |
| tramell.tillman | Tramell Tillman | 19 741 | 1 376 | 49 |
| surefineokay | Jen Tullock | 15 532 | 2 757 | 4 498 |
| yuluminati | YUL VAZQUEZ | 13 654 | 4 555 | 3 649 |
| mchernus | Michael Chernus | 12 328 | 3 226 | 507 |

Not bad, Adam Scott, not bad: over 1 million followers. Some of them post only very occasionally, judging by the total number of posts.

instagram_followers_gg <- cast_ig_df |> 
  dplyr::select(username, name, followers_count) |> 
  dplyr::arrange(followers_count) |> 
  dplyr::mutate(name = forcats::fct_inorder(name)) |> 
  ggplot() +
  geom_col(mapping = aes(x = followers_count,
                         y = name,
                         fill = username)) +
  scale_fill_manual(values = c(severance_palette("Dinner"), severance_palette("Hell"))) +
  scale_x_continuous(name = "Number of Instagram followers",
                     labels = scales::number) +
  scale_y_discrete(name = NULL) +
  labs(title = paste(sQuote("Severance"), "cast by number of followers on Instagram"),
       caption = "* As of January 2025") +
  theme(legend.position = "none")

ggplot2::ggsave(filename = "instagram_followers_gg.png",
                plot = instagram_followers_gg,
                width = 8,
                height = 6,
                bg = "white")

Time to see what they post.

Step 3: Retrieve their posts

Instagram throttles this API endpoint rather heavily to prevent scraping, so we’ll just retrieve the latest 250 posts of each actor. API limits are reset after one hour, so with some waiting time this can be scaled up to a reasonable extent for many use cases.
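For scaling up, the simplest approach is to pause between users so that requests spread out over the hourly window. A sketch of the idea (the 60-second pause is an arbitrary, cautious choice of mine, not a documented requirement):

```r
# fetch media for each user, pausing between users to stay well below
# the hourly rate limit; adjust the pause to your own quota and patience
media_list <- purrr::map(
  .x = cast_ig_df$username,
  .f = \(current_username) {
    out <- cc_get_instagram_bd_user_media(
      ig_username = current_username,
      max_pages = 10
    )
    Sys.sleep(60) # arbitrary pause; tune based on observed rate-limit errors
    out
  }
)
```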

# manual caching, as long as I don't integrate proper caching in the core functions
base_media_folder <- fs::dir_create("ig_media")

media_df <- purrr::map(
  .x = cast_ig_df$username,
  .f = \(current_username) {
    current_filename <- fs::path(base_media_folder, fs::path_ext_set(path = current_username,
                                                                     ext = "csv") |> 
                                   fs::path_sanitize())
    if (fs::file_exists(current_filename)==FALSE) {
      current_media_df <- cc_get_instagram_bd_user_media(
        ig_username = current_username,
        max_pages = 10 # 10 pages of 25 posts each, i.e. up to 250 posts
      )
      # dropping thumbnail and media url, as they are attached to my user and anyway stop working soon
      readr::write_csv(x = current_media_df |> 
                         dplyr::select(-thumbnail_url, -media_url),
                       file = current_filename)
    }
    current_media_df <- readr::read_csv(current_filename, show_col_types = FALSE, progress = FALSE)
    current_media_df
  }
) |> 
  purrr::list_rbind()

Step 4: check out what they post

So here we are, with about 250 posts per user, or fewer for those who post infrequently. Retrieving 250 posts lets us go back less than a year for a few users, but all the way back to 2011 and their earliest post for others! In order to go back at least to the launch of the first season of Severance in early 2022, we’ll retrieve a few hundred more posts for the most active accounts.

usernames_ordered_df <- cast_ig_df |> 
  dplyr::select(username)

usernames_ordered_df |> 
  dplyr::left_join(
    media_df |> 
      dplyr::group_by(username) |> 
      dplyr::count(name = "post"),
    by = "username") |>
  dplyr::left_join(y = media_df |> 
                     dplyr::group_by(username) |> 
                     dplyr::summarise(earliest_post  = min(timestamp) |> as.Date(),
                                      latest_post = max(timestamp) |> as.Date()),
                   by = "username") |> 
  dplyr::left_join(y = media_df |> 
                     dplyr::group_by(username, media_type) |> 
                     dplyr::count() |> 
                     tidyr::pivot_wider(names_from = media_type, values_from = n), 
                   by = "username") |> 
  dplyr::left_join(y = media_df |> 
                     dplyr::group_by(username, media_product_type) |> 
                     dplyr::count() |> 
                     tidyr::pivot_wider(names_from = media_product_type, values_from = n), 
                   by = "username") |> 
  
  knitr::kable()
| username | post | earliest_post | latest_post | CAROUSEL_ALBUM | IMAGE | VIDEO | FEED | REELS |
|:---|---:|:---|:---|---:|---:|---:|---:|---:|
| mradamscott | 250 | 2020-06-28 | 2025-01-14 | 77 | 141 | 32 | 243 | 7 |
| brittle | 159 | 2011-10-23 | 2025-01-17 | 29 | 103 | 27 | 147 | 12 |
| john_turturro | 85 | 2016-10-05 | 2024-07-10 | 4 | 76 | 5 | 84 | 1 |
| patriciaarquette | 48 | 2015-11-01 | 2025-01-15 | 5 | 37 | 6 | 43 | 5 |
| surefineokay | 1000 | 2020-06-24 | 2025-01-18 | 262 | 398 | 340 | 770 | 230 |
| yuluminati | 3250 | 2012-12-12 | 2025-01-18 | 65 | 2294 | 891 | 3017 | 233 |
| dichenlachman | 194 | 2013-04-05 | 2025-01-17 | 2 | 132 | 60 | 183 | 11 |
| mchernus | 250 | 2018-04-29 | 2024-12-17 | 46 | 188 | 16 | 246 | 4 |
| tramell.tillman | 34 | 2018-11-02 | 2025-01-16 | 24 | 5 | 5 | 29 | 5 |

N.B. All posts (in Instagram API parlance, we are actually talking about media items) are either “feed” or “reels” and, separately, either carousel, image, or video.
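The two classifications are independent of each other, which is easy to verify with a quick cross-tabulation of the media data frame built above:

```r
# cross-tabulate the two independent classifications: every media item has
# both a media_type (carousel/image/video) and a media_product_type (feed/reel)
media_df |>
  dplyr::count(media_type, media_product_type) |>
  tidyr::pivot_wider(names_from = media_product_type,
                     values_from = n,
                     values_fill = 0)
```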

Step 5: Time to focus on their Severance posts

Do they post about Severance?

severance_media_df <- media_df |> 
  dplyr::filter(stringr::str_detect(string = caption,
                                    pattern = stringr::fixed(pattern = "severance",
                                                             ignore_case = TRUE)))


usernames_ordered_df |> 
  dplyr::left_join(
    severance_media_df |> 
      dplyr::group_by(username) |> 
      dplyr::count(name = "post"),
    by = "username") |>
  dplyr::left_join(y = severance_media_df |> 
                     dplyr::group_by(username) |> 
                     dplyr::summarise(earliest_post  = min(timestamp) |> as.Date(),
                                      latest_post = max(timestamp) |> as.Date()),
                   by = "username") |> 
  dplyr::left_join(y = severance_media_df |> 
                     dplyr::group_by(username, media_type) |> 
                     dplyr::count() |> 
                     tidyr::pivot_wider(names_from = media_type, values_from = n), 
                   by = "username") |> 
  dplyr::left_join(y = severance_media_df |> 
                     dplyr::group_by(username, media_product_type) |> 
                     dplyr::count() |> 
                     tidyr::pivot_wider(names_from = media_product_type, values_from = n), 
                   by = "username") |> 
  
  knitr::kable()
| username | post | earliest_post | latest_post | CAROUSEL_ALBUM | IMAGE | VIDEO | FEED | REELS |
|:---|---:|:---|:---|---:|---:|---:|---:|---:|
| mradamscott | 51 | 2021-11-20 | 2025-01-14 | 23 | 18 | 10 | 48 | 3 |
| brittle | 19 | 2020-01-20 | 2024-10-21 | 6 | 7 | 6 | 15 | 4 |
| john_turturro | 2 | 2022-01-19 | 2024-07-10 | NA | NA | 2 | 1 | 1 |
| patriciaarquette | 1 | 2025-01-15 | 2025-01-15 | 1 | NA | NA | 1 | NA |
| surefineokay | 46 | 2021-12-16 | 2025-01-17 | 21 | 9 | 16 | 40 | 6 |
| yuluminati | 23 | 2022-01-18 | 2025-01-18 | 1 | 13 | 9 | 18 | 5 |
| dichenlachman | 21 | 2021-12-16 | 2025-01-17 | NA | 8 | 13 | 18 | 3 |
| mchernus | 9 | 2022-02-25 | 2024-07-10 | 4 | 1 | 4 | 8 | 1 |
| tramell.tillman | 7 | 2022-04-09 | 2024-12-18 | 6 | NA | 1 | 6 | 1 |

All of them did, at least once! Notice that we probably miss some of the earliest posts by the most active Instagram users, as we only retrieved their most recent few hundred posts.

It appears that every single one of the top 10 most-liked Instagram posts mentioning Severance by its cast is by mradamscott… such a 🌟.
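That claim is easy to check by counting usernames among the ten most-liked Severance posts:

```r
# sanity check: who authored the ten most-liked Severance posts?
severance_media_df |>
  dplyr::arrange(dplyr::desc(like_count)) |>
  dplyr::slice_head(n = 10) |>
  dplyr::count(username)
```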

But for the sake of balance, let’s combine the top Severance posts by each actor.

Click on the timestamp to see the original post.

severance_media_df |> 
  dplyr::group_by(username) |> 
  dplyr::arrange(dplyr::desc(like_count),
                 dplyr::desc(comments_count)) |> 
  dplyr::slice_head(n = 1) |> 
  dplyr::ungroup() |> 
  dplyr::mutate(timestamp = purrr::map2_chr(
    .x = as.character(as.Date(timestamp)),
    .y = permalink, .f = \(x, y) {
      htmltools::a(x, href = stringr::str_c(y)) |> as.character()
    })) |> 
  dplyr::select(timestamp, username, like_count, comments_count, caption) |> 
  dplyr::mutate(caption = stringr::str_trunc(string = caption, width = 24)) |> 
  dplyr::arrange(dplyr::desc(like_count),
                 dplyr::desc(comments_count)) |> 
  dplyr::rename(like = like_count, comments = comments_count) |> 
  knitr::kable(escape = FALSE, format.args = list(big.mark = " "))
| timestamp | username | like | comments | caption |
|:---|:---|---:|---:|:---|
| 2022-10-31 | mradamscott | 74 336 | 1 374 | Filming has begun on … |
| 2022-07-21 | brittle | 14 767 | 176 | we all went to high s… |
| 2023-02-27 | dichenlachman | 11 955 | 128 | Had an wonderful time… |
| 2025-01-15 | patriciaarquette | 5 846 | 196 | #severance appletv #g… |
| 2024-07-10 | tramell.tillman | 2 716 | 213 | Let the countdown beg… |
| 2023-01-16 | surefineokay | 1 982 | 140 | Thank you Critics Cho… |
| 2023-02-28 | mchernus | 1 212 | 91 | Oh what a night! Had … |
| 2022-01-19 | john_turturro | 1 178 | 53 | Here comes the offici… |
| 2022-02-18 | yuluminati | 592 | 108 | Severance is here!! S… |

Step 6: Some data visualisation

This is all just preliminary data gathering and exploration. The purpose of this post is simply to show that the official Instagram API makes it possible to retrieve posts by other users, and to conduct all sorts of processing on the data thus retrieved.

The reader should see how this same technique could be used for all sorts of work, from data journalism to competitor analysis. One could analyse hashtags, or pass the images to locally deployed LLMs to enrich the analysis, and ultimately, see what works best based on a set of criteria.
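For instance, extracting hashtags from captions takes a single regular expression. A self-contained sketch with `stringr` (the captions below are made up for illustration; in practice you would feed in `media_df$caption`):

```r
library("dplyr")
library("stringr")
library("tidyr")

# toy captions, standing in for media_df$caption
toy_df <- tibble::tibble(
  caption = c("Season 2 is here #severance #appletv",
              "Back on the set! #severance",
              "No hashtags in this one")
)

# extract all hashtags, one row per occurrence, then count them
hashtags_df <- toy_df |>
  mutate(hashtag = str_extract_all(caption, "#\\w+")) |>
  unnest(hashtag) |>
  count(hashtag, sort = TRUE)

hashtags_df
```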

Just for the sake of it, let’s do some visualisations, keeping in mind that in this case the data are not really fully comparable (i.e. we have only recent posts by some of the cast members).

Severance fans will appreciate the colour palettes inspired by some of the most memorable scenes in the series.

media_gg_df <- media_df |> 
  dplyr::mutate(created_time = lubridate::as_date(timestamp)) |> 
  mutate(year = lubridate::year(created_time), 
         month = lubridate::month(created_time),
         day = lubridate::day(created_time)) |> 
  mutate(month = factor(x = month,
                        levels = 12:1,
                        labels = rev(month.name)
  )) |> 
  rename(`Like` = like_count,
         `Format` = media_type) |> 
  filter(created_time>=lubridate::as_datetime("2022-01-01"))

instagram_bubble_gg <- media_gg_df |> 
  mutate(month_year = paste(month, year, sep = " ")) |>
  arrange(desc(created_time)) |> 
  mutate(month_year = forcats::fct_inorder(month_year)) |> 
  ggplot(mapping = aes(x = day,
                       y = month_year,
                       size = `Like`,
                       colour = `Format`)) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = severance_palette("Jazz02")) +
  #scale_colour_viridis_d() +
  guides(fill = guide_legend(reverse = TRUE)) + 
  scale_x_continuous(name = "Day of the month",
                     breaks = c(1, 5, 10, 15, 20, 25, 30),
                     minor_breaks = c(1:31)) +
  scale_y_discrete(name = NULL, expand = expansion(add = 1)) +
  scale_size_continuous(range = c(0.1,12), labels = scales::number) +
  guides(colour = guide_legend(override.aes = list(size=12))) +
  theme(legend.direction = "horizontal",
        legend.position = "bottom"
  ) +
  labs(title = "Type and number of likes on Instagram",
       subtitle = paste("Out of a total of", scales::number(nrow(media_gg_df)), "posts published by", sQuote("Severance"),  "actors starting with", lubridate::date(media_gg_df$created_time) |> min() |> format("%B %Y") |> stringr::str_squish())) +
  theme(strip.text = element_text(size = 20),
        legend.box="vertical", 
        legend.margin=margin())


ggplot2::ggsave(filename = "instagram_bubble_gg.png",
                plot = instagram_bubble_gg,
                width = 8,
                height = 10,
                bg = "white")

It’s easy to notice something unexpected: the biggest hits are carousel albums, not videos. If this were a serious analysis, one would go on to investigate why these posts work, or why the video clips are not hits, or…

One final note: folks interested in analysing their own Instagram channel (or, for that matter, Facebook page) should consider that the official APIs expose far more data about your own posts, enabling much more revealing analyses. For example (and without even mentioning all the fine-grained options): changes over time in the average number of video views for organic posts (easy to highlight with changepoint algorithms), comparisons of sponsored versus organic posts, or the success of specific types of posts based on their caption contents or the time of day when they were posted.
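As a taste of the changepoint idea, here is a self-contained sketch using the `changepoint` package on simulated data (the series is made up; with your own account you would feed in the per-day view metrics returned by the insights endpoints):

```r
library("changepoint")

# simulated standardised daily views: a clear upward shift on day 101,
# standing in for e.g. an algorithm change or a new posting strategy
set.seed(1)
daily_views <- c(rnorm(100, mean = 5, sd = 1),
                 rnorm(100, mean = 8, sd = 1))

# detect shifts in the mean with the PELT algorithm
fit <- cpt.mean(daily_views, method = "PELT")
cpts(fit) # estimated changepoint location(s)
```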

P.S. this post has served as the basis for a note on Roxana Todea’s website.