A preliminary note
This post largely relies on a package for the R programming language that I have been working on and that is still under development. Its name is cornucopia. Why call it cornucopia? It’s a tongue-in-cheek reference to marketers always thinking about conversion funnels: ultimately, a cornucopia is like a funnel that keeps on giving. Also known as the “horn of plenty”, it’s basically a marketer’s wildest dream: a funnel that endlessly overflows with abundance.
More seriously: the package is available on GitHub and I’ve put some effort into documenting it. Many functions can be used and are effectively being used, but given the breadth of the Meta API it remains very much a work in progress: only some use cases are covered; some are covered, but not yet adequately documented; only some functions have effectively integrated efficient caching. Still, if you’re approaching the Meta/Facebook/Instagram Graph API for the first time, I’d say that overcoming the Byzantine system required to get an app up and running and then retrieve appropriately scoped tokens will probably be more of a challenge than the incomplete documentation in cornucopia. You have been warned. In due time, documentation will get better, I may get to write a few tutorials, and, who knows, perhaps even make the app public. Until then… just enjoy this light-hearted post showcasing the business discovery feature of the Instagram API.
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
# if not available, install `cornucopia` with pak::pak("giocomai/cornucopia")
# the package is still experimental and not fully-featured or documented, yet
library("cornucopia")
start_date <- "2025-01-01"
end_date <- "2025-01-18"
# this is how I protect tokens for this session, adapt this as needed
keyring::keyring_unlock(password = readr::read_lines("key.txt"), keyring = "codebase")
cc_set(
start_date = start_date,
end_date = end_date,
ig_user_id = keyring::key_get(service = "ig_user_id", username = "cornucopiar", keyring = "codebase"),
fb_user_token = keyring::key_get(service = "fb_long_user_token", username = "giocomai", keyring = "codebase"),
fb_user_id = keyring::key_get(service = "fb_user_id", username = "giocomai", keyring = "codebase")
)
library("dplyr")
library("ggplot2")
# pak::pak("ivelasq/severance")
library("severance")
theme_set(theme_minimal(base_family = "Roboto Condensed"))
Step 1: Retrieve the Instagram handles of Severance actors
First, we need to retrieve the Instagram handles of Severance actors. Rather than add them manually, we’ll query Wikidata via Wikipedia. Why? Because if you want to repeat this post with a different TV series or film, the only thing you’ll need to do is change the URL of the Wikipedia page in the following code chunk (or in the version of this post that includes the code chunks), and all the rest will follow automagically.
library("tidywikidatar")
tw_create_cache_folder(ask = FALSE)
tw_enable_cache()
actors_ig_df <- tw_get_wikipedia_page_qid(url = "https://en.wikipedia.org/wiki/Severance_(TV_series)") |>
dplyr::pull(qid) |>
tw_get_property(p = "P161") |>
dplyr::transmute(actor_qid = value) |>
dplyr::mutate(actor_name = tw_get_label(actor_qid)) |>
dplyr::mutate(ig_username = tw_get_p1(actor_qid, p = "P2003"))
knitr::kable(
actors_ig_df |>
dplyr::select(-actor_qid) |>
dplyr::mutate(ig_username = purrr::map_chr(
.x = ig_username, .f = \(x) {
if (is.na(x)) {
"/"
} else {
htmltools::a(x, href = stringr::str_c("https://www.instagram.com/", x, "/")) |> as.character()
}
})), escape = FALSE)
| actor_name | ig_username |
|---|---|
| Adam Scott | mradamscott |
| Britt Lower | brittle |
| John Turturro | john_turturro |
| Christopher Walken | / |
| Patricia Arquette | patriciaarquette |
| Jen Tullock | surefineokay |
| Zach Cherry | / |
| Yul Vazquez | yuluminati |
| Dichen Lachman | dichenlachman |
| Michael Chernus | mchernus |
| Tramell Tillman | tramell.tillman |
Alright, not all of the cast has an Instagram account, but most do. Let’s proceed and check some info about their Instagram accounts.
actors_ig_no_na_df <- actors_ig_df |>
tidyr::drop_na(ig_username)
Step 2: Check out how many followers they have
In order to proceed, we’ll rely on the “business discovery” feature of the official Instagram API. To do so, you’ll need a business Instagram account associated with a Facebook page, and an appropriately scoped token: getting this right is a bit of a pain at first, but follow along with this documentation and you’ll eventually get there. Notice that for this endeavour you’ll only need a fb_user_token (not the page token) and an ig_user_id (which you’ll need to retrieve through the API: it is not the old Instagram id).
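For what it’s worth, here is a minimal sketch of how the ig_user_id can be retrieved once you have a long-lived user token, by asking the Graph API for the Instagram business account linked to your Facebook page. The page ID and API version below are placeholders; the instagram_business_account field is the one documented by Meta.
library("httr2")
# placeholder: the numeric ID of the Facebook page linked to your Instagram business account
fb_page_id <- "1234567890"
ig_account <- request("https://graph.facebook.com/v21.0") |>
  req_url_path_append(fb_page_id) |>
  req_url_query(
    fields = "instagram_business_account",
    access_token = keyring::key_get(service = "fb_long_user_token",
                                    username = "giocomai",
                                    keyring = "codebase")
  ) |>
  req_perform() |>
  resp_body_json()
ig_user_id <- ig_account$instagram_business_account$id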
if (fs::file_exists("cast_ig_df.csv")==FALSE) {
# cornucopia will eventually take care of the caching itself, but for the time being, we'll handle this manually
# also: we want to cache the data at the moment this post is written
cast_ig_df <- cc_get_instagram_bd_user(ig_username = actors_ig_no_na_df$ig_username) |>
dplyr::select(-id)
readr::write_csv(x = cast_ig_df, file = "cast_ig_df.csv")
}
cast_ig_df <- readr::read_csv("cast_ig_df.csv", show_col_types = FALSE, progress = FALSE)
Let’s see some basic stats about their Instagram accounts:
cast_ig_df |>
dplyr::select(username, name, followers_count, follows_count, media_count) |>
dplyr::mutate(username = purrr::map_chr(
.x = username, .f = \(x) {
htmltools::a(x, href = stringr::str_c("https://www.instagram.com/", x, "/")) |> as.character()
})) |>
dplyr::arrange(dplyr::desc(followers_count)) |>
knitr::kable(escape = FALSE, format.args = list(big.mark = " "))
| username | name | followers_count | follows_count | media_count |
|---|---|---|---|---|
| mradamscott | Adam Scott | 1 024 036 | 1 302 | 506 |
| patriciaarquette | Patricia Arquette | 214 228 | 7 499 | 50 |
| dichenlachman | Dichen Lachman | 159 656 | 673 | 194 |
| brittle | britt lower | 89 714 | 742 | 180 |
| john_turturro | John Turturro | 20 241 | 92 | 86 |
| tramell.tillman | Tramell Tillman | 19 741 | 1 376 | 49 |
| surefineokay | Jen Tullock | 15 532 | 2 757 | 4 498 |
| yuluminati | YUL VAZQUEZ | 13 654 | 4 555 | 3 649 |
| mchernus | Michael Chernus | 12 328 | 3 226 | 507 |
Not bad, Adam Scott, with over 1 million followers, not bad at all. Some of them post only very occasionally, judging by the total number of posts.
instagram_followers_gg <- cast_ig_df |>
dplyr::select(username, name, followers_count) |>
dplyr::arrange(followers_count) |>
dplyr::mutate(name = forcats::fct_inorder(name)) |>
ggplot() +
geom_col(mapping = aes(x = followers_count,
y = name,
fill = username)) +
scale_fill_manual(values = c(severance_palette("Dinner"), severance_palette("Hell"))) +
scale_x_continuous(name = "Number of Instagram followers",
labels = scales::number) +
scale_y_discrete(name = NULL) +
labs(title = paste(sQuote("Severance"), "cast by number of followers on Instagram"),
caption = "* As of January 2025") +
theme(legend.position = "none")
ggplot2::ggsave(filename = "instagram_followers_gg.png",
plot = instagram_followers_gg,
width = 8,
height = 6,
bg = "white")
Time to see what they post.
Step 3: Retrieve their posts
Instagram throttles this API endpoint rather heavily to prevent scraping, so we’ll just retrieve the latest 250 posts of each actor. Ultimately, API limits are reset after one hour, so by adding some waiting time this can be scaled up to a reasonable extent for many use cases.
# manual caching, as long as I don't integrate proper caching in the core functions
base_media_folder <- fs::dir_create("ig_media")
media_df <- purrr::map(
.x = cast_ig_df$username,
.f = \(current_username) {
current_filename <- fs::path(base_media_folder, fs::path_ext_set(path = current_username,
ext = "csv") |>
fs::path_sanitize())
if (fs::file_exists(current_filename)==FALSE) {
current_media_df <- cc_get_instagram_bd_user_media(
ig_username = current_username,
      max_pages = 10 # 10 pages, as each page has 25 posts (up to 250 posts)
)
# dropping thumbnail and media url, as they are attached to my user and anyway stop working soon
readr::write_csv(x = current_media_df |>
dplyr::select(-thumbnail_url, -media_url),
file = current_filename)
}
current_media_df <- readr::read_csv(current_filename, show_col_types = FALSE, progress = FALSE)
current_media_df
}
) |>
purrr::list_rbind()
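Speaking of waiting time: if you need more than the hourly quota allows, a crude but workable approach is to wrap each call in a retry that sleeps until the limit resets. A minimal sketch, assuming that an exhausted quota simply surfaces as an error from the cornucopia call (the exact behaviour may differ):
# hypothetical helper: retry a business discovery call, waiting out the hourly quota
cc_get_media_patiently <- function(ig_username, max_pages = 10, max_attempts = 3) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      cc_get_instagram_bd_user_media(ig_username = ig_username,
                                     max_pages = max_pages),
      error = function(e) e
    )
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Request for ", ig_username, " failed (", conditionMessage(result),
            "); waiting one hour before retrying...")
    Sys.sleep(60 * 60) # API limits are reset after one hour
  }
  stop("Giving up on ", ig_username, " after ", max_attempts, " attempts.")
}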
Step 4: Check out what they post
So here we are, with about 250 posts per user, or far fewer for those who post infrequently. Retrieving 250 posts lets us go back less than a year in time for a few users, but all the way back to 2011 and their earliest post for others! In order to go back at least to the launch of the first season of Severance in early 2022, we’ll retrieve a few hundred more posts for the most active accounts, along the lines of the sketch below.
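Re-using the same function with a higher max_pages is all it takes; the usernames and page count below are purely illustrative, and the corresponding cached CSV files from the previous chunk would have to be deleted first for new data to actually be fetched.
# illustrative: fetch more pages for the most active accounts
busiest_usernames <- c("surefineokay", "yuluminati")
busiest_media_df <- purrr::map(
  .x = busiest_usernames,
  .f = \(current_username) {
    cc_get_instagram_bd_user_media(
      ig_username = current_username,
      max_pages = 40 # 40 pages x 25 posts = up to 1 000 posts; adjust as needed
    )
  }
) |>
  purrr::list_rbind()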
usernames_ordered_df <- cast_ig_df |>
dplyr::select(username)
usernames_ordered_df |>
dplyr::left_join(
media_df |>
dplyr::group_by(username) |>
dplyr::count(name = "post"),
by = "username") |>
dplyr::left_join(y = media_df |>
dplyr::group_by(username) |>
dplyr::summarise(earliest_post = min(timestamp) |> as.Date(),
latest_post = max(timestamp) |> as.Date()),
by = "username") |>
dplyr::left_join(y = media_df |>
dplyr::group_by(username, media_type) |>
dplyr::count() |>
tidyr::pivot_wider(names_from = media_type, values_from = n),
by = "username") |>
dplyr::left_join(y = media_df |>
dplyr::group_by(username, media_product_type) |>
dplyr::count() |>
tidyr::pivot_wider(names_from = media_product_type, values_from = n),
by = "username") |>
knitr::kable()
| username | post | earliest_post | latest_post | CAROUSEL_ALBUM | IMAGE | VIDEO | FEED | REELS |
|---|---|---|---|---|---|---|---|---|
| mradamscott | 250 | 2020-06-28 | 2025-01-14 | 77 | 141 | 32 | 243 | 7 |
| brittle | 159 | 2011-10-23 | 2025-01-17 | 29 | 103 | 27 | 147 | 12 |
| john_turturro | 85 | 2016-10-05 | 2024-07-10 | 4 | 76 | 5 | 84 | 1 |
| patriciaarquette | 48 | 2015-11-01 | 2025-01-15 | 5 | 37 | 6 | 43 | 5 |
| surefineokay | 1000 | 2020-06-24 | 2025-01-18 | 262 | 398 | 340 | 770 | 230 |
| yuluminati | 3250 | 2012-12-12 | 2025-01-18 | 65 | 2294 | 891 | 3017 | 233 |
| dichenlachman | 194 | 2013-04-05 | 2025-01-17 | 2 | 132 | 60 | 183 | 11 |
| mchernus | 250 | 2018-04-29 | 2024-12-17 | 46 | 188 | 16 | 246 | 4 |
| tramell.tillman | 34 | 2018-11-02 | 2025-01-16 | 24 | 5 | 5 | 29 | 5 |
N.B. All posts - in Instagram API parlance, we are actually talking about media items - are either “feed” or “reel”, and, separately, either carousel, image, or video.
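To see at a glance how the two classifications overlap, we can cross-tabulate them directly on the media_df built above (a quick sketch, nothing more):
media_df |>
  dplyr::count(media_type, media_product_type) |>
  tidyr::pivot_wider(names_from = media_product_type,
                     values_from = n,
                     values_fill = 0) |>
  knitr::kable()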
Step 5: Time to focus on their Severance posts
Do they post about Severance?
severance_media_df <- media_df |>
dplyr::filter(stringr::str_detect(string = caption,
pattern = stringr::fixed(pattern = "severance",
ignore_case = TRUE)))
usernames_ordered_df |>
dplyr::left_join(
severance_media_df |>
dplyr::group_by(username) |>
dplyr::count(name = "post"),
by = "username") |>
dplyr::left_join(y = severance_media_df |>
dplyr::group_by(username) |>
dplyr::summarise(earliest_post = min(timestamp) |> as.Date(),
latest_post = max(timestamp) |> as.Date()),
by = "username") |>
dplyr::left_join(y = severance_media_df |>
dplyr::group_by(username, media_type) |>
dplyr::count() |>
tidyr::pivot_wider(names_from = media_type, values_from = n),
by = "username") |>
dplyr::left_join(y = severance_media_df |>
dplyr::group_by(username, media_product_type) |>
dplyr::count() |>
tidyr::pivot_wider(names_from = media_product_type, values_from = n),
by = "username") |>
knitr::kable()
| username | post | earliest_post | latest_post | CAROUSEL_ALBUM | IMAGE | VIDEO | FEED | REELS |
|---|---|---|---|---|---|---|---|---|
| mradamscott | 51 | 2021-11-20 | 2025-01-14 | 23 | 18 | 10 | 48 | 3 |
| brittle | 19 | 2020-01-20 | 2024-10-21 | 6 | 7 | 6 | 15 | 4 |
| john_turturro | 2 | 2022-01-19 | 2024-07-10 | NA | NA | 2 | 1 | 1 |
| patriciaarquette | 1 | 2025-01-15 | 2025-01-15 | 1 | NA | NA | 1 | NA |
| surefineokay | 46 | 2021-12-16 | 2025-01-17 | 21 | 9 | 16 | 40 | 6 |
| yuluminati | 23 | 2022-01-18 | 2025-01-18 | 1 | 13 | 9 | 18 | 5 |
| dichenlachman | 21 | 2021-12-16 | 2025-01-17 | NA | 8 | 13 | 18 | 3 |
| mchernus | 9 | 2022-02-25 | 2024-07-10 | 4 | 1 | 4 | 8 | 1 |
| tramell.tillman | 7 | 2022-04-09 | 2024-12-18 | 6 | NA | 1 | 6 | 1 |
All of them did, at least once! Notice that we are probably missing some of the earliest posts by the most active Instagram users, as we retrieved only their most recent posts.
It appears that every single one of the top 10 most-liked Instagram posts mentioning Severance by its cast is by mradamscott… such a 🌟.
But for the sake of balance, let’s combine in a single table the top Severance post by each actor.
Click on the timestamp to see the original post.
severance_media_df |>
dplyr::group_by(username) |>
dplyr::arrange(dplyr::desc(like_count),
dplyr::desc(comments_count)) |>
dplyr::slice_head(n = 1) |>
dplyr::ungroup() |>
dplyr::mutate(timestamp = purrr::map2_chr(
.x = as.character(as.Date(timestamp)),
.y = permalink, .f = \(x, y) {
htmltools::a(x, href = stringr::str_c(y)) |> as.character()
})) |>
dplyr::select(timestamp, username, like_count, comments_count, caption) |>
dplyr::mutate(caption = stringr::str_trunc(string = caption, width = 24)) |>
dplyr::arrange(dplyr::desc(like_count),
dplyr::desc(comments_count)) |>
dplyr::rename(like = like_count, comments = comments_count) |>
knitr::kable(escape = FALSE, format.args = list(big.mark = " "))
| timestamp | username | like | comments | caption |
|---|---|---|---|---|
| 2022-10-31 | mradamscott | 74 336 | 1 374 | Filming has begun on … |
| 2022-07-21 | brittle | 14 767 | 176 | we all went to high s… |
| 2023-02-27 | dichenlachman | 11 955 | 128 | Had an wonderful time… |
| 2025-01-15 | patriciaarquette | 5 846 | 196 | #severance appletv #g… |
| 2024-07-10 | tramell.tillman | 2 716 | 213 | Let the countdown beg… |
| 2023-01-16 | surefineokay | 1 982 | 140 | Thank you Critics Cho… |
| 2023-02-28 | mchernus | 1 212 | 91 | Oh what a night! Had … |
| 2022-01-19 | john_turturro | 1 178 | 53 | Here comes the offici… |
| 2022-02-18 | yuluminati | 592 | 108 | Severance is here!! S… |
Step 6: Some data visualisation
This is all just preliminary data gathering and exploration. The purpose of this post is simply to show that, using the official Instagram API, it is possible to retrieve posts published by other users and conduct all sorts of processing on the data thus retrieved.
The reader should see how this same technique could be used for all sorts of work, from data journalism to competitor analysis. One could analyse hashtags, or pass the images to locally deployed LLMs to enrich the analysis, and ultimately, see what works best based on a set of criteria.
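As a taste of the hashtag angle, here is a quick sketch that pulls hashtags out of the captions retrieved above and counts the most frequent ones (the regular expression is a simplification and will not catch every edge case):
media_df |>
  tidyr::drop_na(caption) |>
  dplyr::mutate(hashtag = stringr::str_extract_all(
    string = stringr::str_to_lower(caption),
    pattern = "#[[:alnum:]_]+"
  )) |>
  tidyr::unnest(hashtag) |>
  dplyr::count(hashtag, sort = TRUE) |>
  dplyr::slice_head(n = 10)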
Just for the sake of it, let’s do some visualisations, keeping in mind that in this case the data are not really fully comparable (i.e. we have only recent posts by some of the cast members).
Severance fans will appreciate the colour palettes inspired by some of the most memorable scenes in the series.
media_gg_df <- media_df |>
dplyr::mutate(created_time = lubridate::as_date(timestamp)) |>
mutate(year = lubridate::year(created_time),
month = lubridate::month(created_time),
day = lubridate::day(created_time)) |>
mutate(month = factor(x = month,
levels = 12:1,
labels = rev(month.name)
)) |>
rename(`Like` = like_count,
`Format` = media_type) |>
filter(created_time >= lubridate::as_date("2022-01-01")) # compare Date with Date
instagram_bubble_gg <- media_gg_df |>
mutate(month_year = paste(month, year, sep = " ")) |>
arrange(desc(created_time)) |>
mutate(month_year = forcats::fct_inorder(month_year)) |>
ggplot(mapping = aes(x = day,
y = month_year,
size = `Like`,
colour = `Format`)) +
geom_point(alpha = 0.8) +
scale_color_manual(values = severance_palette("Jazz02")) +
#scale_colour_viridis_d() +
guides(fill = guide_legend(reverse = TRUE)) +
scale_x_continuous(name = "Day of the month",
breaks = c(1, 5, 10, 15, 20, 25, 30),
minor_breaks = c(1:31)) +
scale_y_discrete(name = NULL, expand = expansion(add = 1)) +
scale_size_continuous(range = c(0.1,12), labels = scales::number) +
guides(colour = guide_legend(override.aes = list(size=12))) +
theme(legend.direction = "horizontal",
legend.position = "bottom"
) +
labs(title = "Type and number of likes on Instagram",
subtitle = paste("Out of a total of", scales::number(nrow(media_gg_df)), "posts published by", sQuote("Severance"), "actors starting with", lubridate::date(media_gg_df$created_time) |> min() |> format("%B %Y") |> stringr::str_squish())) +
theme(strip.text = element_text(size = 20),
legend.box="vertical",
legend.margin=margin())
ggplot2::ggsave(filename = "instagram_bubble_gg.png",
plot = instagram_bubble_gg,
width = 8,
height = 10,
bg = "white")
It’s easy to notice something unexpected: the biggest hits are carousel albums, not videos. If this were a serious analysis, one would go on to investigate why these posts work, or why the video clips are not hits, or…
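As a first step in that direction, one could compare typical engagement by format, for example along these lines (medians rather than means, so that a handful of viral posts do not dominate):
media_gg_df |>
  dplyr::group_by(Format) |>
  dplyr::summarise(posts = dplyr::n(),
                   median_likes = median(Like, na.rm = TRUE),
                   max_likes = max(Like, na.rm = TRUE)) |>
  dplyr::arrange(dplyr::desc(median_likes)) |>
  knitr::kable(format.args = list(big.mark = " "))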
One final note: folks interested in analysing their own Instagram channel (or, for that matter, Facebook page) may want to consider that the official APIs give a lot more data about your own posts, enabling much more revealing analyses. Even leaving aside the more fine-grained options, these include, for example, changes in average video views for organic posts over time (easy to highlight with changepoint algorithms), comparisons of sponsored versus organic posts, or the success of specific types of posts based on their caption contents or the time of day when they were published.
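To give an idea of the changepoint approach: given a series of average views per organic video post (simulated below, since nothing of the kind is retrieved in this post), the changepoint package can locate the shift in a couple of lines.
# pak::pak("changepoint")
library("changepoint")
# simulated stand-in for average views per organic video post, in chronological
# order, with a level shift after post 60 (purely illustrative data)
set.seed(42)
average_views <- c(rnorm(n = 60, mean = 2000, sd = 300),
                   rnorm(n = 40, mean = 3500, sd = 300))
cpt_result <- cpt.mean(average_views, method = "PELT")
# index of the post after which the average level of views shifts
cpts(cpt_result)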
P.S. This post has served as the basis for a note on Roxana Todea’s website.