- Jupyter Notebook
Python packages and libraries:
- Numpy
- Pandas
- Matplotlib
- Seaborn
InstaCart Story
Instacart, an online grocery store that operates through an app in North America, already has very good sales but wants to uncover more information about its sales patterns. The company assumes it can't target everyone using the same methods, so it is considering a targeted marketing strategy.
Objective
My task is to perform an initial exploratory analysis of some of their data in order to derive insights and suggest strategies for better customer segmentation based on the provided criteria.
- Data wrangling
- Deriving new columns
- Subsetting
- Combining, grouping and aggregating data
- Visualizations in Python
- Markdown for Jupyter notebooks.
- Format: Excel-CSV, PKL
- Records: 32.4M orders
- Information: orders, products, customers, departments
- Data Dictionary
- Customer Dataset
- Data Citation:
The Instacart Online Grocery Shopping Dataset 2017
Accessed: 10 Jan 2023
Question
The sales team needs to know what the busiest days of the week and hours of the day are in order to schedule ads at times when there are fewer orders.
Solution
Histograms were plotted to show the distribution of orders across the days of the week and the hours of the day. A line chart shows customers' spending habits over the course of a day.
Saturday, Sunday, and Friday are, in that order, the busiest days of the week. Within a day, 10am to 3pm is the busiest period for placing orders. Orders drop sharply after 6pm and stay low until 8am the next day, so those hours on weekdays might be the best time to schedule ads.
Please note that in the figures below, days are shown as numeric values, where 0=Saturday and 6=Friday.
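The counts behind these histograms can be sketched as follows. This is a minimal illustration on toy data, not the project's actual notebook code: the column names `order_dow` and `order_hour_of_day` are assumptions, and the six sample rows are made up purely to show the mechanics.

```python
import pandas as pd

# Toy stand-in for the orders table; column names and values are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "order_dow": [0, 0, 1, 6, 0, 1],              # 0 = Saturday ... 6 = Friday
    "order_hour_of_day": [10, 11, 10, 19, 14, 10],
})

# Orders per day of week, sorted by the numeric day index.
orders_per_day = orders["order_dow"].value_counts().sort_index()

# Orders per hour of day; reindex so hours without any orders show up as 0.
orders_per_hour = (
    orders["order_hour_of_day"]
    .value_counts()
    .reindex(range(24), fill_value=0)
)

# Day with the most orders, and every hour tied for the fewest orders.
busiest_day = orders_per_day.idxmax()
quietest_hours = orders_per_hour[orders_per_hour == orders_per_hour.min()].index.tolist()
```

Calling `orders_per_day.plot.bar()` or `orders_per_hour.plot.bar()` on these series would give charts comparable to the histograms described above.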
Question
The sales team needs to know whether there are particular times of the day when people spend the most money, as this might inform the type of products they advertise at these times.
Solution
A high frequency of low-expenditure orders is placed during the day (9am-5pm), while a low frequency of high-expenditure orders occurs early in the morning, between 6am and 8am.
We also observed two further peaks, at noon and around 9pm. IC can use these hours to decide which types of products to advertise.
The bar chart on the right shows the distribution of orders across three periods of the day. Early birds place their orders between 5am and 8am, regular customers between 9am and 11pm, and night owls from midnight to 4am. Over 80% of orders are placed during regular hours.
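One way to derive the three periods and the hourly spending profile is sketched below. The hour boundaries come from the text above; the column names (`order_hour_of_day`, `prices`), the period labels, and the sample rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
orders = pd.DataFrame({
    "order_hour_of_day": [6, 7, 10, 12, 12, 15, 21, 23, 2, 13],
    "prices": [14.0, 12.5, 7.0, 8.0, 6.5, 7.5, 9.0, 8.5, 10.0, 7.0],
})

hour = orders["order_hour_of_day"]

# 5am-8am -> early bird, 9am-11pm -> regular, everything else (midnight-4am) -> night owl.
conditions = [hour.between(5, 8), hour.between(9, 23)]
labels = ["Early bird", "Regular"]
orders["period"] = np.select(conditions, labels, default="Night owl")

# Share of orders per period, in percent.
period_share = orders["period"].value_counts(normalize=True).mul(100).round(1)

# Average spend per hour, the basis for a spending line chart.
avg_price_by_hour = orders.groupby("order_hour_of_day")["prices"].mean()
```

`np.select` evaluates the conditions in order, so an hour that matches none of them falls through to the default label.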
Question
Instacart has a lot of products with different price tags. Marketing and sales want to use simpler price range groupings to help direct their efforts.
Solution
Products are organized into three categories (low-, mid-, and high-range) based on the given conditions. For more detail, please refer to the crosstab tables and the chart explanations below the visualizations.
- Meat seafood has the highest percentage of high-range products, at around 44%.
- Bulk and meat seafood have no items listed as low-range products.
- Alcohol, others, and babies have the highest proportion of mid-range products, at over 85%.
- The remaining categories fall in the low-to-mid range.
- Please note that dairy products are not really high-range; they are categorized as such because of an outlier (2% milk priced at 99,999).
- Please also note that high-range pantry products make up just under 0.03% of all pantry products, so this category is not visible on the stacked bar chart below.
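The price grouping and the per-department crosstab could be sketched as below. The actual conditions are not stated in this README, so the thresholds (up to 5 is low, above 5 up to 15 is mid, above 15 is high) and the outlier cutoff of 100 are assumptions, as are the column names and the toy rows.

```python
import numpy as np
import pandas as pd

# Toy product rows; names and values are illustrative assumptions.
products = pd.DataFrame({
    "department": ["meat seafood", "meat seafood", "dairy eggs",
                   "dairy eggs", "pantry", "pantry"],
    "prices": [18.0, 9.5, 99999.0, 3.2, 4.1, 12.0],
})

# Treat implausible prices (such as the 99,999 outlier) as missing.
# The cutoff of 100 is an assumption.
products.loc[products["prices"] > 100, "prices"] = np.nan

# Assumed thresholds: (0, 5] low, (5, 15] mid, (15, inf) high.
bins = [0, 5, 15, np.inf]
labels = ["Low-range", "Mid-range", "High-range"]
products["price_range"] = pd.cut(products["prices"], bins=bins, labels=labels)

# Percentage of each price range within every department.
# Rows with a missing price range are left out of the table.
range_share = (
    pd.crosstab(products["department"], products["price_range"], normalize="index")
    .mul(100)
    .round(1)
)
```

Masking the outlier before binning keeps a single mispriced item from placing a department in the high-range group, which is the dairy issue noted in the list above.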
Question
The marketing and sales teams are particularly interested in the different types of customers in their system and how their ordering behaviors differ. For example:
- What’s the distribution among users in regards to their brand loyalty (i.e., how often do they return to Instacart)?
  51% of orders were placed by regular customers, 33% by loyal customers, and the remaining 16% by new customers.
- Are there differences in ordering habits based on a customer’s loyalty status?
  Almost 99% of loyal customers shop frequently at IC. Promoting point-based loyalty reward programs could help the other 56% of customers (the regular and new customer proportion) upgrade to loyal status.
- Are there differences in ordering habits based on a customer’s region?
  Proportionally, ordering habits are almost the same across regions. Because the South region has a larger population, the bar for frequent customers is higher in those states.
- Is there a connection between age and family status in terms of ordering habits?
  No, there is no meaningful relationship between age and the number of dependents in a family.
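A loyalty flag and its breakdown by region might be derived along these lines. The README does not give the rules behind the loyal/regular/new labels, so the thresholds here (more than 40 orders is loyal, more than 10 is regular, otherwise new) are assumptions, along with the column names and the toy rows.

```python
import numpy as np
import pandas as pd

# Toy order history; names, values, and thresholds are illustrative assumptions.
orders = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3],
    "order_number": [1, 2, 11, 12, 13, 41],
    "region":       ["South", "South", "West", "West", "West", "South"],
})

# Highest order number per customer, broadcast back onto every order row.
orders["max_order"] = orders.groupby("user_id")["order_number"].transform("max")

# Assumed rule: > 40 orders is loyal, > 10 is regular, anything else is new.
conditions = [orders["max_order"] > 40, orders["max_order"] > 10]
labels = ["Loyal customer", "Regular customer"]
orders["loyalty_flag"] = np.select(conditions, labels, default="New customer")

# Share of orders per loyalty group, in percent.
loyalty_share = orders["loyalty_flag"].value_counts(normalize=True).mul(100).round(1)

# Loyalty mix within each region, in percent of that region's orders.
loyalty_by_region = (
    pd.crosstab(orders["region"], orders["loyalty_flag"], normalize="index")
    .mul(100)
    .round(1)
)
```

Normalizing by row (`normalize="index"`) compares regions proportionally, so a more populous region does not dominate the comparison just by having more orders.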
Question
What different classifications does the demographic information suggest? Age? Income? Certain types of goods? Family status?
Recommendation
- Offering incentives in the bulk, other, pets, and bakery departments might help increase sales (for example, buy a group of 3 and get a 15% discount, or buy one get one free). Initiatives such as free delivery on bulk and other items, plus promoting low-price protection plans or guarantees, would help as well.
- We could introduce a discount, such as 15-20% off the next order placed within 5 days, to motivate customers to return more often and turn them into loyal customers.
- We could introduce senior and student days with a special discount on days that are not busy, such as Thursdays and Tuesdays, to encourage these customers to place more orders on slow days.