Medical spending per beneficiary data and data description was obtained from Data.Medicare.gov. The data displayed here shows average spending levels during hospitals’ Medicare Spending per Beneficiary (MSPB) episodes. An MSPB episode includes all Medicare Part A and Part B claims paid during the period from 3 days prior to a hospital admission through 30 days after discharge. These average Medicare payment amounts have been price-standardized to remove the effect of geographic payment differences and add-on payments for indirect medical education (IME) and disproportionate share hospitals (DSH).
The data included a total of 3,065 hospitals from 50 states with 7 claim types at 3 periods. The average spending per episode was given by hospital, by state and by nation. First we import the data and rename some of the origional variables names using the dplyr
package.
The aim of this post is to explore whether
- The total average spending per complete episode vary across states
- The average spending per episode vary across states for agiven claim type and period
The average spending across providers by period and claim type within states are explored in another blog.
# Load R li
aries
library(dplyr)
library(ggplot2)
library(stargazer)
library(maps)
# Call the data and rename variable names
mdClm <- read.csv(paste("../Data/Medicare Hospital Spending by Claim.csv", sep=""))
mdClm <- rename(mdClm, AvgSpndgHsptl=Avg_Spending_Per_Episode_Hospital, AvgSpndgState=Avg_Spending_Per_Episode_State,
AvgSpndgNation=Avg_Spending_Per_Episode_Nation, PcntSpndgHsptl=Percent_of_Spending_Hospital,
PctSpndgState=Percent_of_Spending_State, PctSpndgNation=Percent_of_Spending_Nation)
head(mdClm, n=3)
To explore the total average spending per period and claim type, first agregate the data using the package/function dplyr/select
. Then generate a bar plot for each period and claim type combination. Regardless of states, the highest average spending was for inpatient claim during index hospital admission followed by skilled nursing facility 1 through 30 days after discharge from index hospital admission (see the bar plot below). There was no spending for the claim types Home Health Agency, Hospice, Outpatient and Skilled Nursing Facility during hospital admission, and for the hospice claim 1 to 3 days Prior to Index Hospital Admission.
# Aggregate the data by Period and Claim Type to explore where the most average spending is:
periodClmType <- select(mdClm, Period, Claim_Type, AvgSpndgHsptl) %>%
filter(Claim_Type != "Total") %>%
group_by(Period, Claim_Type) %>%
summarize(AvgSpndng = mean(AvgSpndgHsptl , na.rm = TRUE),
MedSpndng = median(AvgSpndgHsptl , na.rm = TRUE))
# bar plot of average spending by Claimy Type and Period
ggplot(periodClmType, aes(fill=Period, y=MedSpndng, x=Claim_Type)) +
geom_bar(position="dodge", stat="identity") +
theme(axis.text.x=element_text(size=7, angle=45, vjust=0.5, colour="black"),
panel.background = element_rect(fill = "white", colour = "gray"),
panel.grid.major = element_line(colour = "lightgray"),
legend.position = "top", aspect.ratio=4/10) +
xlab("Claim Type") + ylab("Average Spending")+
guides(fill=guide_legend(nrow=3,byrow=TRUE))
First call distinct state and average spending data to avoid duplicate data values; i.e. one average spending per state, using dplyr/filer and distinct
function or any other appropriate package/function.
# Select distinct state and average spending
totalStateClm <- mdClm %>%
filter(Claim_Type == "Total") %>%
distinct(State, AvgSpndgState) %>%
arrange(AvgSpndgState)
Then summarize the average spending data. I used the stargazer
package to generate a well formatted summary statistics. As shown in the table below, the range of total average spending was less than $4,000, and the inter-quartile range is about \$1,000. The total spending across states is shown in the US map that was filled with the total average spending.
# Summary statistis (Mean, Std, Median, Quartiles, Min and Max)
totalStateClmSpndng <- data.frame(AvgSpending=totalStateClm$AvgSpndgState)
stargazer(totalStateClmSpndng, type = "text", title="Summary of Medicare Average Spending per Episode State",
digits=0, median=TRUE, iqr=TRUE, min.max=TRUE)
# Call the data required to plot USA map (altitude, latitude, region / state)
# And merge with medicare spending data
map <- map_data("state")
states <- read.csv(paste("../Data/states.csv", sep=""))
dataClm <- merge(totalStateClm, states, by.x="State", by.y="Abbreviation")
dataClm$state <- tolower(dataClm$state)# Convert all state chracter into lower case
# Plot total average spending across USA states
m <- ggplot(dataClm, aes(fill = AvgSpndgState))
m + geom_map(aes(map_id = state), map = map) +
expand_limits(x = map$long, y = map$lat) + ggtitle("Total average spending for complete episode across states")+
theme(axis.title=element_blank(), axis.text=element_blank(), axis.ticks=element_blank(),
legend.position="right", legend.title=element_blank()) + coord_fixed(ratio = 1.75)
As shown in the map the total average spending was highest in Nevada, Texas and Utah, and lowest in West Virginia, New Mexico and Iowa.
cat("States with highest average spending")
head(totalStateClm, n=5)
cat("States with lowest average spending")
tail(totalStateClm, n=5)
Here I generated a bar plot for each state per period and claim type. There is a variation across states for most of period and claim types.
# Aggregate the data by Period and Claim Type to explore where the most average spending is:
agrgtStPrdClm <- mdClm %>%
select(State, Period, Claim_Type, AvgSpndgHsptl) %>%
filter(Claim_Type != "Total") %>%
group_by(State, Period, Claim_Type) %>%
summarize(AvgSpndng = mean(AvgSpndgHsptl , na.rm = TRUE),
MedSpndng = median(AvgSpndgHsptl , na.rm = TRUE))
ggplot(agrgtStPrdClm, aes(x=State, y=MedSpndng)) +
geom_bar(aes(fill = Period), position = "dodge", stat="identity") +
facet_wrap(~Claim_Type, nrow=7, scales = "free_y") +
theme(legend.position = "top", axis.text.x=element_text(size=6, angle=45, vjust=0.5, colour="black")) +
xlab("States") + ylab("Median Spending") +
guides(fill=guide_legend(nrow=3,byrow=TRUE))
This post helps to visualize average medicare spending distribution and whether there were huge differences across states per complete episode as well as per period and claim type. As expected, there were huge differences due to period and claim type, but not across states for the complete episode.