In software development, tools like GitHub Copilot and ChatGPT have become popular aids for programming tasks. However, these proprietary systems come with limitations, including a lack of transparency about their training data and no way to adapt them to a specific domain or codebase. Fortunately, the open-source community has responded with high-quality alternatives that address these drawbacks. This article explores the emergence of open-source language models and their potential to change how developers interact with coding assistants.
One of the most promising open-source alternatives is StarCoder, a 16B-parameter model trained on over a trillion tokens drawn from 80+ programming languages, GitHub issues, Git commits, and Jupyter notebooks. That breadth gives it an extensive and diverse knowledge base for code-related applications.
Try it out: https://huggingface.co/spaces/HuggingFaceH4/starchat-playground
We wanted to see how the model would handle basic visualization tasks, in the spirit of GPT-4's famous unicorn drawing in TikZ. To do this, we prompted the model with some coding tasks and got wonderful results! Yes, these are a little cherry-picked, as we only selected the completions that produced functioning code, but the others were not far off!
Example 1: bar plot
Prompt:
Give the following data {'Name':['Tom', 'Brad', 'Kyle', 'Jerry'], 'Age':[20, 21, 19, 18], 'Height' : [6.1, 5.9, 6.0, 6.1]}. Can you plot one graph with two subplots as columns. The first is a bar graph showing the height of each person. The second is a bargraph showing the age of each person? Draw the graph in seaborn talk mode.
Response:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # added here: the response calls sns.barplot below but never imports seaborn
%matplotlib inline
plt.style.use('seaborn-talk')
data = {'Name':['Tom', 'Brad', 'Kyle', 'Jerry'],
        'Age':[20, 21, 19, 18],
        'Height':[6.1, 5.9, 6.0, 6.1]}
df=pd.DataFrame(data)
# Bar Graph for Height and Age
fig,(ax1, ax2)=plt.subplots(nrows=1, ncols=2, figsize=(14,7))
sns.barplot(x='Name', y="Height", data=df, palette="Blues_d", ax=ax1).set_title("Height Comparison")
sns.barplot(x='Name',y ="Age", data=df,palette="Reds_d", ax=ax2 ).set_title("Age Comparison")
# Add padding between plots
fig.tight_layout()
# Show the final image
plt.show()
Example of a plot generated by StarChat.
Example 2: mapping
Prompt:
Draw me a map of the world using geopandas. Make it so that only Germany and Spain are colored red.
Response:
import pandas as pd
from matplotlib import pyplot as plt
import geopandas as gpd
from shapely.geometry import Point, Polygon
%matplotlib inline
# Load data from URL or local file
url = 'https://raw.githubusercontent.com/datasets/geo-countries/master/data/countries.geojson'
df = gpd.read_file(url)
# Create plot
fig, ax = plt.subplots()
ax.set_aspect('equal')
world = gpd.GeoDataFrame({'name': ['World'],
'geometry': [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])]})
germany = df[df['ADMIN'] == "Germany"]
spain = df[df['ADMIN'] == "Spain"]
other = df[(df['ADMIN']!= "Germany") & (df['ADMIN']!= "Spain")]
world.plot(color='lightgrey', edgecolor='white', ax=ax)
germany.plot(color="red", ax=ax)
spain.plot(color="red", ax=ax)
other.plot(color="skyblue", ax=ax)
plt.title("European Countries")
plt.show()
Example of a plot generated by StarChat.
Example 3: basketball
Prompt:
There was a basketball game with the following stats. player, points, rebounds and assists: J. Harden 42 8 9 J. Embiid 34 13 4 T. Maxey 14 8 1 T. Harris 9 7 2 P. Tucker 6 3 1. Can you draw a scatter plot graph in Seaborn talk mode for each player where the y-axis is number of points and the x-axis is number of rebounds. Make sure the marker is 'o'. Put the name of the player over each point. The title should be "Sixers Players".
Response:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
# Load data into dataframe
df = pd.DataFrame({'Player': ['J. Harden', 'J. Embiid', 'T. Maxey', 'T. Harris', 'P. Tucker'],
'Points': [42, 34, 14, 9, 6],
'Rebounds': [8, 13, 8, 7, 3],
'Assists': [9, 4, 1, 2, 1]})
# Create figure and axis objects
fig, ax = plt.subplots()
# Plot points vs rebounds
sns.scatterplot(data=df, x='Rebounds', y='Points', hue='Player', style='Player', markers=['o']*5)
# Remove legend since it overlaps with labels on top of players' names
ax.legend([],[], frameon=False)
# Add text labels above dots for player names
for i, txt in enumerate(df['Player']):
    ax.annotate(txt, (df["Rebounds"][i] +.1, df["Points"][i]))
# Set titles and labels
ax.set_title('Sixers Player')
ax.set_xlabel('Number of Rebounds')
ax.set_ylabel('Number of Points')
plt.show()
Example of a plot generated by StarChat.
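The examples above were kept because they produced functioning code. The article does not say how that selection was made, so the following is only a minimal sketch of one possible first-pass filter: it checks that a completion at least parses as Python, after dropping notebook-only lines such as `%matplotlib inline`. The helper names `strip_ipython_magics` and `is_valid_python` are hypothetical, and a syntax check is much weaker than actually running the code and looking at the plot.

```python
import ast


def strip_ipython_magics(source: str) -> str:
    """Drop lines starting with '%' (e.g. '%matplotlib inline').

    Assumption: such lines are IPython magics. A continuation line of a
    multi-line expression that starts with '%' would also be dropped.
    """
    kept = [line for line in source.splitlines()
            if not line.lstrip().startswith("%")]
    return "\n".join(kept)


def is_valid_python(source: str) -> bool:
    """Return True if the completion parses as Python; nothing is executed."""
    try:
        ast.parse(strip_ipython_magics(source))
    except SyntaxError:
        return False
    return True


# Two made-up completions: one well-formed, one with a missing colon.
completions = [
    "import pandas as pd\n%matplotlib inline\ndf = pd.DataFrame({'a': [1, 2]})",
    "for i in range(3)\n    print(i)",
]

parseable = [c for c in completions if is_valid_python(c)]
print(len(parseable))  # 1
```

A completion that passes this check can still fail at run time, for example because of a missing import, so in practice it would only be used to discard obviously broken output before trying the rest by hand.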
Carefully crafted prompts can turn an LLM into an interactive, conversational assistant. These prompts usually consist of a “system” message that establishes the LLM’s persona, followed by a sequence of example dialogues between the assistant and a user.
As an illustration, consider an excerpt from Anthropic’s HHH prompt, which runs to about 6,000 tokens:
Below are a series of dialogues between various people and an AI assistant.
The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable.
The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed.
It also tries to avoid giving false or misleading information, and it caveats when it isn’t entirely sure about the right answer.
That said, the assistant is practical and really does its best, and doesn’t let caution get too much in the way of being useful.
-----
Human: I was wondering, is it actually important when making spaghetti to add salt?
Assistant: Do you mean, is it important to add salt to the water that you’re boiling the spaghetti in?
Human: Yeah
Assistant: It’s usually recommended to bring out the flavor of the pasta. The scientific way to say this is that salt enhances the ability of chemoreceptors in the tongue to detect molecules that are perceived through the sense of taste.
Human: But I mean I’m going to use sauce and stuff, so does it matter?
Assistant: It sounds like you’re skeptical. Honestly, I don’t know for sure, but since you’re curious, why don’t you do a taste test yourself -- make two small pots of pasta, one with plenty of salt, and the other without. Then taste the noodles and see which one you prefer? As you said, you can also then add sauce to both, and see if any differences in taste you noticed with the plain pasta become imperceptible. It should be a fun experiment!
-----
...
-----
Human: {USER QUERY}
Assistant:
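The layout of the excerpt above (a system message, example dialogues separated by `-----`, and a final `Human:` turn followed by an open `Assistant:` prefix) can be assembled with a few lines of Python. This is a minimal sketch rather than the code used by any particular project; the function names `format_dialogue` and `build_prompt` and the shortened example dialogue are assumptions made for illustration.

```python
SEPARATOR = "-----"  # the divider used between blocks in the excerpt


def format_dialogue(turns):
    """Render (speaker, text) pairs as 'Speaker: text' lines."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_prompt(system_message, example_dialogues, user_query):
    """Join the system message, the example dialogues and the new query.

    The prompt ends with a bare 'Assistant:' so that the model continues
    the text by writing the assistant's reply.
    """
    parts = [system_message]
    parts.extend(format_dialogue(d) for d in example_dialogues)
    parts.append(f"Human: {user_query}\nAssistant:")
    return f"\n{SEPARATOR}\n".join(parts)


system_message = (
    "Below are a series of dialogues between various people and an AI assistant."
)
# A shortened stand-in for the example dialogues in the real prompt.
examples = [[
    ("Human", "I was wondering, is it actually important when making spaghetti to add salt?"),
    ("Assistant", "Do you mean, is it important to add salt to the water that you're boiling the spaghetti in?"),
]]

prompt = build_prompt(system_message, examples, "How do I reverse a list in Python?")
print(prompt)
```

The resulting string is what gets sent to the model as a single piece of text; swapping the example dialogues for programming-related ones changes the kind of assistant the model imitates.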
Because StarCoder has an 8,000-token context length, a prompt of this size fits in its context window, which is what lets prompting alone turn the model into a coding assistant. The excerpt showcases a condensed version of the prompt, highlighting the assistant’s purpose and the user’s request for code optimization guidance in a web application. The assistant suggests techniques like indexing, caching, and optimizing queries, while emphasizing the importance of data structures and algorithms for better performance. The condensed version maintains the essential conversation elements, enabling the model to provide concise and accurate coding assistance.
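The figures quoted in this article imply a simple budget: a prompt of roughly 6,000 tokens inside an 8,000-token context leaves about 2,000 tokens for the user's query and the model's reply. The sketch below just does that arithmetic. Both constants are the rounded numbers from the text, the function name `remaining_budget` is hypothetical, and a real check would count tokens with the model's own tokenizer.

```python
CONTEXT_LENGTH = 8_000  # StarCoder context length as quoted in the article
PROMPT_TOKENS = 6_000   # approximate size of the HHH prompt as quoted in the article


def remaining_budget(context_length, prompt_tokens, query_tokens=0):
    """Tokens left for the model's reply after the prompt and the user's query."""
    remaining = context_length - prompt_tokens - query_tokens
    if remaining < 0:
        raise ValueError("prompt and query do not fit in the context window")
    return remaining


print(remaining_budget(CONTEXT_LENGTH, PROMPT_TOKENS))       # 2000
print(remaining_budget(CONTEXT_LENGTH, PROMPT_TOKENS, 150))  # 1850
```

This is why prompt length matters: every token spent on example dialogues is a token that cannot be used for the user's code or the generated answer.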
The full prompt: https://huggingface.co/datasets/bigcode/ta-prompt/blob/main/TA_prompt_v1.txt
Read more about it: https://huggingface.co/blog/starchat-alpha
To read about StarCoderPlus, StarChat Alpha, and StarChat Beta: https://link.medium.com/0j4lVDyiEAb
GitHub: https://github.com/yash9439/StarChat-AI-Model
Connect with me: https://www.linkedin.com/in/yash-bhaskar/
More articles like this: https://medium.com/@yash9439